648 research outputs found

    Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device

    Get PDF
    Currently, most designers face a daunting task to research different design flows and learn the intricacies of specific software from various manufacturers in hardware/software co-design. An urgent need of creating a scalable hardware/software co-design platform has become a key strategic element for developing hardware/software integrated systems. In this paper, we propose a new design flow for building a scalable co-design platform on FPGA-based system-on-chip. We employ an integrated approach to implement a histogram oriented gradients (HOG) and a support vector machine (SVM) classification on a programmable device for pedestrian tracking. Not only was hardware resource analysis reported, but the precision and success rates of pedestrian tracking on nine open access image data sets are also analysed. Finally, our proposed design flow can be used for any real-time image processingrelated products on programmable ZYNQ-based embedded systems, which benefits from a reduced design time and provide a scalable solution for embedded image processing products

    Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code

    Get PDF
    Today, large scale parallel simulations are fundamental tools to handle complex problems. The number of processors in current computation platforms has been recently increased and therefore it is necessary to optimize the application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware, like FPGAs, to accelerate the most time consuming functions are considered as a strong alternative to boost the performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: Optimization, parallelization and hardware acceleration. At first, a profiling analysis of the most time-consuming processes of the Reynolds Averaged Navier Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, a study of the code scalability with new partitioning algorithms are tested to show the most suitable partitioning algorithms for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented

    An FPGA-Based Hardware Accelerator For The Digital Image Correlation Engine

    Get PDF
    The work presented in this thesis was aimed at the development of a hardware accelerator for the Digital Image Correlation engine (DICe) and compare two methods of data access, USB and Ethernet. The original DICe software package was created by Sandia National Laboratories and is written in C++. The software runs on any typical workstation PC and performs image correlation on available frame data produced by a camera. When DICe is introduced to a high volume of frames, the correlation time is on the order of days. The time to process and analyze data with DICe becomes a concern when a high-speed camera, like the Phantom VEO 1310, is used which is capable of recording up to 10,000 Frames Per Second (FPS) [1]. To reduce this correlation time the DICe software package was ported over to Verilog, and a Xilinx UltraScale+ MPSoC ZCU104 FPGA was targeted for the design. FPGAs are used to implement the hardware accelerator due to their hardware-level speeds and reprogrammability. The ZCU104 board contains FPGA fabric on the Programmable Logic (PL) side that is used for the implementation of the ported DICe hardware design. On the Processing System (PS) side of the ZCU104, a quad-core ARM Cortex-A53 processor is available that runs the Ubuntu 18.04 LTS Linux-based kernel to provide the drivers for USB and Ethernet I/O, a standard file system that is accessed through a Command-Line Interface (CLI), and to run the program\u27s control scripts that are written in C. This work compares the processing time of the DICe hardware accelerator when frame data is accessed via Ethernet-stream or local USB to showcase the fastest option when using DICe. Both methods of accessing frame data are necessary because data may be offloaded from the camera over Ethernet while it is still recording, or the frame data may be readily available in memory. By providing both a method to access frame data via USB and Ethernet, users have more flexibility when using the DICe hardware accelerator. The work presented in this thesis is significant because it is the first known hardware accelerator for the DICe software

    Energy efficient FPGA based hardware accelerators for financial applications

    Get PDF

    Accelerated Financial Applications through Specialized Hardware, FPGA

    Get PDF
    This project will investigate Field Programmable Gate Array (FPGA) technology in financial applications. FPGA implementation in high performance computing is still in its infancy. Certain companies like XtremeData inc. advertized speed improvements of 50 to 1000 times for DNA sequencing using FPGAs, while using an FPGA as a coprocessor to handle specific tasks provides two to three times more processing power. FPGA technology increases performance by parallelizing calculations. This project will specifically address speed and accuracy improvements of both fundamental and transcendental functions when implemented using FPGA technology. The results of this project will lead to a series of recommendations for effective utilization of FPGA technology in financial applications
    corecore