1,564 research outputs found

    Evaluation and Acceleration of High-Throughput Fixed-Point Object Detection on FPGAs

    Full text link

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    FPGA ACCELERATION OF A CORTICAL AND A MATCHED FILTER-BASED ALGORITHM

    Get PDF
    Digital image processing is a widely used and diverse field. It is used in a broad array of areas such as tracking and detection, object avoidance, computer vision, and numerous other applications. For many image processing tasks, the computations can become time consuming. Therefore, a means for accelerating the computations would be beneficial. Using that as motivation, this thesis examines the acceleration of two distinctly different image processing applications. The first image processing application examined is a recent neocortex inspired cognitive model geared towards pattern recognition as seen in the visual cortex. For this model, both software and reconfigurable logic based FPGA implementations of the model are examined on a Cray XD1. Results indicate that hardware-acceleration can provide average throughput gains of 75 times over software-only implementations of the networks examined when utilizing the full resources of the Cray XD1. The second image processing application examined is matched filter-based position detection. This approach is at the heart of the automatic alignment algorithm currently being tested in the National Ignition Faculty presently under construction at the Lawrence Livermore National Laboratory. To reduce the processing time of the match filtering, a reconfigurable logic architecture was developed. Results show that the reconfigurable logic architecture provides a speedup of approximately 253 times over an optimized software implementation

    A general framework for efficient FPGA implementation of matrix product

    Get PDF
    Original article can be found at: http://www.medjcn.com/ Copyright Softmotor LimitedHigh performance systems are required by the developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Filed-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for matrix algorithm cores generation is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, ranging in three different matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles while the second one is a parallel floating-point matrix multiplier designed for 3D affine transformations.Peer reviewe
    • …
    corecore