344 research outputs found

    An area-efficient 2-D convolution implementation on FPGA for space applications

    The 2-D convolution is an algorithm widely used in image and video processing. Although its computation is simple, its implementation requires high computational power and intensive use of memory. Field Programmable Gate Array (FPGA) architectures have been proposed to accelerate the calculation of 2-D convolution, and buffers implemented on the FPGA are used to avoid direct memory access. In this paper we present an implementation of the 2-D convolution algorithm on an FPGA architecture designed to support this operation in space applications. The proposed solution dramatically decreases the area needed while keeping good performance, making it appropriate for embedded systems in critical space applications.
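
    As a rough illustration of the buffering idea mentioned in this abstract, the sketch below compares a direct 2-D convolution with a streaming variant that keeps only the most recent rows in small buffers. It is not the paper's architecture: the function names, the row-level buffering scheme, and the restriction to the "valid" output region are assumptions made for this example.

```python
from collections import deque


def convolve2d_valid(image, kernel):
    """Direct 2-D convolution over the 'valid' region (lists of lists)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # True convolution flips the kernel along both axes.
                    acc += image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
            row.append(acc)
        out.append(row)
    return out


def convolve2d_streaming(pixels, width, kernel):
    """Same result, consuming pixels in raster order.

    Only the last kh rows are retained (hypothetical stand-in for on-chip
    line buffers), so the full image never has to be re-read from memory.
    """
    kh, kw = len(kernel), len(kernel[0])
    rows = deque(maxlen=kh)  # the kh most recent complete rows
    current = []             # row currently being filled
    out = []
    for p in pixels:
        current.append(p)
        if len(current) == width:
            rows.append(current)
            current = []
            if len(rows) == kh:
                window = list(rows)
                out.append([
                    sum(window[i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
                        for i in range(kh) for j in range(kw))
                    for c in range(width - kw + 1)
                ])
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0, 0],
          [0, 0, 0],
          [0, 0, 0]]
flat = [p for row in image for p in row]
direct = convolve2d_valid(image, kernel)
streamed = convolve2d_streaming(flat, 4, kernel)
```

    Both functions produce the same output; the streaming one touches each input pixel once, which is the property that on-chip buffers are meant to provide.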

    Empowering parallel computing with field programmable gate arrays

    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumann-like architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements.

    Dynamically variable step search motion estimation algorithm and a dynamically reconfigurable hardware for its implementation

    Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. For the recently available High Definition (HD) video formats, the computational complexity of the full search (FS) ME algorithm is prohibitively high, whereas the PSNR obtained by fast search ME algorithms is low. Therefore, in this paper, we present the Dynamically Variable Step Search (DVSS) ME algorithm for processing high definition video formats and a dynamically reconfigurable hardware architecture that efficiently implements the DVSS algorithm. The simulation results showed that the DVSS algorithm performs very close to the FS algorithm while searching far fewer search locations than the FS algorithm, and that it outperforms successful fast search ME algorithms by searching more search locations than these algorithms. The proposed hardware is implemented in VHDL and is capable of processing high definition video formats in real time. Therefore, it can be used in consumer electronics products for video compression, frame rate up-conversion and de-interlacing.

    Fast and compact evolvable systolic arrays on dynamically reconfigurable FPGAs

    Evolvable hardware may be considered as the result of a design methodology that employs an evolutionary algorithm to find an optimal solution to a given problem in the form of a digital circuit. Evolutionary algorithms typically require testing thousands of candidate solutions, taking a long time to complete. It would be desirable to reduce this time to a few seconds for applications that require fast adaptation to a problem. It is also important to consider architectures that may operate at high clock speeds in order to handle very speed-demanding situations. This paper presents an FPGA implementation of an evolvable hardware image filter based on a systolic array architecture that uses dynamic partial reconfiguration to change between different candidate solutions. The neighbor-to-neighbor connections of the array offer improved performance versus other approaches, such as circuits derived from Cartesian Genetic Programming. Time savings due to faster evaluation compensate for the slower reconfiguration time compared with virtual reconfiguration approaches; in addition, reconfiguration time has been improved by reducing the elements to reconfigure to just the LUT contents of the configurable blocks. The techniques presented in this paper lead to circuits that may operate at up to 500 MHz (in a Virtex-5), filtering 500 megapixels per second. The processing element size of the array is reduced to 2 CLBs, and over 80000 evaluations per second in a multiple-array structure in an FPGA make it possible to obtain good quality filters in around 3 seconds of evolution time.

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows. Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    Adaptive Prognostic Malfunction Based Processor for Autonomous Landing Guidance Assistance System Using FPGA

    The demand for more developed and agile urban taxi drones is increasing rapidly to sustain crowded cities and their traffic issues. The critical factor for spreading such technology could be related to the safety criteria that must be considered. One of the most critical safety aspects for such Vertical and/or Short Take-Off and Landing (V/STOL) drones is safety during the landing stage, in which most of the recent flight accidents have occurred. This paper focuses on solving this issue by proposing decentralized processing cores that could reduce the landing failure rate by relying on a Fuzzy Logic System (FLS) and additional Digital Signal Processing (DSP) elements. The proposed system also enhances the safety factor during the landing stages by adding a self-awareness feature, in case a certain sensor malfunction occurs, using the proposed Adaptive Prognostic Malfunction Unit (APMU). The proposed coarse-grained Autonomous Landing Guidance Assistance System (ALGAS4) processing architecture has been optimized using different optimization techniques. The ALGAS4 architecture has been designed completely in VHDL, and the targeted FPGA was the INTEL Cyclone V 5CGXFC9D6F27C7 chip. According to the synthesis findings of the INTEL Quartus Prime software, the maximum working frequency of the ALGAS4 system is 278.24 MHz. In addition, the proposed ALGAS4 system could maintain a maximum computing performance of approximately 74.85 GOPS while using just 166.56 mW for dynamic and I/O power dissipation. Comment: Published in: IEEE Access (Volume: 12), Page(s): 2113 - 212

    A low complexity hardware architecture for motion estimation

    This paper tackles the problem of accelerating motion estimation for video processing. A novel architecture using binary data is proposed, which attempts to reduce power consumption. The solution exploits redundant operations in the sum of absolute differences (SAD) calculation by a mechanism known as early termination. Further data redundancies are exploited by using a run length coding addressing scheme, where access to pixels which do not contribute to the final SAD value is minimised. By using these two techniques, operations and memory accesses are reduced by 93.29% and 69.17%, respectively, relative to a systolic array implementation.
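
    The early-termination idea named in this abstract can be sketched in software as follows. This is a minimal illustration and not the paper's hardware design: it works on plain integer pixel values rather than the binary data the paper describes, it omits the run length coding addressing scheme, and the function names and the operation counter are hypothetical.

```python
def sad_early_termination(block, candidate, best_so_far):
    """Return (sad, ops) for two equal-size blocks.

    The accumulation stops as soon as the partial SAD reaches best_so_far,
    because the candidate can then no longer become the best match.
    """
    sad = 0
    ops = 0  # number of absolute-difference operations actually performed
    for row_a, row_b in zip(block, candidate):
        for a, b in zip(row_a, row_b):
            sad += abs(a - b)
            ops += 1
            if sad >= best_so_far:
                return sad, ops  # early termination
    return sad, ops


def best_match(block, candidates):
    """Return (index, sad, total_ops) of the candidate with the lowest SAD."""
    best_idx, best_sad, total_ops = None, float("inf"), 0
    for idx, cand in enumerate(candidates):
        sad, ops = sad_early_termination(block, cand, best_sad)
        total_ops += ops
        if sad < best_sad:
            best_idx, best_sad = idx, sad
    return best_idx, best_sad, total_ops


block = [[10, 10], [10, 10]]
candidates = [
    [[10, 10], [10, 12]],  # close match, fully evaluated
    [[50, 50], [50, 50]],  # poor match, abandoned after one pixel
    [[10, 10], [10, 10]],  # exact match
]
result = best_match(block, candidates)
```

    In this toy run the second candidate is rejected after a single pixel comparison, so 9 absolute-difference operations are performed instead of the 12 a full evaluation would need.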