516 research outputs found

    Optimizing the FPGA memory design for a Sobel edge detector

    Get PDF
    This paper explores different memory systems by investigating the trade-offs involved with choosing one memory system over another on an FPGA. As an example, we use a Sobel edge detector to look at the trade-offs for different memory components. We demonstrate how each type of memory affects I/O performance and area. By exploiting these trade-offs in performance and area a designer should be able to find an optimum on-chip memory system for a given application

    Optimizing the FPGA memory design for a Sobel edge detector

    Get PDF
    This research explored different memory systems on FPGA chips in order to show the various trade-offs involved with choosing one memory system over another. We explored the different memory components that are found on FPGA chips using the example of a Sobel edge detector. We demonstrated how the different FPGA chip’s memories affected I/O performance and area. By exploiting the trade-offs between these a designer should be able to find an optimal on-chip memory system for a given application. Given further study, we believe we can develop application-specific memory templates that can be used with a hardware compiler to generate optimal on-chip memory system

    Deriving stencil hardware accelerators from a single higher-order function

    Get PDF
    Stencil computations are array based algorithms that apply a computation to all array elements in a fixed regular pattern and can be found in many scientific and engineering applications. Parallelization of these applications becomes more and more important in order to keep up with the demand for computing power. FPGAs offer a lot of computing power but are considered hard to program. In this paper, a design methodology based on transformations of higher-order functions is introduced to facilitate this parallelization process. Using this methodology, efficient FPGA hardware is derived achieving good performance. Two architectures for heat flow computations are synthesized for an FPGA and evaluated. To show the general applicability of the design methodology, several applications have been implemented

    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Full text link
    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increases data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers, to tap into the performance potential offered by specialized hardware architectures using HLS

    Empowering parallel computing with field programmable gate arrays

    Get PDF
    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements

    Performance and resource modeling for FPGAs using high-level synthesis tools

    Get PDF
    High-performance computing with FPGAs is gaining momentum with the advent of sophisticated High-Level Synthesis (HLS) tools. The performance of a design is impacted by the input-output bandwidth, the code optimizations and the resource consumption, making the performance estimation a challenge. This paper proposes a performance model which extends the roofline model to take into account the resource consumption and the parameters used in the HLS tools. A strategy is developed which maximizes the performance and the resource utilization within the area of the FPGA. The model is used to optimize the design exploration of a class of window-based image processing application

    High performance computing with FPGAs

    Get PDF
    Field-programmable gate arrays represent an army of logical units which can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges to find the right processing paradigm which takes into account of the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this paper first use of FPGAs as a multiprocessor on a chip or its use as a highly functional coprocessor are compared, and the programming tools for hardware/software codesign are discussed. Next a number of techniques are presented to maximize the parallelism and optimize the data locality in nested loops. This includes unimodular transformations, data locality improving loop transformations and use of smart buffers. Finally, the use of these techniques on a number of examples is demonstrated. The results in the paper and in the literature show that, with the proper programming tool set, FPGAs can speedup computation kernels significantly with respect to traditional processors
    corecore