1,318 research outputs found

    Empowering parallel computing with field programmable gate arrays

    Get PDF
    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements

    Autotuning the Intel HLS Compiler using the Opentuner Framework

    Get PDF
    High level synthesis (HLS) tools can be used to improve design flow and decrease verification times for field programmable gate array (FPGA) and application specific integrated circuit (ASIC) design. The Intel HLS Compiler is a high level synthesis tool that takes in untimed C/C++ as input and generates production-quality register transfer level (RTL) code that is optimized for Intel FPGAs. The translation does, however, require multiple iterations and manual optimizations to get comparable synthesized results to that of a solution written in a hardware descriptive language. The synthesis results can vary greatly based upon coding style and optimization techniques, and typically require an in-depth knowledge of FPGAs to fully optimize the translation which limits the audience of the tool. The extra abstraction that the C/C++ source code presents can also make it difficult to meet more specific design requirements; this includes designs to meet specific resource usage or performance based metrics. To improve the quality of results generated by the Intel HLS Compiler without a manual iterative process that requires an in-depth knowledge of FPGAs, this research proposes a method of automating some of the optimization techniques that improve the synthesized design through an autotuning process. The proposed approach utilizes the PyCParser library to parse C source files and the OpenTuner Framework to autotune the synthesis to provide a method that generates results that better meet the needs of the designer's requirements through lower FPGA resource usage or increased design performance. Such functionality is not currently available in Intel's commercial tools. The proposed approach was tested with the CHStone Benchmarking Suite of C programs as well as a standard digital signal processing finite impulse response filter. The results show that the commercial HLS tool can be automatically autotuned through placeholder injection using a source parsing tool for C code and using the OpenTuner Framework to autotune the results. For designs that are small in nature and include conducive structures to be autotuned, the results indicate resource usage reductions and/or performance increases of up to 40% as compared to the default Intel HLS Compiler results. The method developed in this research also allows additional design targets to be specified through the autotuner for consideration in the synthesized design which can yield results that are better matched to a design's requirements

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Full text link
    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels
    corecore