356 research outputs found

    Compiling dataflow graphs into hardware

    Get PDF
    Department Head: L. Darrell Whitley.2005 Fall.Includes bibliographical references (pages 121-126).Conventional computers are programmed by supplying a sequence of instructions that perform the desired task. A reconfigurable processor is "programmed" by specifying the interconnections between hardware components, thereby creating a "hardwired" system to do the particular task. For some applications such as image processing, reconfigurable processors can produce dramatic execution speedups. However, programming a reconfigurable processor is essentially a hardware design discipline, making programming difficult for application programmers who are only familiar with software design techniques. To bridge this gap, a programming language, called SA-C (Single Assignment C, pronounced "sassy"), has been designed for programming reconfigurable processors. The process involves two main steps - first, the SA-C compiler analyzes the input source code and produces a hardware-independent intermediate representation of the program, called a dataflow graph (DFG). Secondly, this DFG is combined with hardware-specific information to create the final configuration. This dissertation describes the design and implementation of a system that performs the DFG to hardware translation. The DFG is broken up into three sections: the data generators, the inner loop body, and the data collectors. The second of these, the inner loop body, is used to create a computational structure that is unique for each program. The other two sections are implemented by using prebuilt modules, parameterized for the particular problem. Finally, a "glue module" is created to connect the various pieces into a complete interconnection specification. The dissertation also explores optimizations that can be applied while processing the DFG, to improve performance. A technique for pipelining the inner loop body is described that uses an estimation tool for the propagation delay of the nodes within the dataflow graph. A scheme is also described that identifies subgraphs with the dataflow graph that can be replaced with lookup tables. The lookup tables provide a faster implementation than random logic in some instances

    Rewriting History: Repurposing Domain-Specific CGRAs

    Full text link
    Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices promising both the flexibility of FPGAs and the performance of ASICs. However, with restricted domains comes a danger: designing chips that cannot accelerate enough current and future software to justify the hardware cost. We introduce FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to operations they do not natively support. FlexC uses dataflow rewriting, replacing unsupported regions of code with equivalent operations that are supported by the CGRA. We use equality saturation, a technique enabling efficient exploration of a large space of rewrite rules, to effectively search through the program-space for supported programs. We applied FlexC to over 2,000 loop kernels, compiling to four different research CGRAs and 300 generated CGRAs and demonstrate a 2.2Ă—\times increase in the number of loop kernels accelerated leading to 3Ă—\times speedup compared to an Arm A5 CPU on kernels that would otherwise be unsupported by the accelerator

    FPGA acceleration of DNA sequence alignment: design analysis and optimization

    Get PDF
    Existing FPGA accelerators for short read mapping often fail to utilize the complete biological information in sequencing data for simple hardware design, leading to missed or incorrect alignment. In this work, we propose a runtime reconfigurable alignment pipeline that considers all information in sequencing data for the biologically accurate acceleration of short read mapping. We focus our efforts on accelerating two string matching techniques: FM-index and the Smith-Waterman algorithm with the affine-gap model which are commonly used in short read mapping. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows. 1. We accelerate the exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving the access with multiple short reads. The FM-index hardware also considers complete information in the read data to maximize accuracy. 2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage while a Smith-Waterman implementation with the affine-gap model is developed on FPGA for the extension stage. This model can improve the efficiency of indel alignment with comparable accuracy versus state-of-the-art software. 3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs. Then we apply this approach to optimize one of the proposed FPGA aligners for better alignment performance.Open Acces

    Overview of the tool-flow for the Montium Processing Tile

    Get PDF
    This paper presents an overview of a tool chain to support a transformational design methodology. The tool can be used to compile code written in a high level source language, like C, to a coarse grain reconfigurable architecture. The source code is first translated into a Control Data Flow Graph (CDFG). A Control Dataflow Graph contains not only the dataflow operations (e.g. arithmetic or logical operations on data) but also control flow operations (e.g. operators for loop and if then else constructs). The CDFG is minimized using a set of behavior preserving transformations such as dependency analysis, common sub-expression elimination, etc. After applying graph clustering, scheduling and allocation transformations on this minimized graph, it can be mapped onto the target architecture

    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    Get PDF
    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201
    • …
    corecore