37 research outputs found

    Pipeline synthesis and optimization for reconfigurable custom computing machines

    This paper presents a pipeline synthesis and optimization technique for high-level language programming of reconfigurable Custom Computing Machines. The circuit synthesis generates hardware accelerators from a sequential program; these accelerators exploit the parallelism of the reconfigurable hardware. Program loops are transformed into structural hardware specifications. The optimization algorithm uses integer linear programming to balance and pipeline the circuit's registers. This global optimization determines the minimal number of flip-flops necessary for optimal pipeline throughput. It also considers the irregular flip-flop distribution on FPGAs. Standard interface circuitry and a runtime system provide the connection between the accelerator unit and its host computer. An integrated compiler invokes the synthesis and produces a program which downloads, calls and controls its hardware accelerators automatically.

    Integer Programming for Partitioning in Software Oriented Codesign

    This paper presents a new partitioning method for software oriented hardware/software codesign. It is applied to the use of field-programmable accelerator boards. In the underlying model the dedicated hardware has no direct access to the host memory, and communication is slow. Therefore detailed data-flow information is necessary to minimize the communication overhead between host and accelerator board. The partitioning problem is formulated as an integer (linear) program which simultaneously determines which code regions should be implemented in dedicated hardware and which data has to be communicated, so that well-known optimization algorithms can be applied.
    1 Introduction. The goal of software oriented hardware/software codesign is to improve system performance by moving time-critical parts of a program to hardware. This approach is attracting increasing interest since it uses simple specifications in common programming languages and allows the use of automatically designed ded..

    High-Level Synthesis Oriented Restructuring of Functions with While Loops

    The use of high-level synthesis (HLS) tools for FPGAs has increased significantly in recent years as the tools have matured and now allow software programmers to take advantage of reconfigurable hardware technology. Most HLS tools employ methods to optimize for loops, e.g. by unrolling or pipelining them. But there is hardly any work on the optimization of while loops. This comes as no surprise, since most while loops have loop-carried dependences involving the loop condition, which result in large recurrence cycles in the dataflow graphs. Therefore typical while loops cannot be parallelized or pipelined. We propose a novel transformation that makes it possible to optimize while loops nested within a for loop. By interchanging the two loops, it is possible to pipeline (and thereby parallelize) the inner loop, resulting in a reduced execution time. We present two case studies on different hardware platforms and show the speedup factors, compared to a host processor and to an unoptimized hardware implementation, achieved by our while loop optimization method.
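The loop interchange described in this abstract can be illustrated in software. The following is a minimal sketch, not the authors' transformation or tool: the function names `steps_original` and `steps_interchanged` and the Collatz-style loop body are hypothetical, chosen only because the while loop's trip count depends on the data. In the interchanged version the inner for loop has no dependence between iterations, which is the property an HLS tool would need in order to pipeline it.

```python
def collatz_step(x):
    # Hypothetical loop body: one step of the Collatz iteration.
    return x // 2 if x % 2 == 0 else 3 * x + 1


def steps_original(values):
    # Original form: a while loop nested inside a for loop.
    # The while condition depends on x, which the loop body updates,
    # so the inner loop carries a dependence through its own condition.
    result = []
    for v in values:
        x, steps = v, 0
        while x != 1:
            x = collatz_step(x)
            steps += 1
        result.append(steps)
    return result


def steps_interchanged(values):
    # Interchanged form: the while loop is now the outer loop and runs
    # until no element is still active. The inner for loop touches each
    # element independently, so its iterations could be pipelined.
    x = list(values)
    steps = [0] * len(x)
    any_active = True
    while any_active:
        any_active = False
        for i in range(len(x)):
            if x[i] != 1:  # element i has not finished yet
                x[i] = collatz_step(x[i])
                steps[i] += 1
                any_active = True
    return steps
```

Both functions compute the same result for positive integer inputs; the interchanged form trades some redundant condition checks on finished elements for an inner loop without loop-carried dependences.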
