3 research outputs found

    Autotuning the Intel HLS Compiler using the Opentuner Framework

    Get PDF
    High level synthesis (HLS) tools can be used to improve design flow and decrease verification times for field programmable gate array (FPGA) and application specific integrated circuit (ASIC) design. The Intel HLS Compiler is a high level synthesis tool that takes in untimed C/C++ as input and generates production-quality register transfer level (RTL) code that is optimized for Intel FPGAs. The translation does, however, require multiple iterations and manual optimizations to get comparable synthesized results to that of a solution written in a hardware descriptive language. The synthesis results can vary greatly based upon coding style and optimization techniques, and typically require an in-depth knowledge of FPGAs to fully optimize the translation which limits the audience of the tool. The extra abstraction that the C/C++ source code presents can also make it difficult to meet more specific design requirements; this includes designs to meet specific resource usage or performance based metrics. To improve the quality of results generated by the Intel HLS Compiler without a manual iterative process that requires an in-depth knowledge of FPGAs, this research proposes a method of automating some of the optimization techniques that improve the synthesized design through an autotuning process. The proposed approach utilizes the PyCParser library to parse C source files and the OpenTuner Framework to autotune the synthesis to provide a method that generates results that better meet the needs of the designer's requirements through lower FPGA resource usage or increased design performance. Such functionality is not currently available in Intel's commercial tools. The proposed approach was tested with the CHStone Benchmarking Suite of C programs as well as a standard digital signal processing finite impulse response filter. The results show that the commercial HLS tool can be automatically autotuned through placeholder injection using a source parsing tool for C code and using the OpenTuner Framework to autotune the results. For designs that are small in nature and include conducive structures to be autotuned, the results indicate resource usage reductions and/or performance increases of up to 40% as compared to the default Intel HLS Compiler results. The method developed in this research also allows additional design targets to be specified through the autotuner for consideration in the synthesized design which can yield results that are better matched to a design's requirements

    BaCO: A Fast and Portable Bayesian Compiler Optimization Framework

    Full text link
    We introduce the Bayesian Compiler Optimization framework (BaCO), a general purpose autotuner for modern compilers targeting CPUs, GPUs, and FPGAs. BaCO provides the flexibility needed to handle the requirements of modern autotuning tasks. Particularly, it deals with permutation, ordered, and continuous parameter types along with both known and unknown parameter constraints. To reason about these parameter types and efficiently deliver high-quality code, BaCO uses Bayesian optimiza tion algorithms specialized towards the autotuning domain. We demonstrate BaCO's effectiveness on three modern compiler systems: TACO, RISE & ELEVATE, and HPVM2FPGA for CPUs, GPUs, and FPGAs respectively. For these domains, BaCO outperforms current state-of-the-art autotuners by delivering on average 1.36x-1.56x faster code with a tiny search budget, and BaCO is able to reach expert-level performance 2.9x-3.9x faster

    Using Reduced Graphs for Efficient HLS Scheduling

    Get PDF
    High-Level Synthesis (HLS) is the process of inferring a digital circuit from a high-level algorithmic description provided as a software implementation, usually in C/C++. HLS tools will parse the input code and then perform three main steps: allocation, scheduling, and binding. This results in a hardware architecture which can then be represented as a Register-Transfer Level (RTL) model using a Hardware Description Language (HDL), such as VHDL or Verilog. Allocation determines the amount of resources needed, scheduling finds the order in which operations should occur, and binding maps operations onto the allocated hardware resources. Two main challenges of scheduling are in its computational complexity and memory requirements. Finding an optimal schedule is an NP-hard problem, so many tools use elaborate heuristics to find a solution which satisfies prescribed implementation constraints. These heuristics require the Control/Data Flow Graph (CDFG), a representation of all operations and their dependencies, which must be stored in its entirety and therefore use large amounts of memory. This thesis presents a new scheduling approach for use in the HLS tool chain. The new technique schedules operations using an algorithm which operates on a reduced representation of the graph, which does not need to retain individual dependency information in order to generate a schedule. By using the simplified graph, the complexity of scheduling is significantly reduced, resulting in improved memory usage and lower computational effort. This new scheduler is implemented and compared to the existing scheduler in the open source version of the LegUp HLS tool. The results demonstrate that an average of 16 times speedup on the time required to determine the schedule can be achieved, with just a fraction of the memory usage (1/5 on average). All of this is achieved with 0 to 6% of added cost on the final hardware execution time
    corecore