    Optimizing Program Efficiency with Loop Unroll Factor Prediction

    Loop unrolling is a well-established code transformation that can improve a program's run-time performance. Its key benefit is that the unrolled loop often executes fewer instructions than the original, because loop-control overhead such as counter updates and branches is incurred less often. However, choosing the optimal unroll factor remains a critical concern. This paper presents a novel method for predicting the optimal unroll factor for a given program. Specifically, a dataset is constructed that contains the execution times of several programs under varying loop unroll factors. The programs are drawn from different benchmarks, such as PolyBench and Shootout, as well as other sources. Similarity measures between an unseen program and the existing programs are computed, and the three most similar programs are identified. The unroll factor that yielded the greatest reduction in execution time for these most similar programs is selected as the candidate for the unseen program. Experimental results show that the proposed method improves the performance of the training programs by approximately 13%, 18%, 19%, and 21% for unroll factors of 2, 4, 6, and 8, respectively. For the unseen programs, the speedup is approximately 37.7% across five programs.
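    A minimal sketch of the nearest-neighbour selection step described above, under stated assumptions: the abstract does not specify the program features, the similarity measure, or how reductions are combined across the three neighbours, so this example assumes hypothetical static feature vectors, cosine similarity, and averaging of the relative execution-time reduction for each candidate factor. The program names and timings are invented for illustration and are not the paper's data.

```python
import math

# HYPOTHETICAL training set (illustrative numbers, not the paper's data):
# program name -> (static feature vector, {unroll factor: execution time in seconds}).
# Factor 1 means "not unrolled" and serves as the baseline.
TRAINING = {
    "prog_a": ([4.0, 1.0, 10.0], {1: 10.0, 2: 9.0, 4: 8.0, 6: 8.5, 8: 9.5}),
    "prog_b": ([5.0, 1.0, 9.0], {1: 20.0, 2: 18.0, 4: 15.0, 6: 16.0, 8: 17.0}),
    "prog_c": ([4.0, 2.0, 11.0], {1: 5.0, 2: 4.5, 4: 4.0, 6: 4.2, 8: 4.4}),
    "prog_d": ([1.0, 8.0, 1.0], {1: 7.0, 2: 6.0, 4: 6.5, 6: 6.8, 8: 7.0}),
}
CANDIDATE_FACTORS = (2, 4, 6, 8)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relative_reduction(times, factor):
    """Fractional execution-time reduction of `factor` versus the baseline (factor 1)."""
    return (times[1] - times[factor]) / times[1]


def predict_unroll_factor(features, training, k=3):
    """Return the factor with the largest mean time reduction over the k most similar programs."""
    # Rank training programs by similarity to the unseen program and keep the top k.
    neighbours = sorted(
        training.values(),
        key=lambda entry: cosine_similarity(features, entry[0]),
        reverse=True,
    )[:k]
    # Average each candidate factor's reduction across the neighbours and pick the best.
    return max(
        CANDIDATE_FACTORS,
        key=lambda f: sum(relative_reduction(times, f) for _, times in neighbours)
        / len(neighbours),
    )


if __name__ == "__main__":
    unseen = [4.0, 1.0, 11.0]  # hypothetical features of an unseen program
    print(predict_unroll_factor(unseen, TRAINING))  # → 4
```

    With this illustrative data, the three nearest programs (prog_a, prog_b, prog_c) all gain most from factor 4, so the prediction is 4; prog_d, whose best factor is 2, is excluded because its features are dissimilar. Averaging relative rather than absolute reductions keeps programs with long run times from dominating the vote.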

    Autotuning the Intel HLS Compiler using the Opentuner Framework

    High-level synthesis (HLS) tools can be used to improve the design flow and shorten verification times for field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) design. The Intel HLS Compiler is a high-level synthesis tool that takes untimed C/C++ as input and generates production-quality register-transfer level (RTL) code optimized for Intel FPGAs. The translation, however, requires multiple iterations and manual optimizations to obtain synthesized results comparable to those of a solution written in a hardware description language (HDL). Synthesis results can vary greatly with coding style and optimization techniques, and fully optimizing the translation typically requires in-depth knowledge of FPGAs, which limits the tool's audience. The extra abstraction of C/C++ source code can also make it difficult to meet more specific design requirements, such as targets for resource usage or performance. To improve the quality of results generated by the Intel HLS Compiler without a manual, iterative process that demands in-depth FPGA knowledge, this research proposes automating some of the optimization techniques that improve the synthesized design through an autotuning process. The proposed approach uses the pycparser library to parse C source files and the OpenTuner Framework to autotune the synthesis, producing results that better meet the designer's requirements through lower FPGA resource usage or increased design performance. Such functionality is not currently available in Intel's commercial tools. The approach was tested with the CHStone benchmark suite of C programs as well as a standard digital signal processing (DSP) finite impulse response (FIR) filter.
    The results show that the commercial HLS tool can be autotuned automatically by injecting placeholders with a C source-parsing tool and using the OpenTuner Framework to tune the results. For small designs that contain structures amenable to autotuning, the results indicate reductions in resource usage and/or increases in performance of up to 40% compared with the default Intel HLS Compiler results. The method developed in this research also allows additional design targets to be specified through the autotuner for consideration in the synthesized design, which can yield results better matched to a design's requirements.
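    A minimal, standard-library-only sketch of the placeholder-injection idea described above, under loudly stated assumptions: it is not the thesis's implementation and does not use the pycparser or OpenTuner APIs. A regular expression stands in for pycparser's C parser, exhaustive search stands in for OpenTuner's search techniques, and `toy_synthesis_model` is a hypothetical cost model in place of an actual Intel HLS Compiler run. The `#pragma unroll` placeholder, the FIR kernel, the weights, and every function name are illustrative.

```python
import itertools
import re

# HYPOTHETICAL FIR-style kernel used only for illustration.
C_SOURCE = """int fir(const int x[16], const int c[16]) {
    int acc = 0;
    for (int i = 0; i < 16; i++) {
        acc += x[i] * c[i];
    }
    return acc;
}
"""

UNROLL_CHOICES = (1, 2, 4, 8, 16)  # assumed search space for each loop


def inject_placeholders(source):
    """Insert a '#pragma unroll __UNROLL_n__' placeholder above every for-loop.
    A regex is a simplified stand-in for walking a pycparser AST."""
    count, out = 0, []
    for line in source.splitlines():
        if re.match(r"\s*for\s*\(", line):
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}#pragma unroll __UNROLL_{count}__")
            count += 1
        out.append(line)
    return "\n".join(out) + "\n", count


def instantiate(template, config):
    """Replace each placeholder with a concrete unroll factor from `config`."""
    for i, factor in enumerate(config):
        template = template.replace(f"__UNROLL_{i}__", str(factor))
    return template


def toy_synthesis_model(config, trip_count=16):
    """HYPOTHETICAL stand-in for compiling and reading synthesis reports.
    More unrolling means fewer cycles but more hardware."""
    area = sum(100 + 40 * u for u in config)
    latency = sum(2 * (trip_count // u) + 2 for u in config)
    return area, latency


def objective(area, latency, area_weight=1.0, latency_weight=20.0):
    """Weighted cost to minimise; the weights encode the designer's target."""
    return area_weight * area + latency_weight * latency


def autotune(n_loops, **weights):
    """Exhaustive search over unroll choices (a stand-in for OpenTuner's search)."""
    best = None
    for config in itertools.product(UNROLL_CHOICES, repeat=n_loops):
        cost = objective(*toy_synthesis_model(config), **weights)
        if best is None or cost < best[0]:
            best = (cost, config)
    return best[1]


if __name__ == "__main__":
    template, n_loops = inject_placeholders(C_SOURCE)
    best = autotune(n_loops)
    print(best)  # → (4,)
    print(instantiate(template, best))
```

    Changing the weights plays the role of an additional design target: with `area_weight=1.0, latency_weight=1.0` the toy model favours no unrolling, while `area_weight=0.0` favours full unrolling. In a real flow, each candidate would be compiled and its reported resource usage and latency fed back to the tuner instead of the toy model.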