2 research outputs found

    Tile size selection for optimized memory reuse in high-level synthesis

    No full text
    High-level synthesis (HLS) is capable of generating control and computation circuits for FPGA accelerators, but it still requires significant human effort to tackle memory and communication bottlenecks. One important approach for improving data locality is to apply loop tiling to memory-intensive loops. Loop tiling is a well-known compiler technique that partitions the iteration space of a loop nest into chunks (or 'tiles') whose associated data can fit into size-constrained fast memory. The tile size, which can significantly affect the memory requirement, is usually determined by partial enumeration. In this paper, we propose an analytical methodology for selecting a tile size that optimizes memory reuse in HLS. A parametric polyhedral model is introduced to capture memory usage analytically for arbitrary tile sizes. To determine the tile size for data reuse in constrained on-chip memory, an algorithm is then developed to optimize over this model, using non-linear solvers to minimize communication overhead. Experimental results on three representative loops show that, compared to random enumeration with the same time budget, the proposed method produces tile sizes that yield a 75% average reduction in communication overhead. A case study with real hardware prototyping also demonstrates the benefits of the proposed tile size selection method.
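
    As an illustration of the transformation whose parameters the paper selects, the short C sketch below tiles a 2-D loop nest with parametric tile sizes. It is a generic example under assumed values (the array dimensions N, M and tile sizes TI, TJ are not taken from the paper), not the paper's own code.

        /* A minimal illustration (not the paper's code) of loop tiling with
         * parametric tile sizes.  N, M, TI, TJ are assumed values; a tile-size
         * selection method would choose TI and TJ so that each tile's working
         * set fits in on-chip memory while minimizing off-chip communication. */
        #include <stddef.h>

        #define N  1024
        #define M  1024
        #define TI 64        /* candidate tile size along i (assumption) */
        #define TJ 64        /* candidate tile size along j (assumption) */

        static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

        /* Tiled 2-D accumulation: outer loops enumerate tiles, inner loops stay
         * within one TI x TJ tile so its data can be buffered on chip. */
        static void tiled_accumulate(float dst[N][M], float src[N][M])
        {
            for (size_t ii = 0; ii < N; ii += TI)
                for (size_t jj = 0; jj < M; jj += TJ)
                    for (size_t i = ii; i < min_sz(ii + TI, N); ++i)
                        for (size_t j = jj; j < min_sz(jj + TJ, M); ++j)
                            dst[i][j] += src[i][j];
        }

        static float a_buf[N][M], b_buf[N][M];

        int main(void)
        {
            tiled_accumulate(a_buf, b_buf);
            return 0;
        }

    In this sketch the data a tile touches is roughly proportional to TI*TJ per array, which is the kind of quantity the paper's parametric polyhedral model expresses analytically as a function of the tile sizes.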

    Extracting Data-Level Parallelism in High-Level Synthesis for Reconfigurable Architectures

    High-Level Synthesis (HLS) tools are a set of algorithms that allow programmers to obtain implementable Hardware Description Language (HDL) code from specifications written in high-level, sequential languages such as C, C++, or Java. HLS allows programmers to code in their preferred language and still obtain the benefits of hardware acceleration without needing to be intimately familiar with the accelerator's hardware platform. In this work we summarize and expand upon several of our approaches to improve the automatic memory banking capabilities of HLS tools targeting reconfigurable architectures, namely Field-Programmable Gate Arrays (FPGAs). We explored several approaches to automatically find the optimal partition factor and a usable banking scheme for stencil kernels, including a tessellation-based approach that uses multiple families of hyperplanes to perform the partitioning and finds a better banking factor than current state-of-the-art methods, and a graph-theory methodology that allowed us to mathematically prove the optimality of our banking solutions. For non-stencil kernels, we relaxed some of the conditions in our graph-based model to propose a best-effort solution that arbitrarily reduces memory access conflicts (simultaneous accesses to the same memory bank). We also proposed a non-linear transformation based on prime factorization that converts a small subset of non-stencil kernels into stencil memory accesses, allowing all of our previous work on memory partitioning to be applied to them. Our approaches obtained better results than commercial tools and state-of-the-art algorithms in terms of reduced resource utilization and increased operating frequency. We also obtained better partition factors for some stencil kernels and usable banking schemes for non-stencil kernels, with better performance than any applicable existing algorithm.
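
    As an illustration of the kind of banking function this work automates, the C sketch below maps element (i, j) of a 2-D array to a bank via a linear hyperplane, (A*i + B*j) mod NB, and checks that the five accesses of a 5-point stencil land in distinct banks. The coefficients A, B and the bank count NB are assumed example values, and a complete scheme would also define an intra-bank offset function; this is a generic hyperplane-banking illustration, not the thesis's exact algorithm.

        /* Minimal sketch (assumed values, not the thesis's exact algorithm):
         * a linear hyperplane banking function for a 2-D array accessed by a
         * 5-point stencil.  Element (i, j) is placed in bank (A*i + B*j) mod NB;
         * with the coefficients below, the five stencil points always fall in
         * five distinct banks, so they can be fetched in the same cycle. */
        #include <stdio.h>

        #define NB 5   /* number of memory banks (partition factor), assumed */
        #define A  1   /* hyperplane coefficient for i, assumed */
        #define B  2   /* hyperplane coefficient for j, assumed */

        static int bank_of(int i, int j) { return (A * i + B * j) % NB; }

        int main(void)
        {
            /* The five accesses of a 5-point stencil centred at (8, 8). */
            int pts[5][2] = { {8, 8}, {7, 8}, {9, 8}, {8, 7}, {8, 9} };
            for (int k = 0; k < 5; ++k)
                printf("(%d, %d) -> bank %d\n",
                       pts[k][0], pts[k][1],
                       bank_of(pts[k][0], pts[k][1]));
            return 0;
        }

    Choosing the coefficients and bank count so that every access pattern of interest is conflict-free, with as few banks as possible, is the optimization problem the banking approaches described above address.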