
    Near-optimal loop tiling by means of cache miss equations and genetic algorithms

    The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by means of program transformations such as loop tiling, a code transformation targeted at reducing capacity misses. This paper presents a novel systematic approach to near-optimal loop tiling based on an accurate data locality analysis (cache miss equations) and a powerful technique, based on a genetic algorithm, for searching the solution space. The results show that this approach can remove practically all capacity misses for all considered benchmarks. The reduction of replacement misses decreases the miss ratio by as much as a factor of 7 for the matrix multiply kernel. Peer Reviewed. Postprint (published version).
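
    As a minimal sketch of the transformation itself (not of the paper's cache miss equations or its genetic search), the Python below tiles the loops of a matrix multiply. The function names and the `tile` parameter are illustrative assumptions; in the paper's approach the tile sizes would come out of the CME-guided genetic search rather than being fixed by hand. Pure-Python lists will not show the cache effect, so this only illustrates the loop structure.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference i-k-j matrix multiply, no tiling."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_tiled(a: Matrix, b: Matrix, tile: int = 32) -> Matrix:
    """Same computation with all three loops tiled by `tile`.

    The outer loops walk over tile-sized blocks; the inner loops stay inside
    one block so its data can be reused while it is (ideally) still cached.
    `tile` is a hypothetical hand-picked value, not a CME/GA-chosen one.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Loops over a single tile; min() handles ragged edge tiles.
                for i in range(ii, min(ii + tile, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, p)):
                            row_c[j] += aik * row_b[j]
    return c
```

    Because each `c[i][j]` still accumulates its terms in increasing `k` order, both functions return identical results; tiling only changes which blocks of `a`, `b`, and `c` are touched close together in time.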

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs, whose programming is conventionally an RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve FPGA programmability, programmers still face the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels and, more importantly, drastically reduce the design space. We also introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to automate the entire accelerator generation. AutoAccel accepts a software program as input and performs a series of code transformations, based on the result of the analytical-model-based design space exploration, to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72× for a broad class of computation kernels.
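
    To make the idea of analytical-model-based design space exploration concrete, here is a deliberately tiny Python sketch. It is not AutoAccel's model: the two knobs (a parallel factor and whether the loop is pipelined), the latency and DSP formulas, and all constants are hypothetical. It enumerates every configuration, discards those that exceed the resource budget, and keeps the one with the lowest estimated latency.

```python
import math
from itertools import product

# Every constant below is a made-up toy value, not taken from AutoAccel.
TRIP_COUNT = 4096      # iterations of the kernel's main loop
DEPTH = 8              # pipeline depth in cycles
DSP_PER_LANE = 5       # DSP blocks consumed by one parallel lane
DSP_BUDGET = 40        # DSP blocks available on the toy device
PARALLEL_FACTORS = [1, 2, 4, 8, 16]


def latency(parallel: int, pipelined: bool) -> int:
    """Toy analytical latency model in cycles."""
    iters = math.ceil(TRIP_COUNT / parallel)
    if pipelined:
        # With an initiation interval of 1, a new iteration starts every cycle
        # once the pipeline has filled.
        return iters + DEPTH - 1
    # Without pipelining, each iteration occupies all DEPTH cycles.
    return iters * DEPTH


def dsp_usage(parallel: int) -> int:
    """Toy resource model: DSP usage grows linearly with the lane count."""
    return parallel * DSP_PER_LANE


def explore():
    """Exhaustively search the (tiny) space; return (latency, parallel, pipelined)."""
    feasible = [
        (latency(p, pipe), p, pipe)
        for p, pipe in product(PARALLEL_FACTORS, [False, True])
        if dsp_usage(p) <= DSP_BUDGET
    ]
    return min(feasible)
```

    Under these made-up numbers, 16 lanes would need 80 DSPs, so `explore()` settles on 8 pipelined lanes. Exhaustive enumeration is only feasible because the space is tiny, which is the point of a template that drastically reduces the design space.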

    Nodal domains of the equilateral triangle billiard

    We characterise the eigenfunctions of an equilateral triangle billiard in terms of its nodal domains. The number of nodal domains has a quadratic form in terms of the quantum numbers, with a non-trivial number-theoretic factor. The patterns of the eigenfunctions follow a group-theoretic connection in a way that makes them predictable as one goes from one state to another. Extensive numerical investigations bring out the distribution functions of the mode number and signed areas. The statistics of the boundary intersections are also treated analytically. Finally, the distribution functions of the nodal loop count and the nodal counting function are shown to contain information about the classical periodic orbits using the semiclassical trace formula. We believe that the results belong generically to non-separable systems, thus extending previous works, which concentrated on separable and chaotic systems. Comment: 26 pages, 13 figures
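
    Nodal-domain counts of this kind are often obtained numerically by sampling an eigenfunction on a grid and counting connected regions of constant sign. The Python sketch below does this with a breadth-first flood fill. As an assumption for checkability, it uses the separable unit-square mode sin(mπx) sin(nπy), whose nodal domain count is exactly m·n, rather than the equilateral-triangle eigenfunctions treated in the paper; the function names are hypothetical, and the grid resolution must be fine enough to resolve every domain.

```python
import math
from collections import deque
from typing import Callable


def count_nodal_domains(f: Callable[[float, float], float], resolution: int) -> int:
    """Count 4-connected regions of constant sign of f, sampled at the cell
    centres of a resolution x resolution grid on the unit square."""
    positive = [[f((i + 0.5) / resolution, (j + 0.5) / resolution) > 0
                 for j in range(resolution)] for i in range(resolution)]
    seen = [[False] * resolution for _ in range(resolution)]
    domains = 0
    for i in range(resolution):
        for j in range(resolution):
            if seen[i][j]:
                continue
            # Unvisited cell: it starts a new domain; flood-fill all of it.
            domains += 1
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:
                x, y = queue.popleft()
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < resolution and 0 <= ny < resolution
                            and not seen[nx][ny]
                            and positive[nx][ny] == positive[x][y]):
                        seen[nx][ny] = True
                        queue.append((nx, ny))
    return domains


def rectangle_mode(m: int, n: int) -> Callable[[float, float], float]:
    """Separable stand-in eigenfunction on the unit square (not the triangle)."""
    return lambda x, y: math.sin(m * math.pi * x) * math.sin(n * math.pi * y)
```

    For example, `count_nodal_domains(rectangle_mode(3, 4), 60)` recovers the expected 3·4 = 12 domains.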

    Refactoring intermediately executed code to reduce cache capacity misses

    The growing memory wall requires that more attention be given to the data cache behavior of programs. In this paper, attention is given to capacity misses, i.e., the misses that occur because the cache size is smaller than the data footprint between the use and the reuse of the same data. The data footprint is measured with the reuse distance metric, by counting the distinct memory locations accessed between use and reuse. For reuse distances larger than the cache size, the associated code needs to be refactored so that the reuse distance drops below the cache size and the capacity misses are eliminated. For a number of simple loops, the reuse distance can be calculated analytically. However, in most cases profiling is needed to pinpoint the areas where the program needs to be transformed for better data locality. This is achieved by the reuse distance visualizer, RDVIS, which shows the intermediately executed code for critical data reuses. In addition, another tool, SLO, annotates the source program with suggestions for locality optimization. Both tools have been used to analyze and refactor a number of SPEC2000 benchmark programs with very positive results.
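
    The reuse distance metric described above can be sketched in a few lines of Python over an address trace. This is a naive quadratic version for illustration, not RDVIS or SLO, and the function names are hypothetical. For a fully associative LRU cache holding `cache_lines` blocks, an access hits exactly when its reuse distance is smaller than `cache_lines`, which is why reuse distances at or above the cache size signal capacity misses.

```python
from typing import Hashable, List, Optional, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """For each access, the number of distinct addresses touched since the
    previous access to the same address, or None on its first use."""
    last_seen = {}
    distances: List[Optional[int]] = []
    for t, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses strictly between the previous use and now.
            distances.append(len(set(trace[last_seen[addr] + 1:t])))
        else:
            distances.append(None)  # first use: cold miss, no reuse distance
        last_seen[addr] = t
    return distances


def lru_misses(trace: Sequence[Hashable], cache_lines: int) -> int:
    """Misses in a fully associative LRU cache of `cache_lines` blocks:
    cold misses plus accesses whose reuse distance is >= the cache size."""
    return sum(1 for d in reuse_distances(trace)
               if d is None or d >= cache_lines)
```

    For the trace `a b c a b b d a`, the distances are `[None, None, None, 2, 2, 0, None, 2]`, giving 7 misses with a 2-line cache and 4 (cold misses only) with a 3-line cache. Profilers typically avoid the quadratic rescan with a tree or similar structure; this version favors clarity.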

    Domino tilings and the six-vertex model at its free fermion point

    At the free-fermion point, the six-vertex model with domain wall boundary conditions (DWBC) can be related to the Aztec diamond, a domino tiling problem. We study the mapping at the level of complete statistics for general domains and boundary conditions. This is obtained by associating to both models a set of non-intersecting lines in the Lindstroem-Gessel-Viennot (LGV) scheme. One consequence for DWBC is that the boundaries of the ordered phases are described by the Airy process in the thermodynamic limit. Comment: 14 pages, 8 figures
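
    For reference, the standard unweighted form of the LGV lemma can be stated as follows (the weighted version needed for the six-vertex mapping is not reproduced here). Let $G$ be a finite acyclic directed graph with start vertices $a_1,\dots,a_n$ and end vertices $b_1,\dots,b_n$, let $e(a,b)$ be the number of directed paths from $a$ to $b$, and let $N_\sigma$ be the number of families of pairwise vertex-disjoint paths $(P_1,\dots,P_n)$ with $P_i$ running from $a_i$ to $b_{\sigma(i)}$. Then

```latex
\det\bigl[e(a_i, b_j)\bigr]_{i,j=1}^{n}
  = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, N_\sigma .
```

    When only the identity permutation admits non-intersecting families, as in lattice-path encodings of tilings, the determinant simply counts the non-intersecting path configurations.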

    Open boundary Quantum Knizhnik-Zamolodchikov equation and the weighted enumeration of Plane Partitions with symmetries

    We propose new conjectures relating sum rules for the polynomial solution of the qKZ equation with open (reflecting) boundaries as a function of the quantum parameter $q$ and the $\tau$-enumeration of Plane Partitions with specific symmetries, with $\tau=-(q+q^{-1})$. We also find a conjectural relation à la Razumov-Stroganov between the $\tau\to 0$ limit of the qKZ solution and refined numbers of Totally Symmetric Self Complementary Plane Partitions. Comment: 27 pages, uses lanlmac, epsf and hyperbasics, minor revision