
    Towards an Achievable Performance for the Loop Nests

    Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including by exposing parallelism. Finding and evaluating an optimal, semantics-preserving sequence of transformations is a complex problem: the search is guided by heuristics and/or analytical models, and there is no way of knowing how close the result gets to optimal performance or whether any headroom for improvement remains. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. Second, it presents an approach that characterizes loop nests by their hardware performance counter values, together with a machine learning model that predicts which compiler will generate the fastest code for a given loop nest. The prediction is made both for auto-vectorized serial compilation and for auto-parallelization. Based on these predictions, the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for serial code and from 1.30x to 1.71x for auto-parallelized code.
    Comment: Accepted at the 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC 2018).
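
    The headroom analysis rests on the kind of semantics-preserving loop transformation sketched below. The example is hypothetical (not taken from the paper), with made-up function names; it shows a loop interchange that turns a stride-N walk over a matrix into a unit-stride one, the sort of rewrite these compilers apply to improve locality and enable auto-vectorization.

        /* Hypothetical example, not from the paper: column sums of an
         * N x N matrix. Interchanging the i/j loops preserves the
         * per-element summation order while making the inner loop
         * unit-stride, hence cache-friendly and auto-vectorizable.
         * Both functions assume out[] is zero-initialized by the caller. */
        #include <stddef.h>
        #define N 1024

        /* Before: inner loop walks b with stride N (poor locality). */
        void col_sums_naive(const double b[N][N], double out[N]) {
            for (size_t j = 0; j < N; j++)
                for (size_t i = 0; i < N; i++)
                    out[j] += b[i][j];   /* stride-N access to b */
        }

        /* After interchange: both accesses are stride-1. */
        void col_sums_interchanged(const double b[N][N], double out[N]) {
            for (size_t i = 0; i < N; i++)
                for (size_t j = 0; j < N; j++)
                    out[j] += b[i][j];   /* stride-1 access to b and out */
        }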

    Bouclettes: A Fortran loop parallelizer


    Extended Lattice-Based Memory Allocation

    This work extends lattice-based memory allocation, an earlier work on memory reuse through array contraction. Such an optimization is used for optimizing high-level programming languages, where storage mapping may be abstracted away from programmers, and to complement code transformations that introduce intermediate buffers. The main motivation for this extension is to better handle the more general forms of specifications seen today, e.g., with loop tiling, pipelining, and other forms of parallelism available in explicitly parallel languages. Specifically, we handle the case where the conflicting constraints (those that describe the array indices that cannot share the same memory location) are specified as a (non-convex) union of polyhedra. The choice of directions (or basis) of array reuse becomes important when dealing with non-convex specifications. We extend the two dual approaches of the original work to handle unions of polyhedra and to select a suitable basis. Our final approach relies on a combination of the two, also revealing their links with, on one hand, the construction of multi-dimensional schedules for parallelism and tiling (but with a fundamental difference that we identify) and, on the other hand, the construction of universal occupancy vectors (UOV), which were previously used only in a specific context, for schedule-independent mapping.
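
    A minimal sketch of the core idea being extended, array contraction through a modular mapping (my own C illustration, not code from the paper): when the indices that conflict, i.e. are simultaneously live, lie within a bounded distance b of each other, the mapping i -> i mod (b+1) is a valid allocation, and a full array shrinks to a small circular buffer.

        /* Illustrative sketch, not from the paper. a[i] depends on
         * a[i-1] and a[i-2], so indices at distance <= 2 conflict and
         * i mod 3 is a valid modular storage mapping: the N-element
         * array contracts to 3 cells. */
        #include <stdio.h>
        #define N 20

        int main(void) {
            int a[3];                 /* contracted storage for N values */
            a[0 % 3] = 0;
            a[1 % 3] = 1;
            for (int i = 2; i < N; i++)
                a[i % 3] = a[(i - 1) % 3] + a[(i - 2) % 3];
            printf("last value: %d\n", a[(N - 1) % 3]);
            return 0;
        }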

    Tiling and memory reuse for sequences of nested loops

    Our aim is to minimize the electrical energy used during the execution of signal processing applications structured as a sequence of loop nests. This energy is mostly spent transferring data among the levels of the memory hierarchy. To minimize these transfers, we transform such programs by simultaneously applying loop permutation, tiling, loop fusion with shifting, and memory reuse. Each nest consumes a stencil of data produced by the previous nest, and references to the same array are equal up to a shift. All the transformations described in this paper have been implemented in PIPS, our optimizing compiler, and the resulting reductions in cache misses have been measured.
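
    The sketch below (my own C illustration, not the PIPS source; names are made up) shows the fusion-with-shifting pattern on a two-nest sequence: the second nest reads a 3-point stencil of the first nest's output, so after shifting it by one iteration the two nests fuse, and the intermediate array reduces to a 3-value rotating window. That window is the memory-reuse effect that cuts the data transfers.

        /* Illustrative sketch, not from the paper. */
        #include <stddef.h>
        #define N 1000

        /* Original sequence: nest 1 fills b, nest 2 reads a 3-point
         * stencil of b. */
        void two_nests(const double a[N], double c[N]) {
            static double b[N];      /* full intermediate array */
            for (size_t i = 0; i < N; i++)
                b[i] = 2.0 * a[i];
            for (size_t i = 1; i + 1 < N; i++)
                c[i] = b[i - 1] + b[i] + b[i + 1];
        }

        /* Fused, with nest 2 shifted by one iteration: b[i+1] is
         * produced just before c[i] needs it, so only three b-values
         * are ever live. */
        void fused_shifted(const double a[N], double c[N]) {
            double bm1 = 2.0 * a[0]; /* holds b[i-1] */
            double b0  = 2.0 * a[1]; /* holds b[i]   */
            for (size_t i = 1; i + 1 < N; i++) {
                double bp1 = 2.0 * a[i + 1]; /* b[i+1], just in time */
                c[i] = bm1 + b0 + bp1;
                bm1 = b0;            /* rotate the window */
                b0  = bp1;
            }
        }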

    Distribution of NAD Glycohydrolase Among Liver Cells


    Circuit retiming applied to decomposed software pipelining
