28 research outputs found

    Parametric Multi-Level Tiling of Imperfectly Nested Loops

    Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multi-level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies. Tiled loops with parametric tile sizes (not compile-time constants) facilitate runtime feedback and dynamic optimizations used in iterative compilation and automatic tuning. Previous parametric multi-level tiling approaches have been restricted to perfectly nested loops, where all assignment statements are contained inside the innermost loop of a loop nest. Previous solutions to tiling for imperfect loop nests have only handled fixed tile sizes. In this paper, we present an approach to parametric multi-level tiling of imperfectly nested loops. The tiling technique generates loops that iterate over full rectangular tiles, making them amenable to compiler optimizations such as register tiling. Experimental results using a number of computational benchmarks demonstrate the effectiveness of the developed tiling approach.
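As a rough illustration of what "parametric tile sizes" and "full rectangular tiles" mean, the sketch below tiles a simple, perfectly nested 2D loop with tile sizes passed in at run time. It is not the paper's algorithm, which targets imperfectly nested loops and multiple tiling levels; the function name `tiled_points` and its parameters are hypothetical names chosen for this example.

```python
def tiled_points(n, m, ti, tj):
    """Visit every (i, j) in [0, n) x [0, m) tile by tile.

    ti and tj are ordinary run-time arguments, not compile-time constants.
    Returns the visit order and the number of full rectangular tiles.
    """
    visited = []
    full_tiles = 0
    for it in range(0, n, ti):          # tile origin along i
        for jt in range(0, m, tj):      # tile origin along j
            if it + ti <= n and jt + tj <= m:
                # Full tile: fixed trip counts ti and tj, no min() in the
                # bounds, which is what makes further optimizations such as
                # register tiling easier to apply.
                full_tiles += 1
                for i in range(it, it + ti):
                    for j in range(jt, jt + tj):
                        visited.append((i, j))
            else:
                # Partial tile at the boundary: bounds must be clamped.
                for i in range(it, min(it + ti, n)):
                    for j in range(jt, min(jt + tj, m)):
                        visited.append((i, j))
    return visited, full_tiles
```

Because the tile sizes are plain arguments, an auto-tuner could call the same code with different values of `ti` and `tj` without regenerating it.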

    Effective Automatic Parallelization of Stencil Computations

    Performance optimization of stencil computations has been widely studied in the literature, since they occur in many computationally intensive scientific and engineering applications. Compiler frameworks have also been developed that can transform sequential stencil codes for optimization of data locality and parallelism. However, loop skewing is typically required in order to tile stencil codes along the time dimension, resulting in load imbalance in pipelined parallel execution of the tiles. In this paper, we develop an approach for automatic parallelization of stencil codes, that explicitly addresses the issue of load-balanced execution of tiles. Experimental results are provided that demonstrate the effectiveness of the approach. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors -- Compilers, Optimizatio

    Effective Automatic Parallelization of Stencil Computations

    Performance optimization of stencil computations has been widely studied in the literature, since they occur in many computationally intensive scientific and engineering applications. Compiler frameworks have also been developed that can transform sequential stencil codes for optimization of data locality and parallelism. However, loop skewing is typically required in order to tile stencil codes along the time dimension, resulting in load imbalance in pipelined parallel execution of the tiles. In this paper, we develop an approach for automatic parallelization of stencil codes, that explicitly addresses the issue of load-balanced execution of tiles. Experimental results are provided that demonstrate the effectiveness of the approach.
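The load imbalance mentioned in the abstract can be illustrated with a small counting sketch. It assumes a simplified model that is not taken from the paper: after skewing, tiles form a rectangular grid of `nt` time tiles by `ns` space tiles, and each tile depends on its predecessors in both dimensions, so only tiles on the same anti-diagonal (wavefront) can run concurrently. The function name `tiles_per_wavefront` is hypothetical.

```python
def tiles_per_wavefront(nt, ns):
    """Count how many tiles can run concurrently on each wavefront.

    Assumed model: tile (t, s) depends on (t - 1, s) and (t, s - 1), so all
    tiles with the same t + s are mutually independent and form one wavefront.
    """
    counts = [0] * (nt + ns - 1)
    for t in range(nt):
        for s in range(ns):
            counts[t + s] += 1
    return counts


if __name__ == "__main__":
    # The available parallelism ramps up, plateaus, then drains.
    print(tiles_per_wavefront(3, 5))
```

Under this model the first and last wavefronts contain a single tile each, so most processors sit idle during pipeline start-up and drain. This is the kind of imbalance the abstract says the approach addresses.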