
    Predictive Modeling in a Polyhedral Optimization Space

    High-level program optimizations, such as loop transformations, are critical for high performance on multi-core targets. However, complex sequences of loop transformations are often required to expose parallelism (both coarse-grain and fine-grain) and improve data locality. The polyhedral compilation framework has proved very effective at representing these complex sequences and restructuring compute-intensive applications, seamlessly handling perfectly and imperfectly nested loops. Nevertheless, identifying the most effective loop transformations remains a major challenge. We address the problem of selecting the best polyhedral optimizations with dedicated machine learning models, trained specifically on the target machine. We show that these models can quickly select high-performance optimizations with very limited iterative search. Our end-to-end framework is validated using numerous benchmarks on two modern multi-core platforms. We investigate a variety of machine learning algorithms and hardware counters, and we obtain performance improvements over production compilers ranging on average from 3.2x to 8.7x, while running no more than 6 program variants from a polyhedral optimization space.
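    To make the workflow the abstract describes more concrete, the sketch below trains a regression model on hardware-counter features paired with encoded optimization sequences, then benchmarks only a handful of top-ranked candidates. This is a minimal illustration of the idea, not the authors' implementation; the data files, the candidate encoding, and the run_variant hook are hypothetical.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Training data gathered offline on the target machine (hypothetical files):
    #   X: a program's hardware-counter profile concatenated with a numeric
    #      encoding of one candidate optimization sequence
    #   y: measured speedup of that (program, sequence) pair over the baseline
    X_train = np.load("counters_and_sequences.npy")   # shape (n_samples, n_features)
    y_train = np.load("measured_speedups.npy")        # shape (n_samples,)

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)

    def select_best(profile, candidates, run_variant, budget=6):
        """Rank candidate sequences by predicted speedup; benchmark only `budget` of them."""
        feats = np.array([np.concatenate([profile, seq]) for seq in candidates])
        ranked = np.argsort(model.predict(feats))[::-1]           # best-predicted first
        timed = [(candidates[i], run_variant(candidates[i])) for i in ranked[:budget]]
        return max(timed, key=lambda pair: pair[1])               # best measured variant wins

    The key point is that the model only ranks candidates; the final choice still rests on a handful of real measurements, which matches the paper's "very limited iterative search".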

    Iterative Schedule Optimization for Parallelization in the Polyhedron Model

    In high-performance computing, one primary objective is to exploit to the fullest the performance that the given target hardware can deliver. Compilers that can automatically optimize programs for specific target hardware are highly useful in this context. Iterative (or search-based) compilation requires little or no prior knowledge and can adapt more easily to concrete programs and target hardware than static cost models and heuristics. It therefore helps in situations in which static heuristics do not reflect the combination of input program and target hardware well. Moreover, iterative compilation may enable the derivation of more accurate cost models and heuristics for optimizing compilers. Here, the polyhedron model is of help, as it provides not only a mathematical representation of programs but, more importantly, a uniform representation of complex sequences of program transformations by schedule functions. The latter facilitates the systematic exploration of the set of legal transformations of a given program. Early approaches to purely iterative schedule optimization in the polyhedron model do not limit their search to schedules that preserve program semantics and therefore must explore large numbers of illegal schedules. More recent research ensures the legality of program transformations but presumes a sequential rather than a parallel execution of the transformed program. Other approaches do not perform a purely iterative optimization. We propose an approach to iterative schedule optimization for parallelization and tiling in the polyhedron model. Our approach targets loop programs that profit from data-locality optimization and coarse-grained loop parallelization. The schedule search space can be explored either randomly or by means of a genetic algorithm. To determine a schedule's profitability, we rely primarily on measuring the transformed code's execution time. While benchmarking is accurate, it increases the time and resource consumption of program optimization tremendously and can even make optimization impractical. We address this limitation by learning surrogate models from schedules generated and evaluated in previous runs of the iterative optimization and by replacing benchmarking with performance prediction to the extent possible. Our evaluation on the PolyBench 4.1 benchmark set reveals that, in a given setting, iterative schedule optimization yields significantly higher speedups in the execution of the optimized program. Surrogate performance models learned from training data generated during previous iterative optimizations can reduce the benchmarking effort without strongly impairing the optimization result. A prerequisite for this is sufficient similarity between the training programs and the program to be optimized.
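    The following sketch shows the general shape such an optimization loop could take: a genetic algorithm over legality-checked schedules in which a surrogate model screens candidates, so that the real benchmarking budget is spent only on the most promising ones. All problem-specific hooks (crossover, mutation, legality check, feature extraction, benchmarking) are assumed and passed in; this is an illustrative sketch, not the paper's implementation.

    import random

    def evolve(population, surrogate, ops, generations=50, benchmark_quota=10):
        """Genetic search over legal schedules. `ops` bundles the (assumed)
        problem-specific hooks: crossover, mutate, is_legal, features, benchmark."""
        measured = []   # (features, runtime) pairs; reusable to retrain the surrogate
        for _ in range(generations):
            children = [ops.mutate(ops.crossover(*random.sample(population, 2)))
                        for _ in range(len(population))]
            children = [c for c in children if ops.is_legal(c)]   # semantics-preserving only
            # Cheap fitness estimate for every child via the surrogate model
            # (lower predicted runtime is better)...
            ranked = sorted(children, key=lambda c: surrogate.predict([ops.features(c)])[0])
            # ...but spend the real benchmarking budget only on the most promising few.
            measured += [(ops.features(c), ops.benchmark(c)) for c in ranked[:benchmark_quota]]
            population = (ranked + population)[:len(population)]  # keep population size fixed
        # Return the surrogate's current favourite plus the measured training data.
        return population[0], measured

    The measured pairs collected along the way are exactly the kind of training data the abstract proposes to reuse when optimizing sufficiently similar programs later.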

    A Heuristic Search Algorithm Based on Unified Transformation Framework

    Modern compilers have limited ability to exploit the performance-improvement potential of complex transformation compositions, owing to the ad-hoc nature of the individual transformations. Various frameworks have been proposed to provide a unified representation of different transformations, among them Pugh's Unified Transformation Framework (UTF) [10]. It offers a unified and systematic representation of iteration-reordering transformations and their arbitrary combinations, which results in a large and complex optimisation space for a compiler to explore. This paper presents a heuristic search algorithm capable of efficiently locating good program optimisations within such a space. Preliminary experimental results with Java programs show that it achieves an average speedup of 1.14 on Linux+Celeron and 1.10 on Windows+PentiumPro, and that more than 75% of the maximum available performance can be obtained in 20 or fewer evaluations.
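    As a rough illustration of a budget-limited heuristic search of this kind, the sketch below greedily explores neighbouring points in a transformation space and stops after a fixed number of evaluations (the abstract reports reaching more than 75% of the available performance within 20 evaluations). The neighbours and evaluate hooks are hypothetical stand-ins for the UTF transformation space and the timing harness.

    def heuristic_search(start, neighbours, evaluate, budget=20):
        """Greedy local search under a fixed evaluation budget (lower time is better)."""
        best, best_time = start, evaluate(start)
        evaluations = 1
        frontier = [best]
        while frontier and evaluations < budget:
            point = frontier.pop()
            for candidate in neighbours(point):
                if evaluations >= budget:
                    break
                t = evaluate(candidate)
                evaluations += 1
                if t < best_time:                  # improvement found
                    best, best_time = candidate, t
                    frontier.append(candidate)     # keep exploring from the improvement
        return best, best_time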