6,016 research outputs found
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Fine-Grain Iterative Compilation for WCET Estimation
Compiler optimizations, although reducing the execution times of programs, raise issues in static WCET estimation techniques and tools. Flow facts, such as loop bounds, may not be automatically found by static WCET analysis tools after aggressive code optimizations. In this paper, we explore the use of iterative compilation (WCET-directed program optimization to explore the optimization space), with the objective to (i) allow flow facts to be automatically found and (ii) select optimizations that result in the lowest WCET estimates. We also explore to which extent code outlining helps, by allowing the selection of different optimization options for different code snippets of the application
Towards Automatic Learning of Heuristics for Mechanical Transformations of Procedural Code
The current trend in next-generation exascale systems goes towards
integrating a wide range of specialized (co-)processors into traditional
supercomputers. However, the integration of different specialized devices
increases the degree of heterogeneity and the complexity in programming such
type of systems. Due to the efficiency of heterogeneous systems in terms of
Watt and FLOPS per surface unit, opening the access of heterogeneous platforms
to a wider range of users is an important problem to be tackled. In order to
bridge the gap between heterogeneous systems and programmers, in this paper we
propose a machine learning-based approach to learn heuristics for defining
transformation strategies of a program transformation system. Our approach
proposes a novel combination of reinforcement learning and classification
methods to efficiently tackle the problems inherent to this type of systems.
Preliminary results demonstrate the suitability of the approach for easing the
programmability of heterogeneous systems.Comment: Part of the Program Transformation for Programmability in
Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March
2016, 9 pages, LaTe
A methodology for speeding up loop kernels by exploiting the software information and the memory architecture
It is well-known that today׳s compilers and state of the art libraries have three major drawbacks. First, the compiler sub-problems are optimized separately; this is not efficient because the separate sub-problems optimization gives a different schedule for each sub-problem and these schedules cannot coexist as the refining of one, causes the degradation of another. Second, they take into account only part of the specific algorithm׳s information. Third, they take into account only a few hardware architecture parameters. These approaches cannot give an optimal solution.
In this paper, a new methodology/pre-compiler is introduced, which speeds up loop kernels, by overcoming the above problems. This methodology solves four of the major scheduling sub-problems, together as one problem and not separately; these are the sub-problems of finding the schedules with the minimum numbers of (i) L1 data cache accesses, (ii) L2 data cache accesses, (iii) main memory data accesses, (iv) addressing instructions. First, the exploration space (possible solutions) is found according to the algorithm׳s information, e.g. array subscripts. Then, the exploration space is decreased by orders of magnitude, by applying constraint propagation to the software and hardware parameters.
We take the C-code and the memory architecture parameters as input and we automatically produce a new faster C-code; this code cannot be obtained by applying the existing compiler transformations to the original code. The proposed methodology has been evaluated for five well-known algorithms in both general and embedded processors; it is compared with gcc and clang compilers and also with iterative compilation
- …