A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning-based
approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
the increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and, finally, the influential papers of the field.
Comment: version 5.0 (updated on September 2018) - Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
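The difference between the two problems named in the abstract above can be made concrete by counting candidates. The following is a minimal sketch, not taken from the survey: the pass names are hypothetical, and phase-ordering is simplified to "each pass applied exactly once", whereas real pipelines may repeat or omit passes, which makes the space larger still.

```python
from itertools import combinations, permutations
from math import factorial

# Hypothetical optimization names, used only for illustration.
PASSES = ["inline", "unroll", "vectorize", "licm"]

def selection_space(passes):
    """Optimization selection: every on/off choice, order fixed (2**n subsets)."""
    return [subset
            for k in range(len(passes) + 1)
            for subset in combinations(passes, k)]

def phase_ordering_space(passes):
    """Phase-ordering (simplified): every ordering of all passes (n! sequences)."""
    return list(permutations(passes))

selections = selection_space(PASSES)
orderings = phase_ordering_space(PASSES)
print(len(selections), len(orderings))
```

Even in this simplified form, the ordering space grows factorially while the selection space grows exponentially, which is one reason exhaustive search quickly becomes impractical.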
Loo.py: transformation-based code generation for GPUs and CPUs
Today's highly heterogeneous computing landscape places a burden on
programmers wanting to achieve high performance on a reasonably broad
cross-section of machines. To do so, computations need to be expressed in many
different but mathematically equivalent ways, with, in the worst case, one
variant per target machine.
Loo.py, a programming system embedded in Python, meets this challenge by
defining a data model for array-style computations and a library of
transformations that operate on this model. Offering transformations such as
loop tiling, vectorization, storage management, unrolling, instruction-level
parallelism, change of data layout, and many more, it provides a convenient way
to capture, parametrize, and re-unify the growth among code variants. Optional,
deep integration with numpy and PyOpenCL provides a convenient computing
environment where the transition from prototype to high-performance
implementation can occur in a gradual, machine-assisted form.
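To illustrate what "different but mathematically equivalent" variants means for one of the listed transformations, here is a minimal plain-Python sketch of loop tiling. It does not use the Loo.py API; the function names and the tile size are assumptions made for this example only.

```python
def row_sums_untiled(a, n, m):
    """Reference loop nest: sum each row of an n-by-m matrix."""
    out = [0] * n
    for i in range(n):
        for j in range(m):
            out[i] += a[i][j]
    return out

def row_sums_tiled(a, n, m, tile=4):
    """Same computation with the j loop split into tiles of width `tile`."""
    out = [0] * n
    for j0 in range(0, m, tile):                       # loop over tiles
        for i in range(n):
            for j in range(j0, min(j0 + tile, m)):     # loop inside one tile
                out[i] += a[i][j]
    return out

# Small integer test matrix (6 rows, 10 columns); 10 is not a multiple of 4,
# so the last tile is partial.
matrix = [[i * 10 + j for j in range(10)] for i in range(6)]
print(row_sums_untiled(matrix, 6, 10) == row_sums_tiled(matrix, 6, 10))
```

Both functions compute the same result; only the iteration order differs, which is the property a transformation library has to preserve while changing memory access patterns.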
Towards an Achievable Performance for the Loop Nests
Numerous code optimization techniques, including loop nest optimizations,
have been developed over the last four decades. Loop optimization techniques
transform loop nests to improve the performance of the code on a target
architecture, including exposing parallelism. Finding and evaluating an
optimal, semantic-preserving sequence of transformations is a complex problem.
The sequence is guided using heuristics and/or analytical models and there is
no way of knowing how close it gets to optimal performance or if there is any
headroom for improvement. This paper makes two contributions. First, it uses a
comparative analysis of loop optimizations/transformations across multiple
compilers to determine how much headroom may exist for each compiler. And
second, it presents an approach to characterize the loop nests based on their
hardware performance counter values and a Machine Learning approach that
predicts which compiler will generate the fastest code for a loop nest. The
prediction is made for both auto-vectorized, serial compilation and for
auto-parallelization. The results show that the headroom for state-of-the-art
compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to
1.71x for the auto-parallelized code. These results are based on the Machine
Learning predictions.
Comment: Accepted at the 31st International Workshop on Languages and
Compilers for Parallel Computing (LCPC 2018
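One plausible reading of "headroom" in the abstract above is the ratio between a compiler's runtime and the best runtime any compiler achieves on the same loop nest, aggregated over all loop nests. The sketch below follows that reading. The runtimes, the compiler names, and the choice of a geometric mean are assumptions for illustration, not the paper's data or exact metric.

```python
from math import prod

# Hypothetical runtimes in seconds: loop nest -> {compiler: time}.
runtimes = {
    "nest_a": {"cc1": 2.0, "cc2": 1.0},
    "nest_b": {"cc1": 3.0, "cc2": 6.0},
}

def headroom(runtimes, compiler):
    """Geometric mean of (this compiler's time / best time) over all loop nests."""
    ratios = [times[compiler] / min(times.values())
              for times in runtimes.values()]
    return prod(ratios) ** (1.0 / len(ratios))

for name in ("cc1", "cc2"):
    print(name, headroom(runtimes, name))
```

A value of 1.0 would mean the compiler is already the fastest on every loop nest; larger values indicate how much could be gained by always picking the best compiler per loop nest.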
Less is More: Exploiting the Standard Compiler Optimization Levels for Better Performance and Energy Consumption
This paper presents the interesting observation that by performing fewer of
the optimizations available in a standard compiler optimization level such as
-O2, while preserving their original ordering, significant savings can be
achieved in both execution time and energy consumption. This observation has
been validated on two embedded processors, namely the ARM Cortex-M0 and the ARM
Cortex-M3, using two different versions of the LLVM compilation framework: v3.8
and v5.0. Experimental evaluation with 71 embedded benchmarks demonstrated
performance gains for at least half of the benchmarks for both processors. An
average execution time reduction of 2.4% and 5.3% was achieved across all the
benchmarks for the Cortex-M0 and Cortex-M3 processors, respectively, with
execution time improvements ranging from 1% up to 90% over the -O2. The savings
that can be achieved are in the same range as what can be achieved by the
state-of-the-art compilation approaches that use iterative compilation or
machine learning to select flags or to determine phase orderings that result in
more efficient code. In contrast to these time-consuming and expensive-to-apply
techniques, our approach only needs to test a limited number of optimization
configurations, fewer than 64, to obtain similar or even better savings.
Furthermore, our approach can support multi-criteria optimization as it targets
execution time, energy consumption and code size at the same time.
Comment: 15 pages, 3 figures, 71 benchmarks used for evaluation
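The core idea of applying fewer passes "while preserving their original ordering" can be sketched as follows. The abstract does not say how the candidate configurations are chosen, so the prefix family below is only one assumed way to keep the number of candidates small. The short pass list is illustrative and is not the actual -O2 pipeline of any LLVM version.

```python
# Illustrative pass names; a real -O2 pipeline is longer and version-dependent.
O2_LIKE = ["sroa", "instcombine", "simplifycfg", "licm", "gvn"]

def keep_subset(pipeline, keep):
    """Return only the passes in `keep`, in their original pipeline order."""
    keep = set(keep)
    return [p for p in pipeline if p in keep]

def prefix_configs(pipeline):
    """One assumed candidate family: every non-empty prefix (n configurations)."""
    return [pipeline[:k] for k in range(1, len(pipeline) + 1)]

def best_config(configs, cost):
    """Pick the configuration with the lowest measured cost."""
    return min(configs, key=cost)

# `len` stands in for a real measurement such as execution time or energy.
candidates = prefix_configs(O2_LIKE)
print(keep_subset(O2_LIKE, {"gvn", "sroa"}))
print(best_config(candidates, cost=len))
```

In practice, `cost` would compile the benchmark with the given configuration and measure it on the target board; the stand-in used here only keeps the sketch runnable.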
Linear Convergence of Comparison-based Step-size Adaptive Randomized Search via Stability of Markov Chains
In this paper, we consider comparison-based adaptive stochastic algorithms
for solving numerical optimisation problems. We consider a specific subclass of
algorithms that we call comparison-based step-size adaptive randomized search
(CB-SARS), where the state variables at a given iteration are a vector of the
search space and a positive parameter, the step-size, typically controlling the
overall standard deviation of the underlying search distribution. We investigate
the linear convergence of CB-SARS on scaling-invariant objective
functions. Scaling-invariant functions preserve the ordering of points with
respect to their function value when the points are scaled with the same
positive parameter (the scaling is done w.r.t. a fixed reference point). This
class of functions includes norms composed with strictly increasing functions
as well as many non quasi-convex and non-continuous functions. On
scaling-invariant functions, we show the existence of a homogeneous Markov
chain, as a consequence of natural invariance properties of CB-SARS (essentially
scale-invariance and invariance to strictly increasing transformation of the
objective function). We then derive sufficient conditions for global
linear convergence of CB-SARS, expressed in terms of different stability
conditions of the normalised homogeneous Markov chain (irreducibility,
positivity, Harris recurrence, geometric ergodicity) and thus define a general
methodology for proving global linear convergence of CB-SARS algorithms
on scaling-invariant functions. As a by-product we provide a connexion between
comparison-based adaptive stochastic algorithms and Markov chain Monte Carlo
algorithms.
Comment: SIAM Journal on Optimization, Society for Industrial and Applied
Mathematics, 201
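A familiar algorithm that fits the description above (a state made of a search point and a step-size, updated using only comparisons of function values) is the (1+1) evolution strategy with a one-fifth success rule. The sketch below is a minimal illustration under assumed parameter choices (the factor exp(1/3), the sphere objective, the dimension, and the seed); it is not the paper's general framework or its proof technique.

```python
import math
import random

def sphere(x):
    """A scaling-invariant test objective: the squared Euclidean norm."""
    return sum(v * v for v in x)

def one_plus_one_es(f, x0, sigma0, iterations, seed=0):
    """(1+1)-ES with a one-fifth success rule; uses f only through comparisons."""
    rng = random.Random(seed)
    x, sigma = list(x0), sigma0
    fx = f(x)
    up = math.exp(1.0 / 3.0)     # step-size increase on success (assumed value)
    down = up ** (-0.25)         # decrease on failure; balanced at 1/5 success rate
    for _ in range(iterations):
        candidate = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
        f_candidate = f(candidate)
        if f_candidate <= fx:    # comparison-based acceptance
            x, fx = candidate, f_candidate
            sigma *= up
        else:
            sigma *= down
    return x, sigma, fx

x_final, sigma_final, f_final = one_plus_one_es(sphere, [1.0] * 5, 1.0, 2000)
print(f_final, sigma_final)
```

Because acceptance depends only on whether the candidate is at least as good as the current point, replacing `sphere` with any strictly increasing transformation of it would produce the same sequence of points, which is the invariance the abstract refers to.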
Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming makes it easy to implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code for implementing the optimizations, instead of complicated
low-level code that is required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines.
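The generative-programming idea described above, where high-level code emits specialized low-level C, can be sketched in a few lines. This is not LegoBase's implementation (which is written in Scala); the helper name `generate_c_filter`, the column name, and the emitted function are hypothetical and only show how a predicate known at compile time can be baked into the generated source.

```python
def generate_c_filter(column, op, constant):
    """Emit C source for a scan specialized to one integer predicate.

    Hypothetical helper for illustration; it returns the C code as a string.
    """
    allowed = {"<", "<=", ">", ">=", "==", "!="}
    if op not in allowed:
        raise ValueError(f"unsupported operator: {op}")
    if not column.isidentifier():
        raise ValueError(f"invalid column name: {column}")
    value = int(constant)  # the constant is inlined into the generated code
    return (
        f"long count_matches(const int *{column}, long n) {{\n"
        f"    long count = 0;\n"
        f"    for (long i = 0; i < n; i++) {{\n"
        f"        if ({column}[i] {op} {value}) count++;\n"
        f"    }}\n"
        f"    return count;\n"
        f"}}\n"
    )

source = generate_c_filter("quantity", "<", 24)
print(source)
```

The generated function contains no interpretation overhead for the predicate: the operator and constant are fixed in the C text, so a C compiler can optimize the loop directly.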
Deductive Optimization of Relational Data Storage
Optimizing the physical data storage and retrieval of data are two key
database management problems. In this paper, we propose a language that can
express a wide range of physical database layouts, going well beyond the row-
and column-based methods that are widely used in database management systems.
We use deductive synthesis to turn a high-level relational representation of a
database query into a highly optimized low-level implementation which operates
on a specialized layout of the dataset. We build a compiler for this language
and conduct experiments using a popular database benchmark, which show that
the performance of these specialized queries is competitive with a
state-of-the-art in-memory compiled database system.
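For readers unfamiliar with the row- and column-based layouts that the abstract takes as its baseline, the following minimal sketch contrasts the two on a tiny table. It does not show the paper's layout language or its deductive synthesis; the table contents and function names are made up for illustration.

```python
# A tiny table in row layout: one record per row (made-up data).
rows = [
    {"id": 1, "price": 10, "qty": 2},
    {"id": 2, "price": 25, "qty": 1},
    {"id": 3, "price": 7, "qty": 5},
]

def to_columns(rows):
    """Convert a row layout into a column layout: one list per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def total_price_rows(rows):
    """Aggregate over the row layout: touches every record."""
    return sum(row["price"] for row in rows)

def total_price_columns(columns):
    """Aggregate over the column layout: touches only the 'price' column."""
    return sum(columns["price"])

columns = to_columns(rows)
print(total_price_rows(rows), total_price_columns(columns))
```

Both layouts answer the query identically; they differ in which data must be read, which is why the choice of physical layout affects query performance.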