94,200 research outputs found
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Polly's Polyhedral Scheduling in the Presence of Reductions
The polyhedral model provides a powerful mathematical abstraction to enable
effective optimization of loop nests with respect to a given optimization goal,
e.g., exploiting parallelism. Unexploited reduction properties are a frequent
reason for polyhedral optimizers to assume parallelism prohibiting dependences.
To our knowledge, no polyhedral loop optimizer available in any production
compiler provides support for reductions. In this paper, we show that
leveraging the parallelism of reductions can lead to a significant performance
increase. We give a precise, dependence based, definition of reductions and
discuss ways to extend polyhedral optimization to exploit the associativity and
commutativity of reduction computations. We have implemented a
reduction-enabled scheduling approach in the Polly polyhedral optimizer and
evaluate it on the standard Polybench 3.2 benchmark suite. We were able to
detect and model all 52 arithmetic reductions and achieve speedups up to
2.21 on a quad core machine by exploiting the multidimensional
reduction in the BiCG benchmark.Comment: Presented at the IMPACT15 worksho
Lost in translation: Exposing hidden compiler optimization opportunities
Existing iterative compilation and machine-learning-based optimization
techniques have been proven very successful in achieving better optimizations
than the standard optimization levels of a compiler. However, they were not
engineered to support the tuning of a compiler's optimizer as part of the
compiler's daily development cycle. In this paper, we first establish the
required properties which a technique must exhibit to enable such tuning. We
then introduce an enhancement to the classic nightly routine testing of
compilers which exhibits all the required properties, and thus, is capable of
driving the improvement and tuning of the compiler's common optimizer. This is
achieved by leveraging resource usage and compilation information collected
while systematically exploiting prefixes of the transformations applied at
standard optimization levels. Experimental evaluation using the LLVM v6.0.1
compiler demonstrated that the new approach was able to reveal hidden
cross-architecture and architecture-dependent potential optimizations on two
popular processors: the Intel i5-6300U and the Arm Cortex-A53-based Broadcom
BCM2837 used in the Raspberry Pi 3B+. As a case study, we demonstrate how the
insights from our approach enabled us to identify and remove a significant
shortcoming of the CFG simplification pass of the LLVM v6.0.1 compiler.Comment: 31 pages, 7 figures, 2 table. arXiv admin note: text overlap with
arXiv:1802.0984
Mechanized semantics
The goal of this lecture is to show how modern theorem provers---in this
case, the Coq proof assistant---can be used to mechanize the specification of
programming languages and their semantics, and to reason over individual
programs and over generic program transformations, as typically found in
compilers. The topics covered include: operational semantics (small-step,
big-step, definitional interpreters); a simple form of denotational semantics;
axiomatic semantics and Hoare logic; generation of verification conditions,
with application to program proof; compilation to virtual machine code and its
proof of correctness; an example of an optimizing program transformation (dead
code elimination) and its proof of correctness
Optimization of micropillar sequences for fluid flow sculpting
Inertial fluid flow deformation around pillars in a microchannel is a new
method for controlling fluid flow. Sequences of pillars have been shown to
produce a rich phase space with a wide variety of flow transformations.
Previous work has successfully demonstrated manual design of pillar sequences
to achieve desired transformations of the flow cross-section, with experimental
validation. However, such a method is not ideal for seeking out complex
sculpted shapes as the search space quickly becomes too large for efficient
manual discovery. We explore fast, automated optimization methods to solve this
problem. We formulate the inertial flow physics in microchannels with different
micropillar configurations as a set of state transition matrix operations.
These state transition matrices are constructed from experimentally validated
streamtraces. This facilitates modeling the effect of a sequence of
micropillars as nested matrix-matrix products, which have very efficient
numerical implementations. With this new forward model, arbitrary micropillar
sequences can be rapidly simulated with various inlet configurations, allowing
optimization routines quick access to a large search space. We integrate this
framework with the genetic algorithm and showcase its applicability by
designing micropillar sequences for various useful transformations. We
computationally discover micropillar sequences for complex transformations that
are substantially shorter than manually designed sequences. We also determine
sequences for novel transformations that were difficult to manually design.
Finally, we experimentally validate these computational designs by fabricating
devices and comparing predictions with the results from confocal microscopy
Less is More: Exploiting the Standard Compiler Optimization Levels for Better Performance and Energy Consumption
This paper presents the interesting observation that by performing fewer of
the optimizations available in a standard compiler optimization level such as
-O2, while preserving their original ordering, significant savings can be
achieved in both execution time and energy consumption. This observation has
been validated on two embedded processors, namely the ARM Cortex-M0 and the ARM
Cortex-M3, using two different versions of the LLVM compilation framework; v3.8
and v5.0. Experimental evaluation with 71 embedded benchmarks demonstrated
performance gains for at least half of the benchmarks for both processors. An
average execution time reduction of 2.4% and 5.3% was achieved across all the
benchmarks for the Cortex-M0 and Cortex-M3 processors, respectively, with
execution time improvements ranging from 1% up to 90% over the -O2. The savings
that can be achieved are in the same range as what can be achieved by the
state-of-the-art compilation approaches that use iterative compilation or
machine learning to select flags or to determine phase orderings that result in
more efficient code. In contrast to these time consuming and expensive to apply
techniques, our approach only needs to test a limited number of optimization
configurations, less than 64, to obtain similar or even better savings.
Furthermore, our approach can support multi-criteria optimization as it targets
execution time, energy consumption and code size at the same time.Comment: 15 pages, 3 figures, 71 benchmarks used for evaluatio
- …