36,192 research outputs found
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
AMaĻoSāAbstract Machine for Xcerpt
Web query languages promise convenient and efficient access
to Web data such as XML, RDF, or Topic Maps. Xcerpt is one such Web
query language with strong emphasis on novel high-level constructs for
effective and convenient query authoring, particularly tailored to versatile
access to data in different Web formats such as XML or RDF.
However, so far it lacks an efficient implementation to supplement the
convenient language features. AMaĻoS is an abstract machine implementation
for Xcerpt that aims at efficiency and ease of deployment. It
strictly separates compilation and execution of queries: Queries are compiled
once to abstract machine code that consists in (1) a code segment
with instructions for evaluating each rule and (2) a hint segment that
provides the abstract machine with optimization hints derived by the
query compilation. This article summarizes the motivation and principles
behind AMaĻoS and discusses how its current architecture realizes
these principles
Towards an Achievable Performance for the Loop Nests
Numerous code optimization techniques, including loop nest optimizations,
have been developed over the last four decades. Loop optimization techniques
transform loop nests to improve the performance of the code on a target
architecture, including exposing parallelism. Finding and evaluating an
optimal, semantic-preserving sequence of transformations is a complex problem.
The sequence is guided using heuristics and/or analytical models and there is
no way of knowing how close it gets to optimal performance or if there is any
headroom for improvement. This paper makes two contributions. First, it uses a
comparative analysis of loop optimizations/transformations across multiple
compilers to determine how much headroom may exist for each compiler. And
second, it presents an approach to characterize the loop nests based on their
hardware performance counter values and a Machine Learning approach that
predicts which compiler will generate the fastest code for a loop nest. The
prediction is made for both auto-vectorized, serial compilation and for
auto-parallelization. The results show that the headroom for state-of-the-art
compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to
1.71x for the auto-parallelized code. These results are based on the Machine
Learning predictions.Comment: Accepted at the 31st International Workshop on Languages and
Compilers for Parallel Computing (LCPC 2018
Emulating and evaluating hybrid memory for managed languages on NUMA hardware
Non-volatile memory (NVM) has the potential to become a mainstream memory technology and challenge DRAM. Researchers evaluating the speed, endurance, and abstractions of hybrid memories with DRAM and NVM typically use simulation, making it easy to evaluate the impact of different hardware technologies and parameters. Simulation is, however, extremely slow, limiting the applications and datasets in the evaluation. Simulation also precludes critical workloads, especially those written in managed languages such as Java and C#. Good methodology embraces a variety of techniques for evaluating new ideas, expanding the experimental scope, and uncovering new insights.
This paper introduces a platform to emulate hybrid memory for managed languages using commodity NUMA servers. Emulation complements simulation but offers richer software experimentation. We use a thread-local socket to emulate DRAM and a remote socket to emulate NVM. We use standard C library routines to allocate heap memory on the DRAM and NVM sockets for use with explicit memory management or garbage collection. We evaluate the emulator using various configurations of write-rationing garbage collectors that improve NVM lifetimes by limiting writes to NVM, using 15 applications and various datasets and workload configurations. We show emulation and simulation confirm each other's trends in terms of writes to NVM for different software configurations, increasing our confidence in predicting future system effects. Emulation brings novel insights, such as the non-linear effects of multi-programmed workloads on NVM writes, and that Java applications write significantly more than their C++ equivalents. We make our software infrastructure publicly available to advance the evaluation of novel memory management schemes on hybrid memories
- ā¦