56,138 research outputs found
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Open-architecture Implementation of Fragment Molecular Orbital Method for Peta-scale Computing
We present our perspective and goals on highperformance computing for
nanoscience in accordance with the global trend toward "peta-scale computing."
After reviewing our results obtained through the grid-enabled version of the
fragment molecular orbital method (FMO) on the grid testbed by the Japanese
Grid Project, National Research Grid Initiative (NAREGI), we show that FMO is
one of the best candidates for peta-scale applications by predicting its
effective performance in peta-scale computers. Finally, we introduce our new
project constructing a peta-scale application in an open-architecture
implementation of FMO in order to realize both goals of highperformance in
peta-scale computers and extendibility to multiphysics simulations.Comment: 6 pages, 9 figures, proceedings of the 2nd IEEE/ACM international
workshop on high performance computing for nano-science and technology
(HPCNano06
Mira: A Framework for Static Performance Analysis
The performance model of an application can pro- vide understanding about its
runtime behavior on particular hardware. Such information can be analyzed by
developers for performance tuning. However, model building and analyzing is
frequently ignored during software development until perfor- mance problems
arise because they require significant expertise and can involve many
time-consuming application runs. In this paper, we propose a fast, accurate,
flexible and user-friendly tool, Mira, for generating performance models by
applying static program analysis, targeting scientific applications running on
supercomputers. We parse both the source code and binary to estimate
performance attributes with better accuracy than considering just source or
just binary code. Because our analysis is static, the target program does not
need to be executed on the target architecture, which enables users to perform
analysis on available machines instead of conducting expensive exper- iments on
potentially expensive resources. Moreover, statically generated models enable
performance prediction on non-existent or unavailable architectures. In
addition to flexibility, because model generation time is significantly reduced
compared to dynamic analysis approaches, our method is suitable for rapid
application performance analysis and improvement. We present several scientific
application validation results to demonstrate the current capabilities of our
approach on small benchmarks and a mini application
Costing JIT Traces
Tracing JIT compilation generates units of compilation that
are easy to analyse and are known to execute frequently. The AJITPar
project aims to investigate whether the information in JIT traces can be
used to make better scheduling decisions or perform code transformations
to adapt the code for a specific parallel architecture. To achieve this goal,
a cost model must be developed to estimate the execution time of an
individual trace.
This paper presents the design and implementation of a system for extracting
JIT trace information from the Pycket JIT compiler. We define
three increasingly parametric cost models for Pycket traces. We perform
a search of the cost model parameter space using genetic algorithms to
identify the best weightings for those parameters. We test the accuracy
of these cost models for predicting the cost of individual traces on a set
of loop-based micro-benchmarks. We also compare the accuracy of the
cost models for predicting whole program execution time over the Pycket
benchmark suite. Our results show that the weighted cost model
using the weightings found from the genetic algorithm search has the
best accuracy
- …