ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales
As we enter the exascale computing era, efficiently utilizing power and
optimizing the performance of scientific applications under power and energy
constraints has become critical and challenging. We propose a low-overhead
autotuning framework to autotune performance and energy for various hybrid
MPI/OpenMP scientific applications at large scales and to explore the tradeoffs
between application runtime and power/energy for energy efficient application
execution, then use this framework to autotune four ECP proxy applications --
XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with
a Random Forest surrogate model to effectively search parameter spaces with up
to 6 million different configurations on two large-scale production systems,
Theta at Argonne National Laboratory and Summit at Oak Ridge National
Laboratory. The experimental results show that our autotuning framework at
large scales has low overhead and achieves good scalability. Using the proposed
autotuning framework to identify the best configurations, we achieve up to
91.59% performance improvement, up to 21.2% energy savings, and up to 37.84%
EDP improvement on up to 4,096 nodes
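For readers unfamiliar with the last metric, the energy-delay product (EDP) is conventionally computed as energy multiplied by runtime. The sketch below shows that arithmetic together with a relative-improvement calculation. The function names and the sample numbers are hypothetical and are not taken from the paper, and measuring improvement as a relative reduction against a baseline is an assumption about how such percentages are usually reported.

```python
def edp(energy_joules: float, runtime_seconds: float) -> float:
    """Energy-delay product: energy multiplied by runtime (lower is better)."""
    return energy_joules * runtime_seconds


def percent_improvement(baseline: float, tuned: float) -> float:
    """Relative reduction of a lower-is-better metric, in percent (assumed definition)."""
    return 100.0 * (baseline - tuned) / baseline


# Hypothetical measurements, for illustration only.
baseline_edp = edp(energy_joules=1000.0, runtime_seconds=10.0)
tuned_edp = edp(energy_joules=900.0, runtime_seconds=8.0)
improvement = percent_improvement(baseline_edp, tuned_edp)
```

Because EDP multiplies the two quantities, a configuration that lowers both runtime and energy improves EDP by more than either reduction alone.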
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning-based
approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and, finally, the influential papers of the field.
Comment: version 5.0 (updated September 2018). Preprint version of the article
accepted at ACM CSUR 2018 (42 pages). This survey will be updated quarterly
here (send me your newly published papers to be added in the subsequent
version). History: Received November 2016; Revised August 2017; Revised
February 2018; Accepted March 2018
BaCO: A Fast and Portable Bayesian Compiler Optimization Framework
We introduce the Bayesian Compiler Optimization framework (BaCO), a general
purpose autotuner for modern compilers targeting CPUs, GPUs, and FPGAs. BaCO
provides the flexibility needed to handle the requirements of modern autotuning
tasks. Particularly, it deals with permutation, ordered, and continuous
parameter types along with both known and unknown parameter constraints. To
reason about these parameter types and efficiently deliver high-quality code,
BaCO uses Bayesian optimization algorithms specialized towards the autotuning
domain. We demonstrate BaCO's effectiveness on three modern compiler systems:
TACO, RISE & ELEVATE, and HPVM2FPGA for CPUs, GPUs, and FPGAs respectively. For
these domains, BaCO outperforms current state-of-the-art autotuners by
delivering on average 1.36x-1.56x faster code with a tiny search budget, and
BaCO is able to reach expert-level performance 2.9x-3.9x faster
Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges
Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, for example those based on resampling error estimation for supervised machine learning, can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization. This article is categorized under: Algorithmic Development > Statistics; Technologies > Machine Learning; Technologies > Prediction
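As a concrete illustration of the simplest technique named in this abstract, the following is a minimal sketch of random search over a discrete hyperparameter space. The function `random_search`, the toy objective, and the parameter names are hypothetical and chosen only for illustration. They are not taken from the paper, and a real HPO run would replace the toy objective with a resampling-based error estimate.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample configurations uniformly at random and keep the best (lowest) score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently and uniformly.
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical search space and toy objective (stand-in for a validation error).
space = {"learning_rate": [0.001, 0.01, 0.1], "depth": [2, 4, 8, 16]}


def toy_objective(cfg):
    return abs(cfg["learning_rate"] - 0.01) + abs(cfg["depth"] - 8) / 8


best_cfg, best_score = random_search(toy_objective, space, n_trials=50)
```

Grid search would instead enumerate every combination in `space`. Random search trades that exhaustive coverage for a fixed evaluation budget, which is why it is a common baseline for the more advanced methods listed above.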
TunIO: An AI-powered Framework for Optimizing HPC I/O
I/O operations are a known performance bottleneck of HPC applications. To achieve good performance, users often employ an iterative multistage tuning process to find an optimal I/O stack configuration. However, an I/O stack contains multiple layers, such as high-level I/O libraries, I/O middleware, and parallel file systems, and each layer has many parameters. These parameters and layers are entangled and influenced by each other. The tuning process is time-consuming and complex. In this work, we present TunIO, an AI-powered I/O tuning framework that implements several techniques to balance the tuning cost and performance gain, including tuning the high-impact parameters first. Furthermore, TunIO analyzes the application source code to extract its I/O kernel while retaining all statements necessary to perform I/O. It utilizes a smart selection of high-impact configuration parameters of the given tuning objective. Finally, it uses a novel Reinforcement Learning (RL)-driven early stopping mechanism to balance the cost and performance gain. Experimental results show that TunIO leads to a reduction of up to ≈73% in tuning time while achieving the same performance gain when compared to H5Tuner. It achieves a significant performance gain/cost of 208.4 MBps/min (I/O bandwidth for each minute spent in tuning) over existing approaches under our testing
Benchmarking optimization algorithms for auto-tuning GPU kernels
Recent years have witnessed phenomenal growth in the application and
capabilities of Graphical Processing Units (GPUs) due to their high parallel
computation power at relatively low cost. However, writing a computationally
efficient GPU program (kernel) is challenging, and generally only certain
specific kernel configurations lead to significant increases in performance.
Auto-tuning is the process of automatically optimizing software for
highly-efficient execution on a target hardware platform. Auto-tuning is
particularly useful for GPU programming, as a single kernel requires re-tuning
after code changes, for different input data, and for different architectures.
However, the discrete and non-convex nature of the search space creates a
challenging optimization problem. In this work, we investigate which algorithm
produces the fastest kernels if the time-budget for the tuning task is varied.
We conduct a survey by performing experiments on 26 different kernel spaces,
from 9 different GPUs, for 16 different evolutionary black-box optimization
algorithms. We then analyze these results and introduce a novel metric based on
the PageRank centrality concept as a tool for gaining insight into the
difficulty of the optimization problem. We demonstrate that our metric
correlates strongly with observed tuning performance.
Comment: in IEEE Transactions on Evolutionary Computation, 202
Integrating Bayesian Optimization and Machine Learning for the Optimal Configuration of Cloud Systems
Bayesian Optimization (BO) is an efficient method for finding optimal cloud configurations for several types of applications. On the other hand, Machine Learning (ML) can provide helpful knowledge about the application at hand thanks to its predicting capabilities. This work proposes a general approach based on BO, which integrates elements from ML techniques in multiple ways, to find an optimal configuration of recurring jobs running in public and private cloud environments, possibly subject to blackbox constraints, e.g., application execution time or accuracy. We test our approach by considering several use cases, including edge computing, scientific computing, and Big Data applications. Results show that our solution outperforms other state-of-the-art black-box techniques, including classical autotuning and BO- and ML-based algorithms, reducing the number of unfeasible executions and corresponding costs up to 2–4 times