ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales
As we enter the exascale computing era, efficiently utilizing power and
optimizing the performance of scientific applications under power and energy
constraints have become critical and challenging. We propose a low-overhead
autotuning framework that tunes the performance and energy of hybrid
MPI/OpenMP scientific applications at large scales and explores the tradeoffs
between application runtime and power/energy for energy-efficient execution.
We then use this framework to autotune four ECP proxy applications:
XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with
a Random Forest surrogate model to effectively search parameter spaces with up
to 6 million different configurations on two large-scale production systems,
Theta at Argonne National Laboratory and Summit at Oak Ridge National
Laboratory. The experimental results show that our autotuning framework at
large scales has low overhead and achieves good scalability. Using the proposed
autotuning framework to identify the best configurations, we achieve up to
91.59% performance improvement, up to 21.2% energy savings, and up to 37.84%
energy-delay product (EDP) improvement on up to 4,096 nodes.
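As an illustration of the search strategy described above, the sketch below shows a minimal Bayesian optimization loop with a Random Forest surrogate over a small, made-up MPI/OpenMP tuning space. The parameter names, synthetic objective function, and lower-confidence-bound acquisition rule are illustrative assumptions, not the ytopt implementation.

```python
# Minimal sketch: Bayesian optimization with a Random Forest surrogate over a
# hypothetical MPI/OpenMP tuning space (not the actual ytopt framework).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def run_application(config):
    # Placeholder for launching the application and measuring runtime/energy;
    # here a synthetic objective so the example is self-contained.
    threads, block_size, unroll = config
    return (64 / threads) + 0.01 * abs(block_size - 128) + 0.5 * unroll

def sample_configs(n):
    # Draw candidate configurations from the (hypothetical) tuning space.
    return np.column_stack([
        rng.integers(1, 65, n),      # OpenMP threads
        rng.integers(16, 257, n),    # block size
        rng.integers(0, 2, n),       # loop unrolling on/off
    ])

# Initial random evaluations.
X = sample_configs(8)
y = np.array([run_application(x) for x in X])

for _ in range(20):
    surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    candidates = sample_configs(256)
    # Per-tree predictions provide a cheap uncertainty estimate.
    per_tree = np.stack([t.predict(candidates) for t in surrogate.estimators_])
    mean, std = per_tree.mean(axis=0), per_tree.std(axis=0)
    # Lower confidence bound: exploit low predicted cost, explore high variance.
    best = candidates[np.argmin(mean - 1.0 * std)]
    X = np.vstack([X, best])
    y = np.append(y, run_application(best))

print("best configuration:", X[np.argmin(y)], "objective:", y.min())
```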
BaCO: A Fast and Portable Bayesian Compiler Optimization Framework
We introduce the Bayesian Compiler Optimization framework (BaCO), a general
purpose autotuner for modern compilers targeting CPUs, GPUs, and FPGAs. BaCO
provides the flexibility needed to handle the requirements of modern autotuning
tasks. In particular, it handles permutation, ordered, and continuous
parameter types along with both known and unknown parameter constraints. To
reason about these parameter types and efficiently deliver high-quality code,
BaCO uses Bayesian optimization algorithms specialized for the autotuning
domain. We demonstrate BaCO's effectiveness on three modern compiler systems:
TACO, RISE & ELEVATE, and HPVM2FPGA for CPUs, GPUs, and FPGAs respectively. For
these domains, BaCO outperforms current state-of-the-art autotuners by
delivering on average 1.36x-1.56x faster code with a tiny search budget, and
BaCO is able to reach expert-level performance 2.9x-3.9x faster.
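To make the parameter types concrete, the sketch below shows one way a compiler tuning space with a permutation parameter (loop order), an ordered parameter (tile size), a continuous parameter, and a known constraint might be represented. The names, the constraint, and the random-search driver standing in for the Bayesian optimization engine are illustrative assumptions and do not reflect BaCO's actual API.

```python
# Illustrative sketch of a mixed compiler-tuning space (not BaCO's API).
import itertools
import random

random.seed(0)

LOOP_ORDERS = list(itertools.permutations(["i", "j", "k"]))  # permutation parameter
TILE_SIZES = [8, 16, 32, 64, 128]                            # ordered parameter

def valid(config):
    # Known constraint (hypothetical): large tiles require "i" outermost.
    return config["tile"] <= 32 or config["order"][0] == "i"

def compile_and_time(config):
    # Placeholder for compiling the schedule and measuring runtime.
    penalty = 0.0 if config["order"] == ("i", "j", "k") else 0.3
    return abs(config["tile"] - 32) * 0.01 + penalty + 0.1 * config["threshold"]

def random_config():
    return {
        "order": random.choice(LOOP_ORDERS),
        "tile": random.choice(TILE_SIZES),
        "threshold": random.uniform(0.0, 1.0),   # continuous parameter
    }

# Simple random search standing in for the Bayesian optimization engine.
evaluated = []
for _ in range(200):
    cfg = random_config()
    if valid(cfg):                               # prune constraint violations
        evaluated.append(cfg)
best = min(evaluated, key=compile_and_time)
print("best schedule:", best, "time:", compile_and_time(best))
```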
Application Performance Modeling via Tensor Completion
Performance tuning, software/hardware co-design, and job scheduling are among
the many tasks that rely on models to predict application performance. We
propose and evaluate low-rank tensor decomposition for modeling application
performance. We discretize the input and configuration domains of an
application using regular grids. Application execution times mapped within
grid-cells are averaged and represented by tensor elements. We show that
low-rank canonical-polyadic (CP) tensor decomposition is effective in
approximating these tensors. We further show that this decomposition enables
accurate extrapolation of unobserved regions of an application's parameter
space. We then employ tensor completion to optimize a CP decomposition given a
sparse set of observed execution times. We consider alternative
piecewise/grid-based models and supervised learning models for six applications
and demonstrate that CP decomposition optimized using tensor completion offers
higher prediction accuracy and memory efficiency for high-dimensional
performance modeling.
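The sketch below illustrates the overall idea on a synthetic example: execution times on a regular 3-way grid, a sparse set of observed cells, and a rank-2 CP model fit by gradient descent on those observations only, then evaluated on the unobserved cells. The grid sizes, rank, learning rate, and the separable timing function are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of CP tensor completion for performance prediction.
import numpy as np

rng = np.random.default_rng(0)
I, J, K, rank = 12, 10, 8, 2

# Synthetic "true" execution times on a regular grid (separable, so low CP rank).
true = np.einsum("i,j,k->ijk",
                 np.linspace(0.5, 1.5, I),
                 np.linspace(0.5, 1.5, J),
                 np.linspace(0.5, 1.5, K))

# Observe roughly 10% of the grid cells (sparse set of measured runtimes).
mask = rng.random((I, J, K)) < 0.10
observed = np.where(mask, true, 0.0)

# CP factor matrices, fit by gradient descent on the observed entries only.
A, B, C = (rng.standard_normal((n, rank)) * 0.1 + 0.7 for n in (I, J, K))
lr = 0.01
for _ in range(3000):
    pred = np.einsum("ir,jr,kr->ijk", A, B, C)
    resid = np.where(mask, pred - observed, 0.0)   # zero error where unobserved
    A -= lr * np.einsum("ijk,jr,kr->ir", resid, B, C)
    B -= lr * np.einsum("ijk,ir,kr->jr", resid, A, C)
    C -= lr * np.einsum("ijk,ir,jr->kr", resid, A, B)

# Extrapolate to the unobserved cells and report relative error.
pred = np.einsum("ir,jr,kr->ijk", A, B, C)
unseen = ~mask
print("relative error on unobserved cells:",
      np.linalg.norm((pred - true)[unseen]) / np.linalg.norm(true[unseen]))
```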
Hybrid Models for Mixed Variables in Bayesian Optimization
This paper presents a new type of hybrid model for Bayesian optimization (BO)
adept at managing mixed variables, encompassing both quantitative (continuous
and integer) and qualitative (categorical) types. Our proposed hybrid models
merge a Monte Carlo Tree Search (MCTS) structure for categorical variables
with Gaussian processes (GPs) for continuous ones. To address efficiency in
the search phase, we compare the original (frequentist) upper confidence bound
tree search (UCTS) with Bayesian Dirichlet search strategies, showcasing the
integration of the tree architecture into Bayesian optimization. Central to
our innovation in the surrogate modeling phase is online kernel selection for
mixed-variable BO. Our innovations, including dynamic kernel selection, a
unique UCTS strategy (hybridM), and a Bayesian update strategy (hybridD),
position our hybrid models as an advancement in mixed-variable surrogate
models. Numerical experiments underscore the hybrid models' superiority,
highlighting their potential in Bayesian optimization.
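The sketch below illustrates the core idea on a toy problem: the categorical variable is chosen with a frequentist UCB rule (in the spirit of UCTS), and the continuous variable within each category is modeled by its own Gaussian process. The objective function, UCB constant, and kernel are illustrative assumptions, not the paper's hybridM/hybridD implementation.

```python
# Toy sketch: UCB over categories + per-category Gaussian processes
# (illustrative only; not the paper's hybridM/hybridD code).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
CATEGORIES = ["jacobi", "gauss_seidel", "chebyshev"]   # categorical choices

def objective(cat, x):
    # Hypothetical cost surface to minimize: each category has its own optimum.
    shift = {"jacobi": 0.2, "gauss_seidel": 0.5, "chebyshev": 0.8}[cat]
    return (x - shift) ** 2 + 0.1 * CATEGORIES.index(cat)

history = {c: {"X": [], "y": []} for c in CATEGORIES}

def ucb_pick(total):
    # Frequentist UCB over categories (reward = negative cost for minimization).
    scores = []
    for c in CATEGORIES:
        n = len(history[c]["y"])
        if n == 0:
            return c                               # visit each category once first
        scores.append(-np.mean(history[c]["y"]) + 0.3 * np.sqrt(np.log(total + 1) / n))
    return CATEGORIES[int(np.argmax(scores))]

for t in range(30):
    cat = ucb_pick(t)
    X, y = history[cat]["X"], history[cat]["y"]
    if len(y) < 2:
        x = rng.uniform(0.0, 1.0)                  # explore before fitting a GP
    else:
        gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-6, normalize_y=True)
        gp.fit(np.array(X).reshape(-1, 1), np.array(y))
        cand = rng.uniform(0.0, 1.0, 128).reshape(-1, 1)
        mu, sd = gp.predict(cand, return_std=True)
        x = cand[np.argmin(mu - 1.0 * sd)].item()  # lower confidence bound
    X.append(x)
    y.append(objective(cat, x))

best_cat = min(CATEGORIES, key=lambda c: min(history[c]["y"], default=np.inf))
print("best:", best_cat, min(history[best_cat]["y"]))
```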