Neural Networks for Predicting Algorithm Runtime Distributions
Many state-of-the-art algorithms for solving hard combinatorial problems in
artificial intelligence (AI) include elements of stochasticity that lead to
high variations in runtime, even for a fixed problem instance. Knowledge about
the resulting runtime distributions (RTDs) of algorithms on given problem
instances can be exploited in various meta-algorithmic procedures, such as
algorithm selection, portfolios, and randomized restarts. Previous work has
shown that machine learning can be used to individually predict mean, median
and variance of RTDs. To establish a new state-of-the-art in predicting RTDs,
we demonstrate that the parameters of an RTD should be learned jointly and that
neural networks can do this well by directly optimizing the likelihood of an
RTD given runtime observations. In an empirical study involving five algorithms
for SAT solving and AI planning, we show that neural networks predict the true
RTDs of unseen instances better than previous methods, and can even do so when
only a few runtime observations are available per training instance.
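The joint, likelihood-based training described above can be sketched briefly. The following is a minimal illustration, not the authors' code: a small PyTorch network maps instance features to the two parameters of a lognormal RTD and is trained by minimizing the negative log-likelihood of observed runtimes. The lognormal family, the feature dimensionality, and the architecture are assumptions made for the example.

```python
# Minimal sketch (not the paper's code): an MLP maps instance features to the
# parameters of a lognormal runtime distribution; both parameters are learned
# jointly by minimizing the negative log-likelihood of observed runtimes.
import torch
import torch.nn as nn

class RTDNet(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, 1)         # mean of log-runtime
        self.log_sigma_head = nn.Linear(hidden, 1)  # log std of log-runtime

    def forward(self, x):
        h = self.body(x)
        mu = self.mu_head(h).squeeze(-1)
        sigma = torch.exp(self.log_sigma_head(h)).squeeze(-1)  # keep sigma positive
        return mu, sigma

def nll_loss(mu, sigma, runtimes):
    # Negative log-likelihood of runtimes under LogNormal(mu, sigma).
    dist = torch.distributions.LogNormal(mu, sigma)
    return -dist.log_prob(runtimes).mean()

# Usage with random placeholder data (one row per (instance, run) observation).
features = torch.randn(256, 10)         # instance features (10 dims assumed)
runtimes = torch.rand(256) * 100 + 0.1  # observed runtimes in seconds
model = RTDNet(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    mu, sigma = model(features)
    loss = nll_loss(mu, sigma, runtimes)
    loss.backward()
    opt.step()
```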
Efficient Benchmarking of Algorithm Configuration Procedures via Model-Based Surrogates
The optimization of algorithm (hyper-)parameters is crucial for achieving
peak performance across a wide range of domains, ranging from deep neural
networks to solvers for hard combinatorial problems. The resulting algorithm
configuration (AC) problem has attracted much attention from the machine
learning community. However, the proper evaluation of new AC procedures is
hindered by two key hurdles. First, AC benchmarks are hard to set up. Second,
and even more significantly, they are computationally expensive: a single run
of an AC procedure involves many costly runs of the target algorithm whose
performance is to be optimized in a given AC benchmark scenario. One common
workaround is to optimize cheap-to-evaluate artificial benchmark functions
(e.g., Branin) instead of actual algorithms; however, these have different
properties than realistic AC problems. Here, we propose an alternative
benchmarking approach that is similarly cheap to evaluate but much closer to
the original AC problem: replacing expensive benchmarks by surrogate benchmarks
constructed from AC benchmarks. These surrogate benchmarks approximate the
response surface corresponding to true target algorithm performance using a
regression model, and the original and surrogate benchmark share the same
(hyper-)parameter space. In our experiments, we construct and evaluate
surrogate benchmarks for hyperparameter optimization as well as for AC problems
that involve performance optimization of solvers for hard combinatorial
problems, drawing training data from the runs of existing AC procedures. We
show that our surrogate benchmarks capture the important overall characteristics of the AC scenarios from which they were derived, such as high- and low-performing regions, while being much easier to use and orders of magnitude cheaper to evaluate.
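The surrogate construction can be sketched as follows. This is a minimal illustration rather than the paper's implementation: a regression model is fitted on logged (configuration, performance) pairs and then queried in place of the expensive target-algorithm run. The synthetic data, the parameter encoding, and the choice of a random forest are assumptions for the example.

```python
# Minimal sketch (not the paper's implementation): build a cheap surrogate
# objective from logged (configuration, performance) data and query it instead
# of running the real target algorithm.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Logged data from previous algorithm-configuration runs: each row is a numeric
# encoding of a configuration, y is the measured performance (e.g., runtime).
rng = np.random.default_rng(0)
X_logged = rng.uniform(size=(5000, 4))  # 4 (hyper-)parameters, already encoded
y_logged = np.sin(X_logged[:, 0] * 6) + X_logged[:, 1] ** 2 + rng.normal(0, 0.1, 5000)

surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(X_logged, y_logged)

def surrogate_benchmark(config: np.ndarray) -> float:
    """Cheap stand-in for a costly target-algorithm run on the same parameter space."""
    return float(surrogate.predict(config.reshape(1, -1))[0])

# An AC procedure can now be evaluated against `surrogate_benchmark`
# orders of magnitude faster than against the real target algorithm.
print(surrogate_benchmark(np.array([0.2, 0.5, 0.1, 0.9])))
```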
Auto-Sklearn 2.0: The Next Generation
Automated Machine Learning, which supports practitioners and researchers with
the tedious task of manually designing machine learning pipelines, has recently
achieved substantial success. In this paper we introduce new Automated Machine
Learning (AutoML) techniques motivated by our winning submission to the second
ChaLearn AutoML challenge, PoSH Auto-sklearn. For this, we extend Auto-sklearn
with a new, simpler meta-learning technique, improve its way of handling
iterative algorithms and enhance it with a successful bandit strategy for
budget allocation. Furthermore, we go one step further and study the design
space of AutoML itself and propose a solution towards truly hands-free AutoML.
Together, these changes give rise to the next generation of our AutoML system,
Auto-sklearn (2.0). We verify the improvements brought by these additions in a large experimental study on 39 AutoML benchmark datasets and conclude the paper by comparing to Auto-sklearn (1.0), reducing the regret by up to a factor of five.
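The bandit strategy for budget allocation mentioned above can be illustrated with a successive-halving-style sketch. This is not the Auto-sklearn implementation: `train_and_score` is a hypothetical callable that fits a pipeline under a given budget and returns a validation score, and the budget schedule is an assumption for the example.

```python
# Minimal successive-halving sketch: evaluate all candidates cheaply, keep the
# best 1/eta fraction, and give the survivors eta times more budget.
def successive_halving(configs, train_and_score, min_budget=8, eta=2, max_budget=512):
    budget = min_budget
    survivors = list(configs)  # configurations must be hashable here
    while len(survivors) > 1 and budget <= max_budget:
        # Score every surviving configuration under the current (small) budget.
        scores = {c: train_and_score(c, budget) for c in survivors}
        # Higher score = better; keep the top 1/eta fraction.
        survivors = sorted(survivors, key=lambda c: scores[c], reverse=True)
        survivors = survivors[:max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]

# Toy usage with a hypothetical scoring function (score improves with budget).
candidates = ["cfg_a", "cfg_b", "cfg_c", "cfg_d"]
weights = {"cfg_a": 1, "cfg_b": 2, "cfg_c": 3, "cfg_d": 4}
toy_score = lambda cfg, budget: budget * weights[cfg]
print(successive_halving(candidates, toy_score))  # -> 'cfg_d'
```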
Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning
Automated Machine Learning (AutoML) supports practitioners and researchers with the tedious task of designing machine learning pipelines and has recently achieved substantial success. In this paper, we introduce new AutoML approaches motivated by our winning submission to the second ChaLearn AutoML challenge. We develop PoSH Auto-sklearn, which enables AutoML systems to work well on large datasets under rigid time limits by using a new, simple and meta-feature-free meta-learning technique and by employing a successful bandit strategy for budget allocation. However, PoSH Auto-sklearn introduces even more ways of running AutoML and might make it harder for users to set it up correctly. Therefore, we also go one step further and study the design space of AutoML itself, proposing a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn 2.0. We verify the improvements brought by these additions in an extensive experimental study on 39 AutoML benchmark datasets. We conclude the paper by comparing to other popular AutoML frameworks and Auto-sklearn 1.0, reducing the relative error by up to a factor of 4.5, and yielding a performance in 10 minutes that is substantially better than what Auto-sklearn 1.0 achieves within an hour.
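One way a meta-feature-free meta-learning technique of this kind can work is to greedily build a portfolio of configurations that complement each other across meta-train datasets, and then use that portfolio to warm-start the search on any new dataset. The sketch below is a simplified illustration of that idea, not the Auto-sklearn 2.0 code; the `error` matrix and the selection criterion (mean of the per-dataset best error) are assumptions made for the example.

```python
# Minimal sketch: greedy, meta-feature-free portfolio construction.
# error[c][d] is a hypothetical matrix of validation errors of config c on dataset d.
def greedy_portfolio(error, portfolio_size):
    configs = list(error.keys())
    datasets = list(next(iter(error.values())).keys())
    portfolio = []
    for _ in range(portfolio_size):
        def portfolio_error(candidate):
            # Error of a portfolio on a dataset = best error among its members.
            members = portfolio + [candidate]
            return sum(min(error[c][d] for c in members) for d in datasets) / len(datasets)
        best = min((c for c in configs if c not in portfolio), key=portfolio_error)
        portfolio.append(best)
    return portfolio

# Toy usage: three candidate configurations, two meta-train datasets.
error = {
    "cfg_a": {"d1": 0.20, "d2": 0.35},
    "cfg_b": {"d1": 0.30, "d2": 0.10},
    "cfg_c": {"d1": 0.25, "d2": 0.25},
}
print(greedy_portfolio(error, portfolio_size=2))  # picks complementary configs: ['cfg_b', 'cfg_a']
```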
Mind the Gap: Measuring Generalization Performance Across Multiple Objectives
Modern machine learning models are often constructed taking into account
multiple objectives, e.g., minimizing inference time while also maximizing
accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return
such candidate models, and the approximation of the Pareto front is used to
assess their performance. In practice, we also want to measure generalization
when moving from the validation to the test set. However, some of the models
might no longer be Pareto-optimal, which makes it unclear how to quantify the
performance of the MHPO method when evaluated on the test set. To resolve this,
we provide a novel evaluation protocol that allows measuring the generalization performance of MHPO methods, and we study its capabilities for comparing two optimization experiments.
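A short sketch of the underlying issue (not the proposed protocol): the Pareto set selected on validation objectives is re-evaluated on test objectives, where some of its members may no longer be Pareto-optimal. The objective values below are random placeholders, and both objectives are assumed to be minimized.

```python
# Minimal sketch of the validation-to-test gap in multi-objective HPO.
import numpy as np

def pareto_mask(points: np.ndarray) -> np.ndarray:
    """Boolean mask of non-dominated points (minimization in every objective)."""
    n = len(points)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        if mask[i]:
            # A point is dominated by point i if it is no better in every
            # objective and strictly worse in at least one.
            dominated = np.all(points >= points[i], axis=1) & np.any(points > points[i], axis=1)
            mask &= ~dominated
    return mask

rng = np.random.default_rng(0)
val = rng.uniform(size=(20, 2))                    # (error, inference time) on validation
test = val + rng.normal(0, 0.05, size=val.shape)   # the same models measured on test

front_idx = np.where(pareto_mask(val))[0]          # models an MHPO method would return
still_optimal = pareto_mask(test[front_idx])
print(f"{still_optimal.sum()} of {len(front_idx)} validation-Pareto models "
      f"remain Pareto-optimal on the test set")
```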