    Practical Bayesian Optimization of Machine Learning Algorithms

    Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs, and convolutional neural networks.
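    To make the GP-based loop described in this abstract concrete, here is a minimal sketch of Bayesian optimization with an expected-improvement acquisition function. The toy objective `validation_error`, the one-dimensional log-learning-rate search grid, and the Matérn kernel choice are illustrative assumptions, not details taken from the paper.

```python
# Sketch: GP-based Bayesian optimization with expected improvement (EI),
# minimizing a stand-in "validation error" over one hyperparameter.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def validation_error(log_lr):
    # Placeholder for an expensive train/validate run.
    return (log_lr + 3.0) ** 2 + 0.1 * np.sin(5.0 * log_lr)

candidates = np.linspace(-6.0, 0.0, 200).reshape(-1, 1)  # log10 learning rate
X = list(candidates[[0, -1]])                  # two initial observations
y = [validation_error(x[0]) for x in X]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(10):
    gp.fit(np.array(X), np.array(y))
    mu, sigma = gp.predict(candidates, return_std=True)
    best = min(y)
    # Expected improvement for minimization over the candidate grid.
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]
    X.append(x_next)
    y.append(validation_error(x_next[0]))

print(f"best log10(lr) found: {X[int(np.argmin(y))][0]:.3f}")
```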

    Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges

    Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods—for example, based on resampling error estimation for supervised machine learning—can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.
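    As a concrete instance of the simplest family of methods this survey covers, here is a minimal sketch of random search with cross-validated resampling error. The SVM model, the log-uniform search space, and the budget of 25 trials are illustrative choices, not recommendations from the paper.

```python
# Sketch: random-search HPO with 5-fold cross-validation via scikit-learn.
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Log-uniform distributions are a common choice for scale-type hyperparameters.
space = {"C": loguniform(1e-3, 1e3), "gamma": loguniform(1e-4, 1e1)}

search = RandomizedSearchCV(
    SVC(), space, n_iter=25, cv=5, scoring="accuracy", random_state=0
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```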

    Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets

    Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed Fabolas, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that Fabolas often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.
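    The core idea of the abstract—predicting full-dataset validation error from cheap small-subset runs—can be sketched as follows. This is a heavy simplification: a fixed power-law learning curve fitted by least squares stands in for the paper's full Bayesian treatment, and the SVM task, subset sizes, and curve form are all illustrative assumptions.

```python
# Sketch: evaluate one configuration on growing subsets, fit a power-law
# learning curve err(n) ~ a * n**(-b) + c, and extrapolate to the full set.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def subset_error(n, C=1.0, gamma=0.001):
    # Train one configuration on the first n examples; validate on held-out data.
    clf = SVC(C=C, gamma=gamma).fit(X_tr[:n], y_tr[:n])
    return 1.0 - clf.score(X_val, y_val)

sizes = np.array([100, 200, 400, 800])
errors = np.array([subset_error(n) for n in sizes])

popt, _ = curve_fit(lambda n, a, b, c: a * n ** (-b) + c,
                    sizes, errors, p0=[1.0, 0.5, 0.01], maxfev=10000)
a, b, c = popt
full = len(X_tr)
print(f"predicted error at n={full}: {a * full ** (-b) + c:.4f}")
print(f"observed  error at n={full}: {subset_error(full):.4f}")
```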

    Learning Multiple Defaults for Machine Learning Algorithms

    The performance of modern machine learning methods highly depends on their hyperparameter configurations. One simple way of selecting a configuration is to use default settings, often proposed along with the publication and implementation of a new algorithm. Those default values are usually chosen in an ad-hoc manner to work well enough on a wide variety of datasets. To address this problem, different automatic hyperparameter configuration algorithms have been proposed, which select an optimal configuration per dataset. This principled approach usually improves performance, but adds additional algorithmic complexity and computational costs to the training procedure. As an alternative to this, we propose learning a set of complementary default values from a large database of prior empirical results. Selecting an appropriate configuration on a new dataset then requires only a simple, efficient, and embarrassingly parallel search over this set. We demonstrate the effectiveness and efficiency of the approach we propose in comparison to random search and Bayesian optimization.
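    The usage pattern the abstract describes—an embarrassingly parallel search over a fixed portfolio of defaults—might look like the sketch below. The hand-written random-forest portfolio here is purely illustrative; in the paper, the complementary set is learned from a large database of prior empirical results.

```python
# Sketch: pick the best configuration from a fixed portfolio of defaults
# via cross-validation; each evaluation is independent, so the loop
# parallelizes trivially.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

portfolio = [
    {"n_estimators": 500, "max_features": "sqrt", "min_samples_leaf": 1},
    {"n_estimators": 500, "max_features": 0.25, "min_samples_leaf": 5},
    {"n_estimators": 200, "max_features": None, "min_samples_leaf": 2},
]

scores = [cross_val_score(RandomForestClassifier(random_state=0, **cfg),
                          X, y, cv=5).mean() for cfg in portfolio]
best = max(range(len(portfolio)), key=scores.__getitem__)
print(portfolio[best], scores[best])
```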