Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
Bayesian optimization has become a successful tool for hyperparameter
optimization of machine learning algorithms, such as support vector machines or
deep neural networks. Despite its success, for large datasets, training and
validating a single configuration often takes hours, days, or even weeks, which
limits the achievable performance. To accelerate hyperparameter optimization,
we propose a generative model for the validation error as a function of
training set size, which is learned during the optimization process and allows
exploration of preliminary configurations on small subsets, by extrapolating to
the full dataset. We construct a Bayesian optimization procedure, dubbed
Fabolas, which models loss and training time as a function of dataset size and
automatically trades off high information gain about the global optimum against
computational cost. Experiments optimizing support vector machines and deep
neural networks show that Fabolas often finds high-quality solutions 10 to 100
times faster than other state-of-the-art Bayesian optimization methods or the
recently proposed bandit strategy Hyperband.
Practical Bayesian Optimization of Machine Learning Algorithms
Machine learning algorithms frequently require careful tuning of model
hyperparameters, regularization terms, and optimization parameters.
Unfortunately, this tuning is often a "black art" that requires expert
experience, unwritten rules of thumb, or sometimes brute-force search. Much
more appealing is the idea of developing automatic approaches which can
optimize the performance of a given learning algorithm to the task at hand. In
this work, we consider the automatic tuning problem within the framework of
Bayesian optimization, in which a learning algorithm's generalization
performance is modeled as a sample from a Gaussian process (GP). The tractable
posterior distribution induced by the GP leads to efficient use of the
information gathered by previous experiments, enabling optimal choices about
what parameters to try next. Here we show how the effects of the Gaussian
process prior and the associated inference procedure can have a large impact on
the success or failure of Bayesian optimization. We show that thoughtful
choices can lead to results that exceed expert-level performance in tuning
machine learning algorithms. We also describe new algorithms that take into
account the variable cost (duration) of learning experiments and that can
leverage the presence of multiple cores for parallel experimentation. We show
that these proposed algorithms improve on previous automatic procedures and can
reach or surpass human expert-level optimization on a diverse set of
contemporary algorithms including latent Dirichlet allocation, structured SVMs
and convolutional neural networks.
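To make the loop described in this abstract concrete, here is a minimal sketch of GP-based Bayesian optimization with an expected-improvement acquisition function. It is an illustration under stated assumptions, not the authors' implementation: the RBF kernel, its fixed length scale, the one-dimensional search grid, and the toy objective `toy_validation_error` are all hypothetical choices made for this example, and the abstract's cost-aware and parallel variants are not covered.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel with unit signal variance (an assumed choice).
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    # GP posterior mean and standard deviation with a constant mean function.
    y_mean = y_train.mean()
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

_erf = np.vectorize(math.erf)

def expected_improvement(mean, std, best):
    # Expected improvement for minimization: E[max(best - f(x), 0)].
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def toy_validation_error(x):
    # Hypothetical stand-in for an expensive training run; minimum at x = 0.3.
    return (x - 0.3) ** 2

def bayes_opt_demo(n_iter=8):
    grid = np.linspace(0.0, 1.0, 201)      # candidate hyperparameter values
    x_obs = np.array([0.0, 0.5, 1.0])      # initial design
    y_obs = toy_validation_error(x_obs)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        ei = expected_improvement(mean, std, y_obs.min())
        x_next = grid[np.argmax(ei)]       # most promising candidate
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, toy_validation_error(x_next))
    i = np.argmin(y_obs)
    return x_obs[i], y_obs[i]
```

Calling `bayes_opt_demo()` returns the best hyperparameter value found and its error. In practice the kernel hyperparameters would themselves be inferred rather than fixed, which is the sensitivity the abstract highlights.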
BOCK : Bayesian Optimization with Cylindrical Kernels
A major challenge in Bayesian Optimization is the boundary issue (Swersky,
2017) where an algorithm spends too many evaluations near the boundary of its
search space. In this paper, we propose BOCK, Bayesian Optimization with
Cylindrical Kernels, whose basic idea is to transform the ball geometry of the
search space using a cylindrical transformation. Because of the transformed
geometry, the Gaussian Process-based surrogate model spends less budget
searching near the boundary, while concentrating its efforts relatively more
near the center of the search region, where we expect the solution to be
located. We evaluate BOCK extensively, showing that it is not only more
accurate and efficient, but it also scales successfully to problems with a
dimensionality as high as 500. We show that the better accuracy and scalability
of BOCK even allows optimizing modestly sized neural network layers, as well as
neural network hyperparameters. Comment: 10 pages, 5 figures, 5 tables, 1 algorithm
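The abstract does not spell out the cylindrical transformation, so the following is only a sketch of one plausible reading: a point in a ball-shaped search space is split into a radius and a unit direction vector. The function names `to_cylindrical` and `from_cylindrical`, and the handling of the origin, are assumptions made for this example rather than the paper's definitions.

```python
import numpy as np

def to_cylindrical(x, eps=1e-12):
    # Split a point into (radius, unit direction); assumed form of the transform.
    x = np.asarray(x, dtype=float)
    radius = np.linalg.norm(x)
    if radius < eps:
        # The direction is undefined at the center; pick an arbitrary unit vector.
        direction = np.zeros_like(x)
        direction[0] = 1.0
    else:
        direction = x / radius
    return radius, direction

def from_cylindrical(radius, direction):
    # Inverse map: scale the unit direction back out to the original point.
    return radius * np.asarray(direction, dtype=float)
```

For example, `to_cylindrical([3.0, 4.0])` gives a radius of 5 and the direction `[0.6, 0.8]`. Separating the radius from the direction is what would let a kernel treat points near the center differently from points near the boundary.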