Batch Bayesian Optimization via Local Penalization
The popularity of Bayesian optimization methods for efficient exploration of
parameter spaces has led to a series of papers applying Gaussian processes as
surrogates in the optimization of functions. However, most proposed approaches
only allow the exploration of the parameter space to occur sequentially. Often,
it is desirable to simultaneously propose batches of parameter values to
explore. This is particularly the case when large parallel processing
facilities are available. These facilities could be computational or physical
facets of the process being optimized. For example, in biological experiments,
many setups allow several samples to be processed simultaneously.
Batch methods, however, require modeling of the interaction between the
evaluations in the batch, which can be expensive in complex scenarios. We
investigate a simple heuristic based on an estimate of the Lipschitz constant
that captures the most important aspect of this interaction (i.e. local
repulsion) at negligible computational overhead. The resulting algorithm
compares well, in running time, with much more elaborate alternatives. The
approach assumes that the function of interest, f, is a Lipschitz continuous
function. A wrap-loop around the acquisition function is used to collect
batches of points of a certain size, minimizing the non-parallelizable
computational effort. The speed-up of our method with respect to previous
approaches is significant in a set of computationally expensive experiments.
Comment: 11 pages, 10 figures
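As a rough illustration of the local-penalization idea described above (not the paper's implementation), a greedy penalized batch selection can be sketched as follows. The penalizer here is a simplified hard variant built from the exclusion radius r_j = (M - f(x_j)) / L; the function name `select_batch` and the arguments `L` (Lipschitz estimate) and `M` (estimate of the global maximum) are illustrative assumptions:

```python
import numpy as np

def select_batch(candidates, acq, f_mean, L, M, batch_size):
    """Greedy local-penalization batch selection (simplified sketch).

    candidates: (n, d) array of candidate inputs
    acq:        (n,) acquisition values at the candidates (precomputed)
    f_mean:     (n,) surrogate posterior mean at the candidates
    L:          estimate of the Lipschitz constant of the objective
    M:          estimate of the global maximum of the objective
    """
    penalty = np.ones(len(candidates))  # running product of local penalizers
    batch = []
    for _ in range(batch_size):
        j = int(np.argmax(acq * penalty))  # best candidate after penalization
        batch.append(j)
        # exclusion radius around the chosen point: r_j = (M - f(x_j)) / L
        r = max((M - f_mean[j]) / L, 1e-12)
        d = np.linalg.norm(candidates - candidates[j], axis=1)
        # hard penalizer: 0 at x_j, ramping to 1 at distance r_j
        # (the paper uses a smooth probabilistic penalizer instead)
        penalty *= np.clip(d / r, 0.0, 1.0)
    return batch
```

Because each selected point multiplies a repulsive factor into the acquisition surface, later batch members are pushed away from earlier ones, which is the "local repulsion" the abstract refers to.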
Practical Bayesian Optimization of Machine Learning Algorithms
Machine learning algorithms frequently require careful tuning of model
hyperparameters, regularization terms, and optimization parameters.
Unfortunately, this tuning is often a "black art" that requires expert
experience, unwritten rules of thumb, or sometimes brute-force search. Much
more appealing is the idea of developing automatic approaches which can
optimize the performance of a given learning algorithm to the task at hand. In
this work, we consider the automatic tuning problem within the framework of
Bayesian optimization, in which a learning algorithm's generalization
performance is modeled as a sample from a Gaussian process (GP). The tractable
posterior distribution induced by the GP leads to efficient use of the
information gathered by previous experiments, enabling optimal choices about
what parameters to try next. Here we show how the effects of the Gaussian
process prior and the associated inference procedure can have a large impact on
the success or failure of Bayesian optimization. We show that thoughtful
choices can lead to results that exceed expert-level performance in tuning
machine learning algorithms. We also describe new algorithms that take into
account the variable cost (duration) of learning experiments and that can
leverage the presence of multiple cores for parallel experimentation. We show
that these proposed algorithms improve on previous automatic procedures and can
reach or surpass human expert-level optimization on a diverse set of
contemporary algorithms including latent Dirichlet allocation, structured SVMs
and convolutional neural networks.
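A minimal GP-plus-expected-improvement loop in the spirit of this framework might look like the following NumPy-only sketch (for minimization). The kernel lengthscale, the grid-based acquisition maximization, and all names (`rbf`, `gp_posterior`, `bayes_opt`) are illustrative assumptions, not the paper's code:

```python
import numpy as np
from math import erf

def rbf(A, B, ls=0.2):
    """Squared-exponential kernel; ls is an assumed, fixed lengthscale."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Exact GP regression posterior mean/std at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = rbf(Xs, Xs).diagonal() - (Ks * np.linalg.solve(K, Ks)).sum(0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for minimization under a Gaussian posterior."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1 + np.array([erf(v / np.sqrt(2)) for v in z]))
    pdf = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(f, grid, n_init=3, n_iter=10, seed=0):
    """Sequentially evaluate f where EI is largest over a candidate grid."""
    rng = np.random.default_rng(seed)
    X = grid[rng.choice(len(grid), n_init, replace=False)]
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        mu, sigma = gp_posterior(X, y, grid)
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()
```

The tractable GP posterior is what makes each "what to try next" decision cheap: every iteration reuses all previous evaluations through the posterior mean and variance, exactly the efficient reuse of information the abstract highlights.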