Batch Bayesian Optimization via Local Penalization
The popularity of Bayesian optimization methods for efficient exploration of
parameter spaces has led to a series of papers applying Gaussian processes as
surrogates in the optimization of functions. However, most proposed approaches
only allow the exploration of the parameter space to occur sequentially. Often,
it is desirable to simultaneously propose batches of parameter values to
explore. This is particularly the case when large parallel processing
facilities are available. These facilities could be computational or physical
facets of the process being optimized. For example, many biological
experimental set-ups allow several samples to be processed simultaneously.
Batch methods, however, require modeling of the interaction between the
evaluations in the batch, which can be expensive in complex scenarios. We
investigate a simple heuristic based on an estimate of the Lipschitz constant
that captures the most important aspect of this interaction (i.e. local
repulsion) at negligible computational overhead. The resulting algorithm
compares well, in running time, with much more elaborate alternatives. The
approach assumes that the function of interest is Lipschitz continuous. A
wrap-loop around the acquisition function is used to collect batches of points
of a given size while minimizing the non-parallelizable computational effort.
The speed-up of our method with respect to previous approaches is significant
in a set of computationally expensive experiments.
Comment: 11 pages, 10 figures
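As a rough illustration of the local-repulsion heuristic, the Python sketch below greedily builds a batch by repeatedly maximizing the acquisition and multiplying in a penalizer around each point already selected, with the exclusion radius derived from the Lipschitz bound. The names acq, surrogate_mean, X_cand, f_best and L are assumptions made for the sketch, not the paper's or GPyOpt's API.

import numpy as np

def lipschitz_penalty(X, x_j, f_xj, f_best, L):
    # Local repulsion around a pending point x_j (minimization setting).
    # Lipschitz continuity implies no point within (f(x_j) - f_best) / L of
    # x_j can improve on f_best, so the penalizer vanishes inside that ball
    # and approaches 1 far away from it.
    r = max(f_xj - f_best, 0.0) / L
    d = np.linalg.norm(X - x_j, axis=1)
    return np.clip(d / (r + 1e-12), 0.0, 1.0)

def collect_batch(acq, surrogate_mean, X_cand, f_best, L, batch_size):
    # "Wrap-loop" around the acquisition: pick the best candidate, penalize
    # the acquisition values near it, and repeat until the batch is full.
    scores = acq(X_cand)
    batch = []
    for _ in range(batch_size):
        x_new = X_cand[np.argmax(scores)]
        batch.append(x_new)
        scores = scores * lipschitz_penalty(
            X_cand, x_new, surrogate_mean(x_new), f_best, L)
    return np.array(batch)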
SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization
Computer vision is experiencing an AI renaissance, in which machine learning
models are expediting important breakthroughs in academic research and
commercial applications. Effectively training these models, however, is not
trivial due in part to hyperparameters: user-configured values that control a
model's ability to learn from data. Existing hyperparameter optimization
methods are highly parallel but make no effort to balance the search across
heterogeneous hardware or to prioritize searching high-impact spaces. In this
paper, we introduce a framework for massively Scalable Hardware-Aware
Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the
relative complexity of each search space and monitors performance on the
learning task over all trials. These metrics are then used as heuristics to
assign hyperparameters to distributed workers based on their hardware. We first
demonstrate that our framework achieves double the throughput of a standard
distributed hyperparameter optimization framework by optimizing an SVM for MNIST
using 150 distributed workers. We then conduct model search with SHADHO over
the course of one week using 74 GPUs across two compute clusters to optimize
U-Net for a cell segmentation task, discovering 515 models that achieve a lower
validation loss than standard U-Net.
Comment: 10 pages, 6 figures
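The hardware-aware assignment can be pictured with a small sketch: rank search spaces by a heuristic priority (here, complexity weighted against the best loss seen so far, an illustrative stand-in for SHADHO's complexity and performance metrics) and hand the highest-priority spaces to the fastest workers. The SearchSpace and Worker fields below are assumptions made for the example, not SHADHO's actual data model.

from dataclasses import dataclass

@dataclass
class SearchSpace:
    name: str
    complexity: float   # e.g. number and size of tunable dimensions
    best_loss: float    # best validation loss observed so far in this space

@dataclass
class Worker:
    name: str
    speed: float        # relative hardware throughput (higher is faster)

def assign_spaces(spaces, workers):
    # Heuristic assignment: more complex, better-performing spaces go to
    # faster hardware. The priority formula is illustrative only.
    ranked_spaces = sorted(
        spaces, key=lambda s: s.complexity / (s.best_loss + 1e-12), reverse=True)
    ranked_workers = sorted(workers, key=lambda w: w.speed, reverse=True)
    return {w.name: s.name for w, s in zip(ranked_workers, ranked_spaces)}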