Bayesian optimization has become a standard technique for hyperparameter
optimization, including for data-intensive models such as deep neural networks that
may take days or weeks to train. We consider the setting where previous
optimization runs are available, and we wish to use their results to warm-start
a new optimization run. We develop an ensemble model that can incorporate the
results of past optimization runs, while avoiding the cubic scaling in the number
of observations that comes
with putting all results into a single Gaussian process model. The ensemble
combines models from past runs according to estimates of their generalization
performance on the current optimization. Results from a large collection of
hyperparameter optimization benchmark problems and from optimization of a
production computer vision platform at Facebook show that the ensemble can
substantially reduce the time it takes to obtain near-optimal configurations,
and is useful for warm-starting expensive searches or running quick
re-optimizations.
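
To make the ensembling idea concrete, the Python sketch below fits one Gaussian process surrogate per past run, weights each by how well it predicts the current run's observations, and blends the weighted ensemble with a model of the current run. This is a minimal illustration, not the paper's exact method: the inverse-squared-error weighting, the fixed blend parameter alpha, the simplified variance handling, and names such as ensemble_predict and model_weights are all illustrative assumptions.

# Sketch: warm-starting Bayesian optimization with a weighted GP ensemble.
# Assumptions (not from the paper): weights come from inverse mean squared
# error on the current run's observations; the combined prediction is a
# fixed-alpha blend of the target model and the weighted base models.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def fit_gp(X, y):
    """Fit one GP surrogate to a single optimization run's data."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    return gp

def model_weights(base_models, X_cur, y_cur, eps=1e-12):
    """Weight each past-run model by how well it predicts the current
    run's observations (here: inverse mean squared error, a stand-in
    for the generalization estimates described in the abstract)."""
    errors = np.array(
        [np.mean((gp.predict(X_cur) - y_cur) ** 2) for gp in base_models]
    )
    inv = 1.0 / (errors + eps)
    return inv / inv.sum()

def ensemble_predict(base_models, weights, target_gp, X, alpha=0.5):
    """Blend past-run models with the current-run GP. A real
    implementation would adapt alpha (and propagate variance properly)
    as the current run accumulates data; here the returned uncertainty
    is simplified to the target GP's predictive std."""
    mu_t, sd_t = target_gp.predict(X, return_std=True)
    mu_b = np.sum(
        [w * gp.predict(X) for w, gp in zip(weights, base_models)], axis=0
    )
    return alpha * mu_t + (1 - alpha) * mu_b, sd_t

# Usage on a toy 1-d problem: two past runs plus a few current observations.
rng = np.random.default_rng(0)
past_X = [rng.uniform(0, 1, (20, 1)) for _ in range(2)]
past_runs = [(X, np.sin(6 * X[:, 0]) + 0.1 * rng.normal(size=20)) for X in past_X]
base_models = [fit_gp(X, y) for X, y in past_runs]

X_cur = rng.uniform(0, 1, (5, 1))
y_cur = np.sin(6 * X_cur[:, 0])
target_gp = fit_gp(X_cur, y_cur)

w = model_weights(base_models, X_cur, y_cur)
mu, sd = ensemble_predict(base_models, w, target_gp, rng.uniform(0, 1, (3, 1)))
print("weights:", w)
print("predicted means:", mu)

Because each past run keeps its own GP, fitting cost stays cubic only in the size of the largest individual run rather than in the total number of observations across all runs, which is the scaling advantage the abstract refers to.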