299 research outputs found
Deep Ranking Ensembles for Hyperparameter Optimization
Automatically optimizing the hyperparameters of Machine Learning algorithms
is one of the primary open questions in AI. Existing work in Hyperparameter
Optimization (HPO) trains surrogate models for approximating the response
surface of hyperparameters as a regression task. In contrast, we hypothesize
that the optimal strategy for training surrogates is to preserve the ranks of
the performances of hyperparameter configurations as a Learning to Rank
problem. As a result, we present a novel method that meta-learns neural network
surrogates optimized for ranking the configurations' performances while
modeling their uncertainty via ensembling. In a large-scale experimental
protocol comprising 12 baselines, 16 HPO search spaces and 86 datasets/tasks,
we demonstrate that our method achieves new state-of-the-art results in HPO.Comment: Published in ICLR 202
SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization
Computer vision is experiencing an AI renaissance, in which machine learning
models are expediting important breakthroughs in academic research and
commercial applications. Effectively training these models, however, is not
trivial due in part to hyperparameters: user-configured values that control a
model's ability to learn from data. Existing hyperparameter optimization
methods are highly parallel but make no effort to balance the search across
heterogeneous hardware or to prioritize searching high-impact spaces. In this
paper, we introduce a framework for massively Scalable Hardware-Aware
Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the
relative complexity of each search space and monitors performance on the
learning task over all trials. These metrics are then used as heuristics to
assign hyperparameters to distributed workers based on their hardware. We first
demonstrate that our framework achieves double the throughput of a standard
distributed hyperparameter optimization framework by optimizing SVM for MNIST
using 150 distributed workers. We then conduct model search with SHADHO over
the course of one week using 74 GPUs across two compute clusters to optimize
U-Net for a cell segmentation task, discovering 515 models that achieve a lower
validation loss than standard U-Net.Comment: 10 pages, 6 figure
MFES-HB: Efficient Hyperband with Multi-Fidelity Quality Measurements
Hyperparameter optimization (HPO) is a fundamental problem in automatic
machine learning (AutoML). However, due to the expensive evaluation cost of
models (e.g., training deep learning models or training models on large
datasets), vanilla Bayesian optimization (BO) is typically computationally
infeasible. To alleviate this issue, Hyperband (HB) utilizes the early stopping
mechanism to speed up configuration evaluations by terminating those
badly-performing configurations in advance. This leads to two kinds of quality
measurements: (1) many low-fidelity measurements for configurations that get
early-stopped, and (2) few high-fidelity measurements for configurations that
are evaluated without being early stopped. The state-of-the-art HB-style
method, BOHB, aims to combine the benefits of both BO and HB. Instead of
sampling configurations randomly in HB, BOHB samples configurations based on a
BO surrogate model, which is constructed with the high-fidelity measurements
only. However, the scarcity of high-fidelity measurements greatly hampers the
efficiency of BO to guide the configuration search. In this paper, we present
MFES-HB, an efficient Hyperband method that is capable of utilizing both the
high-fidelity and low-fidelity measurements to accelerate the convergence of
HPO tasks. Designing MFES-HB is not trivial as the low-fidelity measurements
can be biased yet informative to guide the configuration search. Thus we
propose to build a Multi- Fidelity Ensemble Surrogate (MFES) based on the
generalized Product of Experts framework, which can integrate useful
information from multi-fidelity measurements effectively. The empirical studies
on the real-world AutoML tasks demonstrate that MFES-HB can achieve 3.3-8.9x
speedups over the state-of-the-art approach - BOHB
Hyperparameter Learning via Distributional Transfer
Bayesian optimisation is a popular technique for hyperparameter learning but
typically requires initial exploration even in cases where similar prior tasks
have been solved. We propose to transfer information across tasks using learnt
representations of training datasets used in those tasks. This results in a
joint Gaussian process model on hyperparameters and data representations.
Representations make use of the framework of distribution embeddings into
reproducing kernel Hilbert spaces. The developed method has a faster
convergence compared to existing baselines, in some cases requiring only a few
evaluations of the target objective
- …