Hyperparameter Importance Across Datasets
With the advent of automated machine learning, automated hyperparameter
optimization methods are by now routinely used in data mining. However, this
progress is not yet matched by equal progress on automatic analyses that yield
information beyond performance-optimizing hyperparameter settings. In this
work, we aim to answer the following two questions: Given an algorithm, what
are generally its most important hyperparameters, and what are typically good
values for these? We present methodology and a framework to answer these
questions based on meta-learning across many datasets. We apply this
methodology using the experimental meta-data available on OpenML to determine
the most important hyperparameters of support vector machines, random forests
and Adaboost, and to infer priors for all their hyperparameters. The results,
obtained fully automatically, provide a quantitative basis to focus efforts in
both manual algorithm design and in automated hyperparameter optimization. The
conducted experiments confirm that the hyperparameters selected by the proposed
method are indeed the most important ones and that the obtained priors also
lead to statistically significant improvements in hyperparameter optimization.
Comment: © 2018. Copyright is held by the owner/author(s). Publication rights
licensed to ACM. This is the author's version of the work. It is posted here
for your personal use, not for redistribution. The definitive Version of Record
was published in Proceedings of the 24th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining.
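The abstract above describes estimating hyperparameter importance from OpenML meta-data across many datasets and then aggregating the results. The sketch below is a minimal, self-contained stand-in for that idea: it scores per-dataset importance with a random-forest surrogate's impurity-based importances on synthetic (configuration, accuracy) meta-data and averages the scores across datasets. The hyperparameter names, the synthetic performance function, and the use of impurity importances instead of the paper's actual analysis are all assumptions for illustration.

```python
# Simplified sketch of per-dataset hyperparameter importance, aggregated across
# datasets. A random-forest surrogate's impurity importances stand in for the
# paper's analysis, and the meta-data are synthetic. Names are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
hyperparams = ["C", "gamma", "tol"]  # hypothetical SVM hyperparameters

def importance_for_dataset(n_configs=200):
    # Synthetic stand-in for one dataset's (configuration, accuracy) meta-data.
    X = rng.uniform(size=(n_configs, len(hyperparams)))
    # By construction, accuracy depends strongly on gamma, weakly on C, not on tol.
    y = 0.7 * np.sin(3 * X[:, 1]) + 0.2 * X[:, 0] + 0.05 * rng.normal(size=n_configs)
    surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    return surrogate.feature_importances_

# Aggregate importances over many datasets, mirroring the across-datasets idea.
per_dataset = np.array([importance_for_dataset() for _ in range(20)])
for name, mean_imp in zip(hyperparams, per_dataset.mean(axis=0)):
    print(f"{name}: mean importance {mean_imp:.2f}")
```

Averaging the per-dataset scores is what lets such an analysis speak about an algorithm's hyperparameters in general rather than on a single task.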
Hyperparameter optimization with approximate gradient
Most models in machine learning contain at least one hyperparameter to
control for model complexity. Choosing an appropriate set of hyperparameters is
both crucial in terms of model accuracy and computationally challenging. In
this work we propose an algorithm for the optimization of continuous
hyperparameters using inexact gradient information. An advantage of this method
is that hyperparameters can be updated before model parameters have fully
converged. We also give sufficient conditions for the global convergence of
this method, based on regularity conditions of the involved functions and
summability of errors. Finally, we validate the empirical performance of this
method on the estimation of regularization constants of L2-regularized logistic
regression and kernel ridge regression. Empirical benchmarks indicate that our
approach is highly competitive with state-of-the-art methods.
Comment: Proceedings of the International Conference on Machine Learning (ICML).
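To make the approximate-gradient idea concrete, here is a minimal sketch for one of the benchmark settings mentioned above, ridge regression with a single regularization constant: the inner problem is solved only approximately by a fixed number of gradient steps, and the hypergradient of the validation loss is obtained via the implicit function theorem. The data, step sizes, and iteration counts are made-up assumptions; this is a sketch of the idea, not the authors' implementation.

```python
# Sketch: hyperparameter optimization with an approximately solved inner problem.
# Ridge regression; the hypergradient of the validation loss w.r.t. the
# regularization constant comes from the implicit function theorem.
import numpy as np

rng = np.random.default_rng(1)
Xtr, Xval = rng.normal(size=(80, 5)), rng.normal(size=(40, 5))
w_true = rng.normal(size=5)
ytr = Xtr @ w_true + 0.5 * rng.normal(size=80)
yval = Xval @ w_true + 0.5 * rng.normal(size=40)

def inner_solve(lam, n_iter=50):
    # Approximate inner solution by a fixed number of gradient steps,
    # mimicking the use of inexact gradient information.
    w = np.zeros(Xtr.shape[1])
    for _ in range(n_iter):
        grad = Xtr.T @ (Xtr @ w - ytr) + lam * w
        w -= 1e-3 * grad
    return w

log_lam = 0.0  # optimize log(lambda) so the constraint lambda > 0 is automatic
for step in range(30):
    lam = np.exp(log_lam)
    w = inner_solve(lam)
    # Implicit function theorem: dw/dlam = -(Xtr^T Xtr + lam I)^{-1} w
    H = Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1])
    dw_dlam = -np.linalg.solve(H, w)
    dval_dw = Xval.T @ (Xval @ w - yval)        # gradient of 0.5*||Xval w - yval||^2
    hypergrad = float(dval_dw @ dw_dlam) * lam  # chain rule through lam = exp(log_lam)
    log_lam -= 0.1 * hypergrad
print("selected lambda:", np.exp(log_lam))
```

The key point, as in the abstract, is that the hyperparameter update can be taken even though the inner iterate has not fully converged.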
Alpha MAML: Adaptive Model-Agnostic Meta-Learning
Model-agnostic meta-learning (MAML) is a meta-learning technique to train a
model on a multitude of learning tasks in a way that primes the model for
few-shot learning of new tasks. The MAML algorithm performs well on few-shot
learning problems in classification, regression, and fine-tuning of policy
gradients in reinforcement learning, but comes with the need for costly
hyperparameter tuning for training stability. We address this shortcoming by
introducing an extension to MAML, called Alpha MAML, to incorporate an online
hyperparameter adaptation scheme that eliminates the need to tune meta-learning
and learning rates. Our results with the Omniglot database demonstrate a
substantial reduction in the need to tune MAML training hyperparameters and
improved training stability, with less sensitivity to hyperparameter choice.
Comment: 6th ICML Workshop on Automated Machine Learning (2019).
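The online adaptation scheme mentioned above is in the spirit of hypergradient-style learning-rate updates. The toy sketch below shows that style of update in isolation: a single learning rate is adapted from the dot product of consecutive gradients on a simple quadratic objective. The objective and all constants are illustrative assumptions, not the MAML setting.

```python
# Toy sketch of online learning-rate adaptation via a hypergradient-style rule:
# alpha is itself updated from the dot product of consecutive gradients, so no
# manual tuning of alpha is needed. Objective and constants are illustrative.
import numpy as np

def loss_grad(theta):
    # Gradient of the quadratic objective 0.5 * ||theta - target||^2
    target = np.array([3.0, -2.0])
    return theta - target

theta = np.zeros(2)
alpha, beta = 0.01, 0.001       # initial learning rate and its adaptation rate
prev_grad = np.zeros(2)
for t in range(200):
    grad = loss_grad(theta)
    # dLoss/dalpha = -grad_t . grad_{t-1}, so descending on alpha means adding it.
    alpha += beta * float(grad @ prev_grad)
    theta -= alpha * grad
    prev_grad = grad
print("theta:", theta, "adapted alpha:", alpha)
```

Alpha MAML applies this kind of online adaptation to both the meta-learning rate and the inner learning rate, which is what removes the need to tune them by hand.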
Lipschitz Adaptivity with Multiple Learning Rates in Online Learning
We aim to design adaptive online learning algorithms that take advantage of
any special structure that might be present in the learning task at hand, with
as little manual tuning by the user as possible. A fundamental obstacle that
comes up in the design of such adaptive algorithms is to calibrate a so-called
step-size or learning rate hyperparameter depending on variance, gradient
norms, etc. A recent technique promises to overcome this difficulty by
maintaining multiple learning rates in parallel. This technique has been
applied in the MetaGrad algorithm for online convex optimization and the Squint
algorithm for prediction with expert advice. However, in both cases the user
still has to provide in advance a Lipschitz hyperparameter that bounds the norm
of the gradients. Although this hyperparameter is typically not available in
advance, tuning it correctly is crucial: if it is set too small, the methods
may fail completely; but if it is taken too large, performance deteriorates
significantly. In the present work we remove this Lipschitz hyperparameter by
designing new versions of MetaGrad and Squint that adapt to its optimal value
automatically. We achieve this by dynamically updating the set of active
learning rates. For MetaGrad, we further improve the computational efficiency
of handling constraints on the domain of prediction, and we remove the need to
specify the number of rounds in advance.
Comment: 22 pages. To appear in COLT 2019.
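As a rough illustration of the "multiple learning rates in parallel" idea that MetaGrad and Squint rely on, the sketch below runs several online-gradient-descent iterates over an exponentially spaced grid of learning rates and combines them with exponential weights on linearized losses. This is a simplification for intuition only, not the MetaGrad or Squint update and not the new Lipschitz-adaptive variants; the loss, learning-rate grid, and aggregation constants are assumptions.

```python
# Simplified illustration: maintain one online-gradient-descent expert per
# learning rate and aggregate them with exponential weights on linearized losses.
import numpy as np

rng = np.random.default_rng(2)
d = 3
target = np.array([1.0, -1.0, 0.5])
etas = [2.0 ** -k for k in range(1, 8)]   # exponentially spaced learning rates
experts = [np.zeros(d) for _ in etas]     # one iterate per learning rate
log_w = np.zeros(len(etas))               # log-weights of the meta-aggregator

def combine(log_w, experts):
    # Exponential-weights mixture of the experts' current points.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return sum(wi * xi for wi, xi in zip(w, experts))

for t in range(500):
    x = combine(log_w, experts)
    grad = x - target + 0.1 * rng.normal(size=d)   # noisy gradient of a quadratic loss
    for i, eta in enumerate(etas):
        # Penalize each expert by its linearized loss, then let it take its own step.
        log_w[i] -= 0.5 * float(grad @ experts[i])
        experts[i] = experts[i] - eta * grad

print("combined point:", combine(log_w, experts))
```

The abstract's contribution goes further: rather than fixing the grid from a known Lipschitz bound, the set of active learning rates is updated dynamically as gradients are observed, so no Lipschitz hyperparameter has to be supplied in advance.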
