Alpha MAML: Adaptive Model-Agnostic Meta-Learning
Model-agnostic meta-learning (MAML) is a meta-learning technique to train a
model on a multitude of learning tasks in a way that primes the model for
few-shot learning of new tasks. The MAML algorithm performs well on few-shot
learning problems in classification, regression, and fine-tuning of policy
gradients in reinforcement learning, but comes with the need for costly
hyperparameter tuning for training stability. We address this shortcoming by
introducing an extension to MAML, called Alpha MAML, to incorporate an online
hyperparameter adaptation scheme that eliminates the need to tune meta-learning
and learning rates. Our results with the Omniglot database demonstrate a
substantial reduction in the need to tune MAML training hyperparameters and
improvement to training stability with less sensitivity to hyperparameter
choice.
Comment: 6th ICML Workshop on Automated Machine Learning (2019)
Collaborative hyperparameter tuning
Hyperparameter learning has traditionally been a manual task because of the limited number of trials. Today's computing infrastructures allow bigger evaluation budgets, thus opening the way for algorithmic approaches. Recently, surrogate-based optimization was successfully applied to hyperparameter learning for deep belief networks and to WEKA classifiers. The methods combined brute force computational power with model building about the behavior of the error function in the hyperparameter space, and they could significantly improve on manual hyperparameter tuning. What may make experienced practitioners even better at hyperparameter optimization is their ability to generalize across similar learning problems. In this paper, we propose a generic method to incorporate knowledge from previous experiments when simultaneously tuning a learning algorithm on new problems at hand. To this end, we combine surrogate-based ranking and optimization techniques for surrogate-based collaborative tuning (SCoT). We demonstrate SCoT in two experiments where it outperforms standard tuning techniques and single-problem surrogate-based optimization.
Lipschitz Adaptivity with Multiple Learning Rates in Online Learning
We aim to design adaptive online learning algorithms that take advantage of
any special structure that might be present in the learning task at hand, with
as little manual tuning by the user as possible. A fundamental obstacle that
comes up in the design of such adaptive algorithms is to calibrate a so-called
step-size or learning rate hyperparameter depending on variance, gradient
norms, etc. A recent technique promises to overcome this difficulty by
maintaining multiple learning rates in parallel. This technique has been
applied in the MetaGrad algorithm for online convex optimization and the Squint
algorithm for prediction with expert advice. However, in both cases the user
still has to provide in advance a Lipschitz hyperparameter that bounds the norm
of the gradients. Although this hyperparameter is typically not available in
advance, tuning it correctly is crucial: if it is set too small, the methods
may fail completely; but if it is taken too large, performance deteriorates
significantly. In the present work we remove this Lipschitz hyperparameter by
designing new versions of MetaGrad and Squint that adapt to its optimal value
automatically. We achieve this by dynamically updating the set of active
learning rates. For MetaGrad, we further improve the computational efficiency
of handling constraints on the domain of prediction, and we remove the need to
specify the number of rounds in advance.
Comment: 22 pages. To appear in COLT 201
A Comparative Study on Regularization Strategies for Embedding-based Neural Networks
This paper aims to compare different regularization strategies to address a
common phenomenon, severe overfitting, in embedding-based neural networks for
NLP. We chose two widely studied neural models and tasks as our testbed. We
tried several frequently applied or newly proposed regularization strategies,
including penalizing weights (embeddings excluded), penalizing embeddings,
re-embedding words, and dropout. We also emphasized incremental
hyperparameter tuning and combining different regularizations. The results
provide a picture of tuning hyperparameters for neural NLP models.
Comment: EMNLP '1
