Meta-descent for Online, Continual Prediction
This paper investigates different vector step-size adaptation approaches for
non-stationary online, continual prediction problems. Vanilla stochastic
gradient descent can be considerably improved by scaling the update with a
vector of appropriately chosen step-sizes. Many methods, including AdaGrad,
RMSProp, and AMSGrad, keep statistics about the learning process to approximate
a second order update---a vector approximation of the inverse Hessian. Another
family of approaches uses meta-gradient descent to adapt the step-size
parameters to minimize prediction error. These meta-descent strategies are
promising for non-stationary problems, but have not been as extensively
explored as quasi-second order methods. We first derive a general, incremental
meta-descent algorithm, called AdaGain, designed to be applicable to a much
broader range of algorithms, including those with semi-gradient updates or even
those with accelerations, such as RMSProp. We provide an empirical comparison
of methods from both families. We conclude that methods from both families can
perform well, but in non-stationary prediction problems the meta-descent
methods exhibit advantages. Our method is particularly robust across several
prediction problems, and is competitive with the state-of-the-art method on a
large-scale, time-series prediction problem on real data from a mobile robot.
Comment: AAAI Conference on Artificial Intelligence 2019. v2: Correction to
Baird's counterexample. A bug in the code led to results being reported for
AMSGrad in this experiment, when they were actually results for Ada
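As a rough illustration of the meta-descent idea summarized above (not the paper's AdaGain algorithm itself), the sketch below adapts a vector of per-weight step-sizes by meta-gradient descent on the prediction error, in the style of Sutton's IDBD for a linear predictor; the class name, default constants, and update form are assumptions made for the example.

```python
import numpy as np

class IDBDRegressor:
    """Linear predictor with per-weight step-sizes adapted by meta-gradient
    descent (IDBD-style). Illustrative sketch only; AdaGain generalizes this
    idea to semi-gradient and accelerated (e.g. RMSProp-style) updates."""

    def __init__(self, n_features, init_log_step=-3.0, meta_step=0.01):
        self.w = np.zeros(n_features)                    # prediction weights
        self.beta = np.full(n_features, init_log_step)   # log step-sizes
        self.h = np.zeros(n_features)                    # trace of recent updates
        self.theta = meta_step                           # meta step-size

    def predict(self, x):
        return self.w @ x

    def update(self, x, y):
        delta = y - self.predict(x)                      # prediction error
        # meta-gradient step on the log step-sizes
        self.beta += self.theta * delta * x * self.h
        alpha = np.exp(self.beta)                        # per-weight step-sizes
        # ordinary gradient step, scaled by the adapted step-size vector
        self.w += alpha * delta * x
        # decaying memory of how recent updates correlate with the error
        self.h = self.h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        return delta
```

Fed a non-stationary stream of (x, y) pairs through update(), the step-size vector grows for features that remain predictive and shrinks for those whose relevance drifts, which is the behavior the abstract attributes to meta-descent methods.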
Continuous Learning in a Hierarchical Multiscale Neural Network
We reformulate the problem of encoding a multi-scale representation of a
sequence in a language model by casting it in a continuous learning framework.
We propose a hierarchical multi-scale language model in which short time-scale
dependencies are encoded in the hidden state of a lower-level recurrent neural
network while longer time-scale dependencies are encoded in the dynamic of the
lower-level network by having a meta-learner update the weights of the
lower-level neural network in an online meta-learning fashion. We use elastic
weight consolidation as a higher-level mechanism to prevent catastrophic
forgetting in our continuous learning framework.
Comment: 5 pages, 2 figures, accepted as short paper at ACL 201
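The higher-level consolidation step described above is the standard elastic weight consolidation penalty. The sketch below shows that penalty as it might be added to the lower-level network's loss; the function name, the fisher/anchor_params dictionaries, and the weighting lam are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def ewc_penalty(model, fisher, anchor_params, lam=1.0):
    """Elastic weight consolidation: quadratic penalty keeping each parameter
    close to its previously consolidated value, weighted by an estimate of
    the Fisher information (how much that parameter mattered before)."""
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - anchor_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# In an online meta-learning loop, the meta-learner would add this term to the
# lower-level network's task loss before updating its weights, e.g.:
#   total_loss = task_loss + ewc_penalty(lower_rnn, fisher, anchor_params, lam=0.4)
```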
Online Learning of a Memory for Learning Rates
The promise of learning to learn for robotics rests on the hope that by
extracting some information about the learning process itself we can speed up
subsequent similar learning tasks. Here, we introduce a computationally
efficient online meta-learning algorithm that builds and optimizes a memory
model of the optimal learning rate landscape from previously observed gradient
behaviors. While performing task specific optimization, this memory of learning
rates predicts how to scale currently observed gradients. After applying the
gradient scaling our meta-learner updates its internal memory based on the
observed effect its prediction had. Our meta-learner can be combined with any
gradient-based optimizer, learns on the fly and can be transferred to new
optimization tasks. In our evaluations we show that our meta-learning algorithm
speeds up learning of MNIST classification and a variety of learning control
tasks, either in batch or online learning settings.
Comment: accepted to ICRA 2018, code available:
https://github.com/fmeier/online-meta-learning ; video pitch available:
https://youtu.be/9PzQ25FPPO
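The memory described in this abstract is a learned model of the learning-rate landscape; the toy sketch below conveys only the control flow (scale the current gradient by a remembered multiplier, then adjust that memory from the observed effect of the step). The binning scheme, feature choice, and class name are assumptions for illustration, not the authors' model.

```python
import numpy as np

class LearningRateMemory:
    """Toy memory of learning rates: maps a feature of the current gradient
    (its log-magnitude) to a multiplicative scale via a small table, and
    adjusts that table online from the observed change in loss."""

    def __init__(self, n_bins=20, lo=-8.0, hi=2.0, meta_step=0.05):
        self.edges = np.linspace(lo, hi, n_bins + 1)   # bins over log10 of |grad|
        self.log_scale = np.zeros(n_bins)              # log multipliers (start at 1x)
        self.meta_step = meta_step

    def _bin(self, grad):
        feat = np.log10(np.linalg.norm(grad) + 1e-12)
        return int(np.clip(np.digitize(feat, self.edges) - 1, 0, len(self.log_scale) - 1))

    def scale(self, grad):
        # Return the gradient rescaled by the remembered learning-rate multiplier.
        return np.exp(self.log_scale[self._bin(grad)]) * grad

    def update(self, grad, loss_before, loss_after):
        # Reinforce the multiplier if the scaled step reduced the loss; shrink it otherwise.
        b = self._bin(grad)
        self.log_scale[b] += self.meta_step * (1.0 if loss_after < loss_before else -1.0)
```

Wrapped around any gradient-based optimizer, the memory is queried before each step (to scale the gradient) and updated after it (to record whether the step helped), matching the plug-in, transferable behavior the abstract describes.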