Online Learning of a Memory for Learning Rates
The promise of learning to learn for robotics rests on the hope that by
extracting some information about the learning process itself we can speed up
subsequent similar learning tasks. Here, we introduce a computationally
efficient online meta-learning algorithm that builds and optimizes a memory
model of the optimal learning rate landscape from previously observed gradient
behaviors. While performing task-specific optimization, this memory of learning
rates predicts how to scale currently observed gradients. After applying the
gradient scaling, our meta-learner updates its internal memory based on the
observed effect its prediction had. Our meta-learner can be combined with any
gradient-based optimizer, learns on the fly, and can be transferred to new
optimization tasks. In our evaluations we show that our meta-learning algorithm
speeds up learning of MNIST classification and a variety of learning control
tasks, either in batch or online learning settings.
Comment: accepted to ICRA 2018, code available:
https://github.com/fmeier/online-meta-learning ; video pitch available:
https://youtu.be/9PzQ25FPPO
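
The predict-then-update loop described in this abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the `LearningRateMemory` class, the binning of gradients by log-magnitude, and the sign-agreement update rule are all hypothetical simplifications chosen only to show the general shape of "predict a gradient scaling, apply it, then adjust the memory based on the observed effect".

```python
import math


class LearningRateMemory:
    """Toy memory (hypothetical): one learning rate per gradient-magnitude bin."""

    def __init__(self, n_bins=8, init_lr=0.01, up=1.2, down=0.5):
        self.n_bins = n_bins
        self.rates = [init_lr] * n_bins
        self.up = up      # growth factor when the last step looked safe
        self.down = down  # shrink factor when the last step overshot

    def _bin(self, grad):
        # Assumed feature: log10 of the gradient magnitude, clipped to the bin range.
        if grad == 0.0:
            return 0
        idx = int(math.log10(abs(grad)) + self.n_bins / 2)
        return max(0, min(self.n_bins - 1, idx))

    def predict(self, grad):
        """Return the learning rate stored for gradients like this one."""
        return self.rates[self._bin(grad)]

    def update(self, grad, next_grad):
        """Adjust the stored rate based on the observed effect of the step."""
        idx = self._bin(grad)
        # Same sign before and after the step: the step did not overshoot, so grow.
        # Sign flip (or zero): the step was too large, so shrink.
        factor = self.up if grad * next_grad > 0 else self.down
        self.rates[idx] *= factor


def minimize(grad_fn, x0, memory, steps=100):
    """Scalar gradient descent whose step size comes from the memory."""
    x = x0
    g = grad_fn(x)
    for _ in range(steps):
        lr = memory.predict(g)     # memory predicts how to scale the gradient
        x = x - lr * g             # apply the scaled gradient
        g_next = grad_fn(x)
        memory.update(g, g_next)   # memory learns from the observed effect
        g = g_next
    return x


memory = LearningRateMemory()
x_final = minimize(lambda x: 2.0 * x, x0=5.0, memory=memory)  # minimizes f(x) = x^2
```

Because the memory object persists across calls, it could in principle be reused on a second, similar objective, which loosely mirrors the transfer idea in the abstract.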
Memory-Efficient Adaptive Optimization
Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for
achieving state-of-the-art performance in machine translation and language
modeling. However, these methods maintain second-order statistics for each
parameter, thus introducing significant memory overheads that restrict the size
of the model being used as well as the number of examples in a mini-batch. We
describe an effective and flexible adaptive optimization method with greatly
reduced memory overhead. Our method retains the benefits of per-parameter
adaptivity while allowing significantly larger models and batch sizes. We give
convergence guarantees for our method, and demonstrate its effectiveness in
training very large translation and language models with up to 2-fold speedups
compared to the state-of-the-art.
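
The memory overhead mentioned in this abstract can be made concrete with a small sketch. The abstract does not spell out the mechanism, so the reduced-memory variant below (shared row and column accumulators for a matrix-shaped parameter) is an assumption used for illustration and should not be read as the paper's exact method. The function names `adagrad_step`, `row_col_step`, and `accumulator_sizes` are hypothetical.

```python
import math


def adagrad_step(w, g, acc, lr=0.1, eps=1e-8):
    """Standard Adagrad: one accumulator entry per parameter (m * n floats)."""
    for i in range(len(w)):
        for j in range(len(w[0])):
            acc[i][j] += g[i][j] ** 2
            w[i][j] -= lr * g[i][j] / (math.sqrt(acc[i][j]) + eps)


def row_col_step(w, g, row_acc, col_acc, lr=0.1, eps=1e-8):
    """Assumed reduced-memory variant: only m + n accumulator entries."""
    m, n = len(w), len(w[0])
    new_row = [0.0] * m
    new_col = [0.0] * n
    for i in range(m):
        for j in range(n):
            # Per-parameter statistic rebuilt from the shared row/column values.
            nu = min(row_acc[i], col_acc[j]) + g[i][j] ** 2
            w[i][j] -= lr * g[i][j] / (math.sqrt(nu) + eps)
            # Each row/column keeps the largest statistic among its entries.
            new_row[i] = max(new_row[i], nu)
            new_col[j] = max(new_col[j], nu)
    row_acc[:] = new_row
    col_acc[:] = new_col


def accumulator_sizes(m, n):
    """Number of stored statistics: (per-parameter, row-plus-column)."""
    return m * n, m + n


w = [[1.0, 2.0], [3.0, 4.0]]
g = [[1.0, 0.0], [0.0, 2.0]]
row_acc, col_acc = [0.0, 0.0], [0.0, 0.0]
row_col_step(w, g, row_acc, col_acc)
sizes = accumulator_sizes(1000, 1000)
```

For a 1000 x 1000 weight matrix, the per-parameter scheme stores a million statistics while the row-plus-column scheme stores two thousand, which is the kind of saving that would leave room for larger models or batches.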