Investigating Continual Learning Strategies in Neural Networks
This paper explores the role of continual learning strategies when neural networks are confronted with learning tasks sequentially. We analyze the stability-plasticity dilemma with three factors in mind: the type of network architecture used, the continual learning scenario defined, and the continual learning strategy implemented. Our results show that complementary learning systems and neural volume contribute significantly to memory retrieval and consolidation in neural networks. Finally, we demonstrate that regularization strategies such as elastic weight consolidation are better suited to larger neural networks, whereas rehearsal strategies such as gradient episodic memory are better suited to smaller neural networks.
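As a rough illustration of the regularization family mentioned in this abstract, the sketch below computes the quadratic penalty commonly associated with elastic weight consolidation: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2. It is not taken from the paper. The function names `ewc_penalty` and `ewc_loss`, the flat parameter lists, and the diagonal Fisher estimate `fisher_diag` are all assumptions made for the example.

```python
from typing import Sequence


def ewc_penalty(params: Sequence[float],
                anchor_params: Sequence[float],
                fisher_diag: Sequence[float],
                lam: float = 1.0) -> float:
    """Quadratic EWC-style penalty (hypothetical helper, not the paper's code).

    params        -- current parameter values (theta)
    anchor_params -- parameters saved after the previous task (theta_star)
    fisher_diag   -- assumed diagonal Fisher information, one value per parameter
    lam           -- strength of the penalty
    """
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for theta, theta_star, f in zip(params, anchor_params, fisher_diag):
        # Parameters with large Fisher values are held closer to their old values.
        total += f * (theta - theta_star) ** 2
    return 0.5 * lam * total


def ewc_loss(new_task_loss: float,
             params: Sequence[float],
             anchor_params: Sequence[float],
             fisher_diag: Sequence[float],
             lam: float = 1.0) -> float:
    """New-task loss plus the penalty that discourages drifting from old weights."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)
```

The penalty needs one stored value and one Fisher estimate per parameter, so its memory cost grows with model size rather than with the amount of past data, which is one way to read the contrast with rehearsal strategies drawn in the abstract.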
Using Hindsight to Anchor Past Knowledge in Continual Learning
In continual learning, the learner faces a stream of data whose distribution
changes over time. Modern neural networks are known to suffer under this
setting, as they quickly forget previously acquired knowledge. To address such
catastrophic forgetting, many continual learning methods implement different
types of experience replay, re-learning on past data stored in a small buffer
known as episodic memory. In this work, we complement experience replay with a
new objective that we call anchoring, where the learner uses bilevel
optimization to update its knowledge on the current task, while keeping intact
the predictions on some anchor points of past tasks. These anchor points are
learned using gradient-based optimization to maximize forgetting, which is
approximated by fine-tuning the currently trained model on the episodic memory
of past tasks. Experiments on several supervised learning benchmarks for
continual learning demonstrate that our approach improves on standard
experience replay in terms of both accuracy and forgetting metrics, across
various sizes of episodic memory.
Comment: Accepted at AAAI 202
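The abstract describes experience replay as re-learning on past data kept in a small buffer called episodic memory. A minimal sketch of such a buffer follows. The abstract does not say how the buffer is filled, so reservoir sampling is used here purely as an assumption, and the class name `EpisodicMemory` and its methods are hypothetical. The anchoring objective and bilevel optimization from the paper are not shown.

```python
import random
from typing import Any, List


class EpisodicMemory:
    """Fixed-size replay buffer filled by reservoir sampling (an assumed policy)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.buffer: List[Any] = []
        self.num_seen = 0
        self._rng = random.Random(seed)

    def add(self, example: Any) -> None:
        """Offer one example from the stream to the buffer."""
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
            return
        # Keep the new example with probability capacity / num_seen,
        # so every example seen so far is equally likely to be stored.
        j = self._rng.randrange(self.num_seen)
        if j < self.capacity:
            self.buffer[j] = example

    def sample(self, batch_size: int) -> List[Any]:
        """Draw a replay batch without replacement (at most the buffer size)."""
        k = min(batch_size, len(self.buffer))
        return self._rng.sample(self.buffer, k)
```

In a replay loop, each incoming training example would be passed to `add`, and a batch from `sample` would be mixed into the current update so that past data keeps contributing to the loss.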
Online Continual Learning of End-to-End Speech Recognition Models
Continual Learning, also known as Lifelong Learning, aims to continually
learn from new data as it becomes available. While prior research on continual
learning in automatic speech recognition has focused on the adaptation of
models across multiple different speech recognition tasks, in this paper we
propose an experimental setting for online continual learning for
automatic speech recognition of a single task. Specifically focusing on the
case where additional training data for the same task becomes available
incrementally over time, we demonstrate the effectiveness of performing
incremental model updates to end-to-end speech recognition models with an
online Gradient Episodic Memory (GEM) method. Moreover, we show that with
online continual learning and a selective sampling strategy, we can maintain an
accuracy that is similar to retraining a model from scratch while requiring
significantly lower computation costs. We have also verified our method with
self-supervised learning (SSL) features.
Comment: Accepted at InterSpeech 202
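To give a feel for the gradient constraint behind Gradient Episodic Memory, the sketch below handles only the special case of a single reference gradient computed on stored memory samples. If the proposed update would increase the loss on memory (negative inner product), it is projected onto the nearest gradient that does not. The full method constrains the update against several past-task gradients by solving a quadratic program, which is not shown. The function names and the plain-list gradient representation are assumptions for illustration, and this is not the paper's online variant.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_gradient(grad: Sequence[float],
                     ref_grad: Sequence[float]) -> List[float]:
    """Project grad so it no longer conflicts with ref_grad.

    grad     -- gradient on the current batch
    ref_grad -- gradient on samples drawn from episodic memory (single constraint)
    """
    if len(grad) != len(ref_grad):
        raise ValueError("gradients must have the same length")
    inner = dot(grad, ref_grad)
    if inner >= 0.0:
        # No conflict: the update does not increase the memory loss to first order.
        return list(grad)
    # inner < 0 implies ref_grad is non-zero, so the division is safe.
    scale = inner / dot(ref_grad, ref_grad)
    return [g - scale * r for g, r in zip(grad, ref_grad)]
```

For example, `project_gradient([1.0, 0.0], [-1.0, 1.0])` removes the component that opposes the memory gradient, leaving an update whose inner product with the reference gradient is zero.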
Scalable Recollections for Continual Lifelong Learning
Given the recent success of Deep Learning applied to a variety of single
tasks, it is natural to consider more human-realistic settings. Perhaps the
most difficult of these settings is that of continual lifelong learning, where
the model must learn online over a continuous stream of non-stationary data. A
successful continual lifelong learning system must have three key capabilities:
it must learn and adapt over time, it must not forget what it has learned, and
it must be efficient in both training time and memory. Recent techniques have
focused their efforts primarily on the first two capabilities while questions
of efficiency remain largely unexplored. In this paper, we consider the problem
of efficient and effective storage of experiences over very large time-frames.
In particular we consider the case where typical experiences are O(n) bits and
memories are limited to O(k) bits for k << n. We present a novel scalable
architecture and training algorithm in this challenging domain and provide an
extensive evaluation of its performance. Our results show that we can achieve
considerable gains on top of state-of-the-art methods such as GEM.
Comment: AAAI 201