Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting
We introduce the Kronecker factored online Laplace approximation for
overcoming catastrophic forgetting in neural networks. The method is grounded
in a Bayesian online learning framework, where we recursively approximate the
posterior after every task with a Gaussian, leading to a quadratic penalty on
changes to the weights. The Laplace approximation requires calculating the
Hessian around a mode, which is typically intractable for modern architectures.
In order to make our method scalable, we leverage recent block-diagonal
Kronecker factored approximations to the curvature. Our algorithm achieves over
90% test accuracy across a sequence of 50 instantiations of the permuted MNIST
dataset, substantially outperforming related methods for overcoming
catastrophic forgetting.
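
A minimal sketch of the quadratic penalty this induces, assuming per-layer Kronecker factors A_l (input second moments) and G_l (pre-activation gradient second moments) accumulated after each task; the function, variable names, lambda hyperparameter, and PyTorch framing are illustrative assumptions, not the authors' implementation:

import torch

def kfac_laplace_penalty(weights, prev_weights, A_factors, G_factors, lam=1.0):
    """Quadratic penalty (lam/2) * sum_l vec(dW_l)^T (A_l kron G_l) vec(dW_l).

    The Kronecker identity vec(dW)^T (A kron G) vec(dW) = tr(G dW A dW^T)
    lets the penalty be evaluated without ever forming the full Hessian.
    """
    penalty = 0.0
    for W, W_star, A, G in zip(weights, prev_weights, A_factors, G_factors):
        dW = W - W_star  # change relative to the mode found after previous tasks
        penalty = penalty + torch.trace(G @ dW @ A @ dW.t())
    return 0.5 * lam * penalty
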
Orthogonal Gradient Descent for Continual Learning
Neural networks are achieving state-of-the-art and sometimes super-human
performance on learning tasks across a variety of domains. Whenever these
problems require learning in a continual or sequential manner, however, neural
networks suffer from the problem of catastrophic forgetting; they forget how to
solve previous tasks after being trained on a new task, despite having the
essential capacity to solve both tasks if they were trained on both
simultaneously. In this paper, we propose to address this issue from a
parameter space perspective and study an approach to restrict the direction of
the gradient updates to avoid forgetting previously learned data. We present
the Orthogonal Gradient Descent (OGD) method, which accomplishes this goal by
projecting the gradients from new tasks onto a subspace in which the neural
network output on previous tasks does not change and the projected gradient is
still in a useful direction for learning the new task. Our approach utilizes
the high capacity of a neural network more efficiently and does not require
storing the previously learned data, which might raise privacy concerns.
Experiments on common benchmarks reveal the effectiveness of the proposed OGD
method.
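
A minimal sketch of the projection step described above, assuming the gradients of the network outputs on previous-task examples have been flattened and orthonormalized into a basis (e.g. by Gram-Schmidt); the function names and PyTorch framing are illustrative assumptions, not the reference implementation:

import torch

def project_orthogonal(grad, basis):
    # Remove from the new-task gradient its components along stored directions,
    # so that, to first order, the update does not change the network outputs
    # on previous tasks.
    g = grad.clone()
    for v in basis:          # each v: unit-norm flattened output gradient
        g = g - torch.dot(g, v) * v
    return g

def gram_schmidt_append(basis, new_grad, eps=1e-10):
    # Orthonormalize a newly stored output gradient against the basis and keep it.
    v = project_orthogonal(new_grad, basis)
    norm = v.norm()
    if norm > eps:
        basis.append(v / norm)
    return basis
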
Hypothesis-driven Online Video Stream Learning with Augmented Memory
The ability to continuously acquire new knowledge without forgetting previous
tasks remains a challenging problem for computer vision systems. Standard
continual learning benchmarks focus on learning from static iid images in an
offline setting. Here, we examine a more challenging and realistic online
continual learning problem called online stream learning. Like humans, some AI
agents have to learn incrementally from a continuous temporal stream of
non-repeating data. We propose a novel model, Hypotheses-driven Augmented
Memory Network (HAMN), which efficiently consolidates previous knowledge using
an augmented memory matrix of "hypotheses" and replays reconstructed image
features to avoid catastrophic forgetting. Compared with pixel-level and
generative replay approaches, the advantages of HAMN are two-fold. First,
hypothesis-based knowledge consolidation avoids redundant information in the
image pixel space and makes memory usage far more efficient. Second, hypotheses
in the augmented memory can be re-used for learning new tasks, improving
generalization and transfer learning ability. Given a lack of online
incremental class learning datasets on video streams, we introduce and adapt
two additional video datasets, Toybox and iLab, for online stream learning. We
also evaluate our method on the CORe50 and online CIFAR100 datasets. Our method
performs significantly better than all state-of-the-art methods, while offering
much more efficient memory usage. All source code and data are publicly
available at https://github.com/kreimanlab/AugMe
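
A heavily simplified sketch of feature-level replay from a shared "hypothesis" memory in the spirit of the abstract; the dictionary-plus-coefficient reconstruction and all names below are assumptions for illustration and do not reproduce the HAMN architecture (see the repository above for the actual code):

import torch

class HypothesisMemory:
    def __init__(self, num_hypotheses, feature_dim):
        # Shared memory matrix of hypothesis vectors, reusable across tasks.
        self.H = torch.randn(num_hypotheses, feature_dim)
        self.codes, self.labels = [], []   # per-example coefficients and labels

    def store(self, feature, label):
        # Compress a feature into coefficients over the hypotheses (least squares),
        # which is far more compact than storing raw pixels.
        code = torch.linalg.lstsq(self.H.t(), feature.unsqueeze(1)).solution.squeeze(1)
        self.codes.append(code)
        self.labels.append(torch.as_tensor(label))

    def replay(self, n):
        # Reconstruct features from stored codes for rehearsal alongside new
        # stream data, counteracting catastrophic forgetting.
        idx = torch.randint(len(self.codes), (n,)).tolist()
        feats = torch.stack([self.codes[i] @ self.H for i in idx])
        labels = torch.stack([self.labels[i] for i in idx])
        return feats, labels
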