23 research outputs found

    Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting

    We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting. (Comment: 13 pages, 6 figures)
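
    The computational core described above is a per-task quadratic penalty on weight changes whose curvature is kept in block-diagonal Kronecker-factored form. Below is a minimal NumPy sketch of how such a penalty can be evaluated for a single fully connected layer, assuming per-layer factors A (from layer inputs) and G (from back-propagated output gradients) have already been estimated; the function and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kfac_quadratic_penalty(W, W_prev, A, G, lam=1.0):
    """Evaluate 0.5 * lam * vec(dW)^T (A kron G) vec(dW) for one layer
    without ever forming the full Kronecker product.

    W, W_prev : (out_dim, in_dim) current and previous-task weights
    A         : (in_dim, in_dim) Kronecker factor from layer inputs
    G         : (out_dim, out_dim) Kronecker factor from output gradients
    """
    dW = W - W_prev
    # Identity: vec(dW)^T (A kron G) vec(dW) == trace(A @ dW.T @ G @ dW)
    return 0.5 * lam * np.trace(A @ dW.T @ G @ dW)

# Toy usage: penalize drifting away from the previous task's weights.
rng = np.random.default_rng(0)
W_prev = rng.normal(size=(4, 3))
W = W_prev + 0.1 * rng.normal(size=(4, 3))
X = rng.normal(size=(100, 3))   # layer inputs seen on the old task
dY = rng.normal(size=(100, 4))  # back-propagated output gradients
A = X.T @ X / len(X)
G = dY.T @ dY / len(dY)
print(kfac_quadratic_penalty(W, W_prev, A, G, lam=10.0))
```

    Keeping only the two small factors per layer is what makes the recursive Gaussian approximation tractable for modern architectures, since the full Hessian over all weights is never materialized.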

    Orthogonal Gradient Descent for Continual Learning

    Neural networks are achieving state-of-the-art and sometimes super-human performance on learning tasks across a variety of domains. Whenever these problems require learning in a continual or sequential manner, however, neural networks suffer from the problem of catastrophic forgetting; they forget how to solve previous tasks after being trained on a new task, despite having the essential capacity to solve both tasks if they were trained on both simultaneously. In this paper, we propose to address this issue from a parameter space perspective and study an approach to restrict the direction of the gradient updates to avoid forgetting previously learned data. We present the Orthogonal Gradient Descent (OGD) method, which accomplishes this goal by projecting the gradients from new tasks onto a subspace in which the neural network outputs on previous tasks do not change and the projected gradient is still in a useful direction for learning the new task. Our approach utilizes the high capacity of a neural network more efficiently and does not require storing previously learned data, which could raise privacy concerns. Experiments on common benchmarks reveal the effectiveness of the proposed OGD method.
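
    The projection step described above can be sketched in a few lines of NumPy, assuming gradients of the network outputs on earlier tasks have already been collected as flattened parameter-space vectors; the helper names below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def orthogonalize(vectors, eps=1e-10):
    """Gram-Schmidt: turn stored gradient vectors (one per row) into an
    orthonormal basis of the 'do not change previous outputs' directions."""
    basis = []
    for v in vectors:
        w = v - sum((v @ b) * b for b in basis)
        n = np.linalg.norm(w)
        if n > eps:
            basis.append(w / n)
    return basis

def project_gradient(g, basis):
    """Remove from the new-task gradient g its components along the basis,
    so the update approximately leaves previous-task outputs unchanged."""
    for b in basis:
        g = g - (g @ b) * b
    return g

# Toy usage with flattened parameter-space vectors.
rng = np.random.default_rng(0)
old_task_grads = rng.normal(size=(5, 20))  # e.g. d(outputs)/d(params) on old data
basis = orthogonalize(old_task_grads)
g_new = rng.normal(size=20)
g_proj = project_gradient(g_new, basis)
print(max(abs(g_proj @ b) for b in basis))  # ~0: orthogonal to stored directions
```

    Because only these basis vectors are retained, no raw samples from earlier tasks need to be stored, which is the source of the privacy advantage noted in the abstract.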

    Hypothesis-driven Online Video Stream Learning with Augmented Memory

    The ability to continuously acquire new knowledge without forgetting previous tasks remains a challenging problem for computer vision systems. Standard continual learning benchmarks focus on learning from static i.i.d. images in an offline setting. Here, we examine a more challenging and realistic online continual learning problem called online stream learning. Like humans, some AI agents have to learn incrementally from a continuous temporal stream of non-repeating data. We propose a novel model, Hypotheses-driven Augmented Memory Network (HAMN), which efficiently consolidates previous knowledge using an augmented memory matrix of "hypotheses" and replays reconstructed image features to avoid catastrophic forgetting. Compared with pixel-level and generative replay approaches, the advantages of HAMN are two-fold. First, hypothesis-based knowledge consolidation avoids redundant information in the image pixel space and makes memory usage far more efficient. Second, hypotheses in the augmented memory can be re-used for learning new tasks, improving generalization and transfer learning ability. Given the lack of online incremental class-learning datasets for video streams, we introduce and adapt two additional video datasets, Toybox and iLab, for online stream learning. We also evaluate our method on the CORe50 and online CIFAR100 datasets. Our method performs significantly better than all state-of-the-art methods, while offering much more efficient memory usage. All source code and data are publicly available at https://github.com/kreimanlab/AugMe
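
    The broader idea of replaying compact feature-level representations instead of raw pixels can be illustrated with a generic feature-replay buffer. The NumPy sketch below is not the HAMN architecture itself; the class, the reservoir-sampling policy, and the parameter names are assumptions for illustration only.

```python
import numpy as np

class FeatureReplayMemory:
    """Generic feature-level replay: store compact feature vectors rather than
    raw pixels, and mix replayed features into each new training batch."""

    def __init__(self, capacity, feat_dim, rng=None):
        self.capacity = capacity
        self.features = np.zeros((capacity, feat_dim))
        self.labels = np.zeros(capacity, dtype=int)
        self.size = 0
        self.seen = 0
        self.rng = rng or np.random.default_rng(0)

    def add(self, feat, label):
        """Reservoir sampling keeps memory usage fixed as the stream grows."""
        if self.size < self.capacity:
            self.features[self.size], self.labels[self.size] = feat, label
            self.size += 1
        else:
            j = self.rng.integers(0, self.seen + 1)
            if j < self.capacity:
                self.features[j], self.labels[j] = feat, label
        self.seen += 1

    def replay_batch(self, n):
        """Sample stored features to interleave with the incoming stream."""
        idx = self.rng.integers(0, self.size, size=min(n, self.size))
        return self.features[idx], self.labels[idx]

# Toy usage: interleave stream features with replayed ones.
mem = FeatureReplayMemory(capacity=128, feat_dim=64)
rng = np.random.default_rng(1)
for step in range(1000):
    feat, label = rng.normal(size=64), rng.integers(0, 10)
    mem.add(feat, label)
    if step % 100 == 99:
        old_feats, old_labels = mem.replay_batch(32)
        # train the classifier head on (feat, label) plus (old_feats, old_labels)
print(mem.size, mem.seen)
```

    Storing features rather than images is what makes this style of replay memory-efficient, which is the trade-off the abstract contrasts against pixel-level and generative replay.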