12 research outputs found

    Meta-Aggregating Networks for Class-Incremental Learning


    Class Gradient Projection For Continual Learning

    Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL). Recent approaches tackle this problem by projecting the gradient update orthogonal to the gradient subspace of existing tasks. While the results are remarkable, those approaches ignore the fact that these calculated gradients are not guaranteed to be orthogonal to the gradient subspace of each class, because classes deviate within tasks, e.g., distinguishing "Man" from "Sea" vs. differentiating "Boy" from "Girl". Therefore, this strategy may still cause catastrophic forgetting for some classes. In this paper, we propose Class Gradient Projection (CGP), which calculates the gradient subspace from individual classes rather than tasks. Gradient updates orthogonal to the gradient subspace of existing classes can effectively minimize interference from other classes. To improve generalization and efficiency, we further design a Base Refining (BR) algorithm to combine similar classes and refine class bases dynamically. Moreover, we leverage a contrastive learning method to improve the model's ability to handle unseen tasks. Extensive experiments on benchmark datasets demonstrate the effectiveness of our proposed approach. It improves on previous methods by 2.0% on the CIFAR-100 dataset. Comment: MM '22: Proceedings of the 30th ACM International Conference on Multimedia
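The core operation this abstract describes, updating along a direction orthogonal to a stored gradient subspace, can be illustrated with a minimal sketch. This is not the authors' CGP implementation: the helper names (`orthonormalize`, `project_out`), the use of Gram-Schmidt to build the basis, and the toy three-dimensional gradients are all assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-10):
    """Gram-Schmidt: return an orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis

def project_out(grad, basis):
    """Remove from grad its component inside span(basis)."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g

# Hypothetical stored gradient directions for two previously learned classes.
class_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormalize(class_grads)

# Gradient computed on a new class; only the part orthogonal to the
# stored subspace is kept for the weight update.
new_grad = [0.5, -2.0, 3.0]
projected = project_out(new_grad, basis)
```

In this toy case the stored directions span the first two coordinates, so only the third component of `new_grad` survives the projection. How the subspace is built per class and refined over time is specific to the paper and is not reproduced here.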

    Energy-Based Models for Continual Learning

    We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs have a natural way to support a dynamically growing number of tasks or classes that causes less interference with previously learned information. Our proposed version of EBMs for continual learning is simple and efficient, and it outperforms baseline methods by a large margin on several benchmarks. Moreover, our proposed contrastive-divergence-based training objective can be applied to other continual learning methods, resulting in substantial boosts in their performance. We also show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a class of models naturally inclined towards the continual learning regime.

    Loss of Plasticity in Deep Continual Learning

    Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples. More fundamental, but less well known, is that they may also lose their ability to learn on new examples, a phenomenon called loss of plasticity. We provide direct demonstrations of loss of plasticity using the MNIST and ImageNet datasets repurposed for continual learning as sequences of tasks. In ImageNet, binary classification performance dropped from 89% accuracy on an early task down to 77%, about the level of a linear network, on the 2000th task. Loss of plasticity occurred with a wide range of deep network architectures, optimizers, activation functions, batch normalization, dropout, but was substantially eased by L2-regularization, particularly when combined with weight perturbation. Further, we introduce a new algorithm -- continual backpropagation -- which slightly modifies conventional backpropagation to reinitialize a small fraction of less-used units after each example and appears to maintain plasticity indefinitely.
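The reinitialization step mentioned at the end of this abstract can be sketched roughly as follows. This is a simplified illustration, not the published continual backpropagation algorithm: the function name `reinit_low_utility_units`, the per-unit `utility` scores, the uniform initialization range, and the choice to zero the outgoing weights of a reset unit are all assumptions, and the abstract does not say how "less-used" is measured.

```python
import random

def reinit_low_utility_units(w_in, w_out, utility, fraction, rng, init_scale=0.1):
    """Reset the lowest-utility hidden units of one layer (illustrative only).

    w_in[j]    : incoming weights of hidden unit j
    w_out[k][j]: weight from hidden unit j to output k
    utility[j] : hypothetical running score of how much unit j is used
    """
    n_units = len(utility)
    n_reset = int(fraction * n_units)  # simplification: no fractional carry-over
    ranked = sorted(range(n_units), key=lambda j: utility[j])
    chosen = ranked[:n_reset]
    for j in chosen:
        # Fresh random incoming weights restore the unit's ability to learn.
        w_in[j] = [rng.uniform(-init_scale, init_scale) for _ in w_in[j]]
        # Assumed: zero outgoing weights so the reset does not disturb outputs.
        for row in w_out:
            row[j] = 0.0
        utility[j] = 0.0
    return chosen

rng = random.Random(0)
w_in = [[0.3, -0.2], [0.0, 0.0], [0.5, 0.1], [-0.4, 0.2]]
w_out = [[1.0, 2.0, 3.0, 4.0]]
utility = [0.9, 0.01, 0.5, 0.7]
reset_units = reinit_low_utility_units(w_in, w_out, utility, fraction=0.25, rng=rng)
```

With four units and `fraction=0.25`, exactly one unit (the one with the lowest assumed utility) is reset, while the weights of the other units are left untouched.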

    Reducing Catastrophic Forgetting in Self-Organizing Maps

    An agent that is capable of continual or lifelong learning is able to continuously learn from potentially infinite streams of pattern sensory data. One major historic difficulty in building agents capable of such learning is that neural systems struggle to retain previously acquired knowledge when learning from new data samples. This problem is known as catastrophic forgetting and remains an unsolved problem in the domain of machine learning to this day. To overcome catastrophic forgetting, different approaches have been proposed. One major line of thought advocates the use of memory buffers to store data, where the stored data is then used to randomly retrain the model to improve memory retention. However, storing and giving access to previous physical data points results in a variety of practical difficulties, particularly with respect to growing memory storage costs. In this work, we propose an alternative way to tackle the problem of catastrophic forgetting, inspired by and building on top of a classical neural model, the self-organizing map (SOM), which is a form of unsupervised clustering. Although the SOM has the potential to combat forgetting through the use of pattern-specializing units, we uncover that it too suffers from the same problem and that this forgetting becomes worse when the SOM is trained in a task-incremental fashion. To mitigate this, we propose a generalization of the SOM, the continual SOM (c-SOM), which introduces several novel mechanisms to improve its memory retention -- new decay functions and generative resampling schemes to facilitate generative replay in the model. We perform extensive experiments using split-MNIST with these approaches, demonstrating that the c-SOM significantly improves over the classical SOM. Additionally, we introduce a new performance metric, alpha_mem, to measure the efficacy of SOMs trained in a task-incremental fashion, providing a benchmark for other competitive learning models.
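For context, the classical SOM update that this abstract builds on can be sketched in a few lines. The example below is a generic textbook-style step on a one-dimensional map with a Gaussian neighbourhood; the function name `som_step` and the toy weights are assumptions, and none of the c-SOM mechanisms (decay functions, generative resampling) are shown.

```python
import math

def som_step(weights, x, lr, sigma):
    """One classical SOM update on a 1-D map; returns the best-matching unit index."""
    # Squared Euclidean distance from each unit's prototype to the input.
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    bmu = min(range(len(weights)), key=lambda i: dists[i])
    for i, w in enumerate(weights):
        # Gaussian neighbourhood: units near the BMU on the map move the most.
        h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
        weights[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu

weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
x = [1.9, 2.1]
bmu = som_step(weights, x, lr=0.5, sigma=1.0)
```

Because the neighbourhood function is positive everywhere, every unit is pulled at least slightly toward each new input. That gives some intuition for why prototypes learned on earlier tasks can drift when the map is trained on tasks in sequence.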

    Class Incremental Learning in Deep Neural Networks

    With the advancement of computation capability, in particular the use of graphical processing units, deep learning systems have shown tremendous potential in achieving super-human performance in many computer vision tasks. However, deep learning models are not able to learn continuously in scenarios where the data distribution is non-stationary or imbalanced, because the models suffer from catastrophic forgetting. In this thesis, we propose an Incremental Generative Replay Embedding (IGRE) framework which employs a conditional generator for generative replay at the image embedding level, thus retaining the superior performance of replay while reducing its memory complexity. Alternating backpropagation with Langevin dynamics was used for efficient and effective training of the conditional generator. We evaluate the proposed IGRE framework on common benchmarks using the CIFAR-10/100, CUB and ImageNet datasets. Results show that the proposed IGRE framework outperforms state-of-the-art methods on the CIFAR-10, CIFAR-100, and CUB datasets with a 6-9% improvement in accuracy and achieves comparable performance in large-scale ImageNet experiments, while at the same time reducing the memory requirements significantly when compared to conventional replay techniques.