Class Gradient Projection For Continual Learning
Catastrophic forgetting is one of the most critical challenges in Continual
Learning (CL). Recent approaches tackle this problem by projecting the gradient
update orthogonal to the gradient subspace of existing tasks. While the results
are remarkable, those approaches ignore the fact that these calculated
gradients are not guaranteed to be orthogonal to the gradient subspace of each
class due to the class deviation in tasks, e.g., distinguishing "Man" from
"Sea" vs. differentiating "Boy" from "Girl". Therefore, this strategy may
still cause catastrophic forgetting for some classes. In this paper, we propose
Class Gradient Projection (CGP), which calculates the gradient subspace from
individual classes rather than tasks. Gradient update orthogonal to the
gradient subspace of existing classes can be effectively utilized to minimize
interference from other classes. To improve the generalization and efficiency,
we further design a Base Refining (BR) algorithm to combine similar classes and
refine class bases dynamically. Moreover, we leverage a contrastive learning
method to improve the model's ability to handle unseen tasks. Extensive
experiments on benchmark datasets demonstrate the effectiveness of our proposed
approach. It improves the previous methods by 2.0% on the CIFAR-100 dataset.
Comment: MM '22: Proceedings of the 30th ACM International Conference on Multimedia
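The core projection step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of projecting a gradient orthogonal to a stored subspace, assuming the subspace is kept as an orthonormal matrix of column vectors; the function names are hypothetical, and the paper's actual CGP and BR algorithms extract per-class bases from layer representations and merge similar classes, which this sketch omits.

```python
import numpy as np

def project_orthogonal(grad, basis):
    """Project `grad` onto the complement of the subspace spanned by the
    columns of `basis` (assumed orthonormal), removing any component
    that would interfere with previously stored directions."""
    if basis.shape[1] == 0:
        return grad
    # Subtract the component of grad lying inside the stored subspace.
    return grad - basis @ (basis.T @ grad)

def extend_basis(basis, grad, tol=1e-8):
    """Grow the orthonormal basis with the residual of a new gradient
    direction via one Gram-Schmidt step (a stand-in for how per-class
    bases could accumulate)."""
    residual = project_orthogonal(grad, basis)
    norm = np.linalg.norm(residual)
    if norm < tol:
        return basis  # direction already represented
    return np.hstack([basis, (residual / norm)[:, None]])

# Toy usage: store the gradient direction of "class A", then project a
# "class B" update so it cannot interfere with class A.
d = 4
basis = np.empty((d, 0))
grad_a = np.array([1.0, 0.0, 0.0, 0.0])
basis = extend_basis(basis, grad_a)

grad_b = np.array([0.5, 1.0, 0.0, 0.0])
update_b = project_orthogonal(grad_b, basis)  # no component along grad_a
```

The point of projecting per class rather than per task is that `basis` holds one direction per class, so classes within the same task also stop interfering with one another.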
Energy-Based Models for Continual Learning
We motivate Energy-Based Models (EBMs) as a promising model class for
continual learning problems. Instead of tackling continual learning via the use
of external memory, growing models, or regularization, EBMs have a natural
way to support a dynamically growing number of tasks or classes while causing
less interference with previously learned information. Our proposed version of EBMs
for continual learning is simple, efficient and outperforms baseline methods by
a large margin on several benchmarks. Moreover, our proposed contrastive
divergence based training objective can be applied to other continual learning
methods, resulting in substantial boosts in their performance. We also show
that EBMs are adaptable to a more general continual learning setting where the
data distribution changes without the notion of explicitly delineated tasks.
These observations point towards EBMs as a class of models naturally inclined
towards the continual learning regime.
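As a rough illustration of why EBMs sidestep interference, the toy classifier below gives each class an independent energy head, so adding a class touches no existing parameters. All names are hypothetical, and the linear energy with a simple push-pull update is only a stand-in for the paper's contrastive-divergence-based objective.

```python
import numpy as np

rng = np.random.default_rng(0)

class EnergyClassifier:
    """Toy energy-based classifier: each class y has a weight vector w_y
    and E(x, y) = -w_y . x. Prediction picks the lowest-energy class;
    new classes can be added without touching existing parameters."""

    def __init__(self, dim):
        self.dim = dim
        self.weights = {}  # class label -> weight vector

    def add_class(self, label):
        self.weights[label] = rng.normal(scale=0.01, size=self.dim)

    def energy(self, x, label):
        return -self.weights[label] @ x

    def predict(self, x):
        return min(self.weights, key=lambda y: self.energy(x, y))

    def train_step(self, x, label, lr=0.1):
        # Push down the energy of the true class and push up the energy
        # of the currently most-competitive wrong class.
        self.weights[label] += lr * x
        wrong = [y for y in self.weights if y != label]
        if wrong:
            worst = min(wrong, key=lambda y: self.energy(x, y))
            self.weights[worst] -= lr * x

# Usage: train two classes on separable toy inputs.
model = EnergyClassifier(2)
model.add_class(0)
model.add_class(1)
for _ in range(5):
    model.train_step(np.array([1.0, 0.0]), 0)
    model.train_step(np.array([0.0, 1.0]), 1)
```

Calling `model.add_class(2)` later leaves the heads for classes 0 and 1 untouched, which is the structural property the abstract highlights: no shared normalizer needs to be recomputed over old classes.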
Loss of Plasticity in Deep Continual Learning
Modern deep-learning systems are specialized to problem settings in which
training occurs once and then never again, as opposed to continual-learning
settings in which training occurs continually. If deep-learning systems are
applied in a continual learning setting, then it is well known that they may
fail to remember earlier examples. More fundamental, but less well known, is
that they may also lose their ability to learn on new examples, a phenomenon
called loss of plasticity. We provide direct demonstrations of loss of
plasticity using the MNIST and ImageNet datasets repurposed for continual
learning as sequences of tasks. In ImageNet, binary classification performance
dropped from 89% accuracy on an early task down to 77%, about the level of a
linear network, on the 2000th task. Loss of plasticity occurred across a wide
range of deep network architectures, optimizers, and activation functions, and
with batch normalization and dropout, but was substantially eased by L2-regularization,
particularly when combined with weight perturbation. Further, we introduce a
new algorithm -- continual backpropagation -- which slightly modifies
conventional backpropagation to reinitialize a small fraction of less-used
units after each example and appears to maintain plasticity indefinitely.
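A much-simplified sketch of the maintenance step of continual backpropagation might look like the following. The utility measure here (mean absolute outgoing weight, exponentially decayed) and the omission of the algorithm's maturity threshold are simplifying assumptions; the function name and array layout are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def continual_backprop_step(w_in, w_out, utility, decay=0.99, fraction=0.01):
    """Decay a running utility score per hidden unit and reinitialize
    the least-useful fraction, so the network keeps a supply of fresh,
    plastic units. w_in: (inputs, hidden); w_out: (hidden, outputs)."""
    contribution = np.abs(w_out).mean(axis=1)  # crude per-unit usefulness
    utility *= decay
    utility += (1 - decay) * contribution
    n_replace = max(1, int(len(utility) * fraction))
    victims = np.argsort(utility)[:n_replace]  # least useful units
    w_in[:, victims] = rng.normal(scale=0.1, size=(w_in.shape[0], n_replace))
    w_out[victims, :] = 0.0            # fresh units start with no influence
    utility[victims] = utility.mean()  # avoid replacing them again at once
    return victims
```

Zeroing the outgoing weights keeps reinitialization from disturbing the network's current outputs, so the step trades a small amount of capacity for sustained plasticity.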
Reducing Catastrophic Forgetting in Self-Organizing Maps
An agent that is capable of continual or lifelong learning is able to continuously learn from potentially infinite streams of sensory pattern data. One major historical difficulty in building agents capable of such learning is that neural systems struggle to retain previously acquired knowledge when learning from new data samples. This problem is known as catastrophic forgetting and remains unsolved in machine learning to this day. To overcome catastrophic forgetting, different approaches have been proposed. One major line of thought advocates the use of memory buffers to store data, which is then used to randomly retrain the model to improve memory retention. However, storing and giving access to previous physical data points results in a variety of practical difficulties, particularly growing memory storage costs. In this work, we propose an alternative way to tackle the problem of catastrophic forgetting, inspired by and building on top of a classical neural model, the self-organizing map (SOM), a form of unsupervised clustering. Although the SOM has the potential to combat forgetting through the use of pattern-specializing units, we uncover that it, too, suffers from the same problem, and the forgetting becomes worse when the SOM is trained in a task-incremental fashion. To mitigate this, we propose a generalization of the SOM, the continual SOM (c-SOM), which introduces several novel mechanisms to improve its memory retention -- new decay functions and generative resampling schemes to facilitate generative replay in the model. We perform extensive experiments on split-MNIST with these approaches, demonstrating that the c-SOM significantly improves over the classical SOM. Additionally, we introduce a new performance metric, alpha_mem, to measure the efficacy of SOMs trained in a task-incremental fashion, providing a benchmark for other competitive learning models.
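The forgetting mechanism the abstract describes can be illustrated with a minimal SOM update step. The per-unit learning-rate decay below is only a crude stand-in for the c-SOM's decay functions, the generative resampling scheme is omitted entirely, and the function and variable names are hypothetical.

```python
import numpy as np

def som_update(units, x, lr, sigma, wins=None):
    """One step of a 1-D self-organizing map: move the best-matching
    unit (and, more weakly, its neighbors) toward input x. If per-unit
    win counts are given, shrink each unit's effective learning rate as
    it wins more often, protecting specialized units from being
    overwritten by later tasks."""
    dists = np.linalg.norm(units - x, axis=1)
    bmu = int(np.argmin(dists))  # best-matching unit
    for i in range(len(units)):
        h = np.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))  # neighborhood
        eff_lr = lr / (1 + wins[i]) if wins is not None else lr
        units[i] += eff_lr * h * (x - units[i])
    if wins is not None:
        wins[bmu] += 1
    return bmu

# Usage: a tiny 3-unit map; repeated wins shrink a unit's step size,
# so a later task's inputs overwrite it more slowly.
units = np.zeros((3, 2))
wins = np.zeros(3)
x = np.array([1.0, 0.0])
bmu = som_update(units, x, lr=0.5, sigma=1.0, wins=wins)
```

Without the `wins`-based decay, every later input stream keeps pulling the same specialized units away from their stored patterns, which is exactly the task-incremental forgetting the paper demonstrates.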
Class Incremental Learning in Deep Neural Networks
With the advancement of computational capability, in particular the use of graphics processing units, deep learning systems have shown tremendous potential in achieving super-human performance on many computer vision tasks. However, deep learning models are not able to learn continuously in scenarios where the data distribution is non-stationary or imbalanced, because the models suffer from catastrophic forgetting. In this thesis, we propose an Incremental Generative Replay Embedding (IGRE) framework which employs a conditional generator for generative replay at the image-embedding level, thus combining the superior performance of replay with reduced memory complexity. Alternating backpropagation with Langevin dynamics was used for efficient and effective training of the conditional generator. We evaluate the proposed IGRE framework on common benchmarks using the CIFAR-10/100, CUB, and ImageNet datasets. Results show that the proposed IGRE framework outperforms state-of-the-art methods on the CIFAR-10, CIFAR-100, and CUB datasets with 6-9% improvement in accuracy, and achieves comparable performance in large-scale ImageNet experiments while significantly reducing memory requirements compared to conventional replay techniques.
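A toy stand-in for embedding-level generative replay: instead of the thesis's conditional generator trained with alternating backpropagation, the sketch below fits a per-class Gaussian over stored embedding statistics and samples pseudo-embeddings for replay. The class name and API are hypothetical; only the memory-saving idea (replay embeddings, not images) comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(3)

class EmbeddingReplay:
    """Store per-class Gaussian statistics over embeddings instead of
    raw images, then sample pseudo-embeddings for replay when training
    later tasks. Memory cost is two vectors per class."""

    def __init__(self):
        self.stats = {}  # class label -> (mean, std)

    def remember(self, label, embeddings):
        e = np.asarray(embeddings, dtype=float)
        self.stats[label] = (e.mean(axis=0), e.std(axis=0) + 1e-6)

    def replay(self, n_per_class):
        xs, ys = [], []
        for label, (mu, sd) in self.stats.items():
            xs.append(rng.normal(mu, sd, size=(n_per_class, len(mu))))
            ys.extend([label] * n_per_class)
        return np.vstack(xs), np.array(ys)

# Usage: remember embeddings of a finished class, replay them later
# mixed into the new task's training batches.
buffer = EmbeddingReplay()
buffer.remember(0, [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
x_replay, y_replay = buffer.replay(5)
```

Because embeddings are far lower-dimensional than images, even a learned generator at this level is much cheaper than conventional image-space replay, which is the memory argument the thesis makes.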