Powerpropagation: A sparsity inducing weight reparameterisation
The training of sparse neural networks is becoming an increasingly important tool
for reducing the computational footprint of models at training and evaluation, as
well as enabling the effective scaling up of models. Whereas much work over the
years has been dedicated to specialised pruning techniques, little attention has
been paid to the inherent effect of gradient-based training on model sparsity. In
this work, we introduce Powerpropagation, a new weight reparameterisation for
neural networks that leads to inherently sparse models. Exploiting the behaviour
of gradient descent, our method gives rise to weight updates exhibiting a “rich get
richer” dynamic, leaving low-magnitude parameters largely unaffected by learning.
Models trained in this manner exhibit similar performance, but have a distribution
with markedly higher density at zero, allowing more parameters to be pruned safely.
Powerpropagation is general, intuitive, cheap, and straightforward to implement,
and can readily be combined with various other techniques. To highlight its versatility, we explore it in two very different settings: Firstly, following a recent
line of work, we investigate its effect on sparse training for resource-constrained
settings. Here, we combine Powerpropagation with a traditional weight-pruning
technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing
superior performance on the ImageNet benchmark. Secondly, we advocate the use
of sparsity in overcoming catastrophic forgetting, where compressed representations allow accommodating a large number of tasks at fixed model capacity. In all
cases, our reparameterisation considerably increases the efficacy of the
off-the-shelf methods.
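The core of the method fits in a few lines. Below is a minimal PyTorch sketch of the reparameterisation as the abstract describes it, assuming the effective weight takes the form w = v * |v|^(alpha - 1) for a trainable tensor v and alpha > 1; the layer class, default alpha, and initialisation are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Linear layer with a Powerpropagation-style reparameterisation (sketch)."""

    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.v = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.v, a=5 ** 0.5)

    def forward(self, x):
        # Effective weight w = v * |v|**(alpha - 1); autograd then scales each
        # update by |v|**(alpha - 1), so low-magnitude entries barely move
        # ("rich get richer") and density accumulates at zero.
        w = self.v * self.v.abs().pow(self.alpha - 1)
        return F.linear(x, w, self.bias)
```

With alpha = 1 the layer reduces to a standard linear layer, which is a convenient sanity check.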
Initial Classifier Weights Replay for Memoryless Class Incremental Learning
Incremental Learning (IL) is useful when artificial systems need to deal with
streams of data and do not have access to all data at all times. The most
challenging setting requires a constant complexity of the deep model and an
incremental model update without access to a bounded memory of past data. Then,
the representations of past classes are strongly affected by catastrophic
forgetting. To mitigate its negative effect, an adapted fine-tuning procedure which
includes knowledge distillation is usually deployed. We propose a different
approach based on a vanilla fine-tuning backbone. It leverages the initial
classifier weights, which provide a strong representation of past classes
because they are trained with all class data. However, the magnitude of
classifiers learned in different states varies, and normalization is needed for
fair handling of all classes. Normalization is performed by standardizing the
initial classifier weights, which are assumed to be normally distributed. In
addition, prediction scores are calibrated using state-level
statistics to further improve classification fairness. We conduct a thorough
evaluation with four public datasets in a memoryless incremental learning
setting. Results show that our method outperforms existing techniques by a
large margin for large-scale datasets.
Comment: Accepted in BMVC202
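As a rough illustration of the normalization step, the following NumPy sketch standardizes each class's initial classifier weight vector under the abstract's normality assumption; the function name and epsilon are ours, and the paper's state-level score calibration is not shown.

```python
import numpy as np

def standardize_initial_weights(weights, eps=1e-8):
    """Standardize per-class classifier weights (shape: num_classes x dim).

    Each class vector is shifted and scaled to zero mean and unit standard
    deviation, so classifiers learned in different incremental states
    become comparable in magnitude.
    """
    mu = weights.mean(axis=1, keepdims=True)
    sigma = weights.std(axis=1, keepdims=True)
    return (weights - mu) / (sigma + eps)
```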
Diffusion-based neuromodulation can eliminate catastrophic forgetting in simple neural networks
A long-term goal of AI is to produce agents that can learn a diversity of
skills throughout their lifetimes and continuously improve those skills via
experience. A longstanding obstacle towards that goal is catastrophic
forgetting, which is when learning new information erases previously learned
information. Catastrophic forgetting occurs in artificial neural networks
(ANNs), which have fueled most recent advances in AI. A recent paper proposed
that catastrophic forgetting in ANNs can be reduced by promoting modularity,
which can limit forgetting by isolating task information to specific clusters
of nodes and connections (functional modules). While the prior work did show
that modular ANNs suffered less from catastrophic forgetting, it was not able
to produce ANNs that possessed task-specific functional modules, thereby
leaving the main theory regarding modularity and forgetting untested. We
introduce diffusion-based neuromodulation, which simulates the release of
diffusing neuromodulatory chemicals within an ANN that can modulate (i.e., up-
or down-regulate) learning in a spatial region. On the simple diagnostic
problem from the prior work, diffusion-based neuromodulation 1) induces
task-specific learning in groups of nodes and connections (task-specific
localized learning), which 2) produces functional modules for each subtask, and
3) yields higher performance by eliminating catastrophic forgetting. Overall,
our results suggest that diffusion-based neuromodulation promotes task-specific
localized learning and functional modularity, which can help solve the
challenging but important problem of catastrophic forgetting.
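To make the mechanism concrete, here is a minimal sketch of spatial learning-rate modulation, assuming each node has a 2-D coordinate and a task-specific point source whose chemical concentration decays as a Gaussian of distance; the coordinates, the Gaussian fall-off, and the function name are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def neuromodulated_learning_rates(node_positions, source_position,
                                  base_lr=0.01, sigma=1.0):
    """Per-node learning rates under a diffusing neuromodulatory signal.

    Concentration of the modulatory chemical decays with distance from the
    source, so learning is up-regulated near the source and down-regulated
    far from it -- confining plasticity to a spatial region, and thereby
    to a functional module.
    """
    distances = np.linalg.norm(node_positions - source_position, axis=1)
    concentration = np.exp(-distances ** 2 / (2 * sigma ** 2))
    return base_lr * concentration
```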
Adaptive Reorganization of Neural Pathways for Continual Learning with Spiking Neural Networks
The human brain can self-organize rich and diverse sparse neural pathways to
incrementally master hundreds of cognitive tasks. However, most existing
continual learning algorithms for deep artificial and spiking neural networks
are unable to adequately auto-regulate the limited resources in the network,
which leads to a drop in performance and a rise in energy consumption as the
number of tasks increases. In this paper, we propose a brain-inspired continual
learning algorithm with adaptive reorganization of neural pathways, which
employs Self-Organizing Regulation networks to reorganize the single and
limited Spiking Neural Network (SOR-SNN) into rich sparse neural pathways to
efficiently cope with incremental tasks. The proposed model demonstrates
consistent superiority in performance, energy consumption, and memory capacity
on diverse continual learning tasks, ranging from simple child-like tasks to complex
ones, as well as on the generalized CIFAR100 and ImageNet datasets. In particular,
the SOR-SNN model excels at learning more complex tasks as well as more tasks,
and is able to integrate previously learned knowledge with information from
the current task, showing a backward-transfer ability that benefits old
tasks. Meanwhile, the proposed model exhibits a self-repairing ability under
irreversible damage and pruning, automatically allocating new pathways from
the retained network to recover the memory of forgotten knowledge.
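The regulation idea can be sketched as a small network that maps a task identifier to a sparse gate over a shared pool of synapses; the embedding size, sigmoid gating, and 0.5 threshold below are illustrative assumptions, not the SOR-SNN architecture.

```python
import torch
import torch.nn as nn

class PathwayGate(nn.Module):
    """Task-conditioned sparse gating of a shared synapse pool (sketch)."""

    def __init__(self, num_tasks, num_synapses, emb_dim=32):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, emb_dim)
        self.regulator = nn.Linear(emb_dim, num_synapses)

    def forward(self, task_id, weights):
        # Map the task embedding to per-synapse gates in (0, 1).
        gate = torch.sigmoid(self.regulator(self.task_emb(task_id)))
        # Keep only strong gates, so each task uses a sparse sub-pathway.
        gate = (gate > 0.5).float() * gate
        return weights * gate

# Usage: gated = PathwayGate(10, w.numel())(torch.tensor(0), w.flatten())
```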
Recent Advances of Continual Learning in Computer Vision: An Overview
In contrast to batch learning where all training data is available at once,
continual learning represents a family of methods that accumulate knowledge and
learn continuously from data that arrives in sequential order. Similar to the
human learning process, with its ability to learn, fuse, and accumulate
new knowledge arriving at different time steps, continual learning is considered
to have high practical significance. Hence, continual learning has been studied
in various artificial intelligence tasks. In this paper, we present a
comprehensive review of the recent progress of continual learning in computer
vision. In particular, the works are grouped by their representative
techniques, including regularization, knowledge distillation, memory,
generative replay, parameter isolation, and a combination of the above
techniques. For each category of these techniques, both its characteristics and
applications in computer vision are presented. At the end of this overview,
we discuss several subareas where continuous knowledge accumulation is
potentially helpful but continual learning has not yet been well studied.
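As a concrete instance of the first category the survey covers (regularization), the following sketch shows the widely used elastic-weight-consolidation penalty, which discourages parameters from drifting away from their values after previous tasks; the dict-based interface is an illustrative choice.

```python
import torch

def ewc_penalty(model, old_params, fisher, lam=1.0):
    """Regularization-based continual learning, in miniature (EWC form).

    Penalise movement of each parameter away from its value after the
    previous task, weighted by an importance estimate (here, a diagonal
    Fisher approximation). `old_params` and `fisher` are dicts keyed by
    parameter name, computed once the previous task has been learned.
    """
    loss = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * loss
```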
A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
Forgetting refers to the loss or deterioration of previously acquired
information or knowledge. While the existing surveys on forgetting have
primarily focused on continual learning, forgetting is a prevalent phenomenon
observed in various other research domains within deep learning. Forgetting
manifests, for example, in generative models due to generator shifts, and in
federated learning due to heterogeneous data distributions across clients.
Addressing forgetting encompasses several challenges, including balancing the
retention of old task knowledge with fast learning of new tasks, managing task
interference with conflicting goals, and preventing privacy leakage.
Moreover, most existing surveys on continual learning implicitly assume that
forgetting is always harmful. In contrast, our survey argues that forgetting is
a double-edged sword and can be beneficial and desirable in certain cases, such
as privacy-preserving scenarios. By exploring forgetting in a broader context,
we aim to present a more nuanced understanding of this phenomenon and highlight
its potential advantages. Through this comprehensive survey, we aspire to
uncover potential solutions by drawing upon ideas and approaches from various
fields that have dealt with forgetting. By examining forgetting beyond its
conventional boundaries, we hope to encourage the future development
of novel strategies for mitigating, harnessing, or even embracing forgetting in
real applications. A comprehensive list of papers about forgetting in various
research fields is available at
https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning.
Class-Incremental Learning using Diffusion Model for Distillation and Replay
Class-incremental learning aims to learn new classes in an incremental
fashion without forgetting the previously learned ones. Several research works
have shown how additional data can be used by incremental models to help
mitigate catastrophic forgetting. In this work, following the recent
breakthrough in text-to-image generative models and their wide distribution, we
propose the use of a pretrained Stable Diffusion model as a source of
additional data for class-incremental learning. Compared to competitive methods
that rely on external, often unlabeled, datasets of real images, our approach
can generate synthetic samples belonging to the same classes as the previously
encountered images. This allows us to use those additional data samples not
only in the distillation loss but also for replay in the classification loss.
Experiments on the competitive benchmarks CIFAR100, ImageNet-Subset, and
ImageNet demonstrate how this new approach can be used to further improve the
performance of state-of-the-art methods for class-incremental learning on
large-scale datasets.
Comment: Best paper award at the 1st Workshop on Visual Continual Learning, ICCV 202
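A minimal sketch of the loss structure implied by the abstract, assuming x_syn and y_syn are Stable-Diffusion-generated samples labeled with previously seen classes and old_model is the frozen model from the previous task; the temperature, equal weighting, and function names are our assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distill_replay_loss(model, old_model, x_new, y_new, x_syn, y_syn, T=2.0):
    """Classification with replay + distillation on generated samples."""
    x = torch.cat([x_new, x_syn])
    y = torch.cat([y_new, y_syn])
    logits = model(x)
    # Replay: synthetic old-class samples enter the classification loss.
    ce = F.cross_entropy(logits, y)

    # Distillation: match the frozen old model on the synthetic samples.
    syn_logits = logits[len(x_new):]
    with torch.no_grad():
        old_logits = old_model(x_syn)
    kd = F.kl_div(F.log_softmax(syn_logits / T, dim=1),
                  F.softmax(old_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return ce + kd
```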