SpaceNet: Make Free Space For Continual Learning
The continual learning (CL) paradigm aims to enable neural networks to learn
tasks continually in a sequential fashion. The fundamental challenge in this
learning paradigm is catastrophic forgetting of previously learned tasks when the
model is optimized for a new task, especially when their data is not
accessible. Current architectural-based methods aim at alleviating the
catastrophic forgetting problem but at the expense of expanding the capacity of
the model. Regularization-based methods maintain a fixed model capacity;
however, previous studies showed that these methods suffer a large performance
degradation when the task identity is not available during inference (e.g., the
class incremental learning scenario). In this work, we propose a novel
architectural-based method, referred to as SpaceNet, for the class incremental
learning scenario, where we utilize the available fixed capacity of the model
intelligently. SpaceNet trains sparse deep neural networks from scratch in an
adaptive way that compresses the sparse connections of each task in a compact
number of neurons. The adaptive training of the sparse connections results in
sparse representations that reduce the interference between the tasks.
Experimental results show the robustness of our proposed method against
catastrophic forgetting old tasks and the efficiency of SpaceNet in utilizing
the available capacity of the model, leaving space for more tasks to be
learned. In particular, when SpaceNet is tested on the well-known benchmarks
for CL: split MNIST, split Fashion-MNIST, and CIFAR-10/100, it outperforms
regularization-based methods by a large margin. Moreover, it achieves
better performance than architectural-based methods without model expansion and
achieves comparable results to rehearsal-based methods, while offering a huge
memory reduction. Comment: Accepted in Neurocomputing Journal.
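
As a rough illustration of the per-task allocation idea described above (not the authors' implementation; the helper name, layer sizes, and densities below are hypothetical, and SpaceNet's adaptive rewiring of connections during training is omitted), a fixed-capacity layer can reserve a compact group of neurons and a sparse block of connections for each incoming task:

    import numpy as np

    rng = np.random.default_rng(0)

    def allocate_task_mask(weight_shape, free_out_neurons, n_neurons, density):
        """Reserve a compact set of free output neurons for a new task and create
        a sparse connection mask restricted to them (hypothetical helper)."""
        n_in, _ = weight_shape
        chosen = rng.choice(free_out_neurons, size=n_neurons, replace=False)
        mask = np.zeros(weight_shape, dtype=bool)
        n_conn = int(density * n_in * n_neurons)
        rows = rng.integers(0, n_in, size=n_conn)
        cols = rng.choice(chosen, size=n_conn)
        mask[rows, cols] = True
        return mask, chosen

    # fixed-capacity layer: 784 inputs -> 200 hidden neurons, shared by all tasks
    W = np.zeros((784, 200))
    free = np.arange(200)
    task_masks = []
    for task in range(3):
        mask, used = allocate_task_mask(W.shape, free, n_neurons=20, density=0.1)
        task_masks.append(mask)
        free = np.setdiff1d(free, used)   # reserved neurons are not reused by later tasks
        # during training of this task, only the weights W[mask] would be updated
    print("free neurons left for future tasks:", len(free))

Because each task's connections live in their own compact group of neurons, updates for a new task do not overwrite the weights that earlier tasks rely on, and the unreserved neurons remain available for future tasks.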
Self-Attention Meta-Learner for Continual Learning
Continual learning aims to provide intelligent agents capable of learning
multiple tasks sequentially with neural networks. One of its main challenges,
catastrophic forgetting, is caused by the non-optimal ability of neural networks
to learn in non-stationary distributions. In most settings of the current
approaches, the agent starts from randomly initialized parameters and is
optimized to master the current task regardless of the usefulness of the
learned representation for future tasks. Moreover, each of the future tasks
uses all the previously learned knowledge although parts of this knowledge
might not be helpful for its learning. These issues cause interference among tasks,
especially when the data of previous tasks is not accessible. In this paper, we
propose a new method, named Self-Attention Meta-Learner (SAM), which learns
prior knowledge for continual learning that permits learning a sequence of
tasks, while avoiding catastrophic forgetting. SAM incorporates an attention
mechanism that learns to select the relevant representation for each
future task. Each task builds a specific representation branch on top of the
selected knowledge, avoiding interference between tasks. We evaluate the
proposed method on the Split CIFAR-10/100 and Split MNIST benchmarks in the
task-agnostic inference setting. We empirically show that we can achieve better
performance than several state-of-the-art methods for continual learning by
building on top of the selected representation learned by SAM. We also show the
role of the meta-attention mechanism in boosting informative features
corresponding to the input data and identifying the correct target under
task-agnostic inference. Finally, we demonstrate that popular existing continual
learning methods gain a performance boost when they adopt SAM as a starting
point.
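
A toy sketch of the kind of attention-based feature selection the abstract describes, assuming a frozen meta-learned encoder whose shared representation is re-weighted per input before a task-specific head is trained on top; the module structure, names, and shapes are illustrative rather than SAM's actual architecture:

    import numpy as np

    rng = np.random.default_rng(1)

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def channel_attention(features, W1, W2):
        """Toy attention: score each feature dimension of the shared representation
        and boost the ones that are informative for the current input."""
        hidden = np.maximum(features @ W1, 0.0)   # small scoring network (ReLU)
        scores = softmax(hidden @ W2)             # one weight per feature channel
        return scores * features

    shared = rng.normal(size=(4, 64))             # features from a frozen, meta-learned encoder
    W1, W2 = rng.normal(size=(64, 32)), rng.normal(size=(32, 64))
    attended = channel_attention(shared, W1, W2)

    # each new task trains its own small head on top of the attended representation
    head = rng.normal(size=(64, 5))               # e.g. a 5-class task
    logits = attended @ head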
Avoiding Forgetting and Allowing Forward Transfer in Continual Learning via Sparse Networks
Using task-specific components within a neural network in continual learning
(CL) is a compelling strategy to address the stability-plasticity dilemma in
fixed-capacity models without access to past data. Current methods focus only
on selecting a sub-network for a new task that reduces forgetting of past
tasks. However, this selection could limit the forward transfer of relevant
past knowledge that helps in future learning. Our study reveals that satisfying
both objectives jointly is more challenging when a unified classifier is used
for all classes of the seen tasks, i.e., class-Incremental Learning (class-IL), as it is
prone to ambiguities between classes across tasks. Moreover, the challenge
increases when the semantic similarity of classes across tasks increases. To
address this challenge, we propose a new CL method, named AFAF, that aims to
Avoid Forgetting and Allow Forward transfer in class-IL using fixed-capacity
models. AFAF allocates a sub-network that enables selective transfer of
relevant knowledge to a new task while preserving past knowledge, reusing some
of the previously allocated components to utilize the fixed capacity, and
addressing class ambiguities when similarities exist. The experiments show the
effectiveness of AFAF in providing models with multiple CL desirable
properties, while outperforming state-of-the-art methods on various challenging
benchmarks with different semantic similarities. Comment: Accepted at the European
Conference on Machine Learning and Principles and Practice of Knowledge Discovery
in Databases (ECML PKDD 2022).
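
A simplified sketch of the sub-network allocation step the abstract refers to: reuse the frozen neurons that appear most relevant to the new task (here scored by mean activation, which is our stand-in for AFAF's actual selection criterion) plus a few currently free neurons:

    import numpy as np

    rng = np.random.default_rng(2)

    def select_subnetwork(new_task_acts, frozen_ids, free_ids, k_reuse, k_new):
        """Pick a sub-network for a new task: reuse the frozen neurons that respond
        most strongly to the new task's data (selective transfer) and add a few
        currently free neurons that will be trained from scratch."""
        relevance = new_task_acts[:, frozen_ids].mean(axis=0)
        reused = frozen_ids[np.argsort(relevance)[-k_reuse:]]
        fresh = rng.choice(free_ids, size=k_new, replace=False)
        return reused, fresh

    acts = np.abs(rng.normal(size=(128, 200)))    # new-task activations of a shared layer
    frozen = np.arange(0, 120)                    # neurons already allocated to past tasks
    free = np.arange(120, 200)
    reused, fresh = select_subnetwork(acts, frozen, free, k_reuse=30, k_new=20)
    # reused neurons keep their (frozen) incoming weights, preserving past knowledge;
    # only connections into the freshly allocated neurons are trained for this task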
The Dormant Neuron Phenomenon in Deep Reinforcement Learning
In this work we identify the dormant neuron phenomenon in deep reinforcement
learning, where an agent's network suffers from an increasing number of
inactive neurons, thereby affecting network expressivity. We demonstrate the
presence of this phenomenon across a variety of algorithms and environments,
and highlight its effect on learning. To address this issue, we propose a
simple and effective method (ReDo) that Recycles Dormant neurons throughout
training. Our experiments demonstrate that ReDo maintains the expressive power
of networks by reducing the number of dormant neurons and results in improved
performance. Comment: Oral at ICML 2023.
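
A minimal sketch of the recycling idea, assuming the dormancy criterion described in the paper (a neuron's mean absolute activation, normalized by the layer average, falling below a threshold tau); the re-initialization scale and the surrounding training loop are simplified:

    import numpy as np

    rng = np.random.default_rng(3)

    def redo_step(W_in, W_out, activations, tau=0.1):
        """Detect dormant neurons (normalized mean absolute activation <= tau),
        re-initialize their incoming weights and zero their outgoing weights."""
        score = np.abs(activations).mean(axis=0)
        score = score / (score.mean() + 1e-8)          # normalize by the layer average
        dormant = np.where(score <= tau)[0]
        for j in dormant:
            W_in[:, j] = rng.normal(scale=1.0 / np.sqrt(W_in.shape[0]), size=W_in.shape[0])
            W_out[j, :] = 0.0                          # reset neuron starts with no influence downstream
        return dormant

    W_in = rng.normal(size=(64, 256))                  # layer producing the monitored neurons
    W_out = rng.normal(size=(256, 10))                 # next layer
    acts = np.maximum(rng.normal(size=(1024, 256)), 0.0) * (rng.random(256) > 0.3)
    recycled = redo_step(W_in, W_out, acts)
    print("recycled", len(recycled), "dormant neurons")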
Continual Learning with Dynamic Sparse Training: Exploring Algorithms for Effective Model Updates
Continual learning (CL) refers to the ability of an intelligent system to
sequentially acquire and retain knowledge from a stream of data with as little
computational overhead as possible. To this end, regularization, replay,
architecture, and parameter isolation approaches have been introduced in the
literature. Parameter isolation uses a sparse network, which enables allocating
distinct parts of the neural network to different tasks and also allows sharing
parameters between tasks if they are similar. Dynamic Sparse
Training (DST) is a prominent way to find these sparse networks and isolate
them for each task. This paper is the first empirical study investigating the
effect of different DST components under the CL paradigm to fill a critical
research gap and shed light on the optimal configuration of DST for CL, if one
exists. To this end, we perform a comprehensive study in which we investigate
various DST components to find the best topology per task on the well-known
CIFAR100 and miniImageNet benchmarks in a task-incremental CL setup since our
primary focus is to evaluate the performance of various DST criteria, rather
than the process of mask selection. We found that, at a low sparsity level,
Erdos-Renyi Kernel (ERK) initialization utilizes the backbone more efficiently
and allows task increments to be learned effectively. At a high sparsity level,
however, uniform initialization demonstrates more reliable and robust
performance. In terms of growth strategy, performance depends on the
chosen initialization strategy and the extent of sparsity. Finally,
adaptivity within DST components is a promising way towards better continual
learners.
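
For the dense-layer case, the ERK-style allocation mentioned above can be sketched as follows (a simplification that ignores convolutional kernels and boundary cases): the density of a layer is proportional to (n_in + n_out) / (n_in * n_out), rescaled to meet a global parameter budget, whereas uniform initialization gives every layer the same density:

    import numpy as np

    def erk_densities(layer_shapes, target_density):
        """Per-layer densities proportional to (n_in + n_out) / (n_in * n_out),
        rescaled so the total number of kept weights matches the global budget."""
        sizes = np.array([n_in * n_out for n_in, n_out in layer_shapes], dtype=float)
        raw = np.array([(n_in + n_out) / (n_in * n_out) for n_in, n_out in layer_shapes])
        scale = target_density * sizes.sum() / (raw * sizes).sum()
        return np.clip(scale * raw, 0.0, 1.0)

    layers = [(784, 400), (400, 400), (400, 100)]
    print("ERK densities:    ", erk_densities(layers, target_density=0.1))
    print("uniform densities:", [0.1] * len(layers))  # every layer gets the same density

Under ERK, wide layers with many parameters end up sparser while small layers stay denser, which is one reason its behaviour differs from uniform initialization as the overall sparsity level changes.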
Automatic Noise Filtering with Dynamic Sparse Training in Deep Reinforcement Learning
Tomorrow's robots will need to distinguish useful information from noise when performing different tasks. A household robot, for instance, may continuously receive a plethora of information about the home, but needs to focus on just a small subset to successfully execute its current chore. Filtering distracting inputs that contain irrelevant data has received little attention in the reinforcement learning literature. To start resolving this, we formulate a problem setting in reinforcement learning called the Extremely Noisy Environment (ENE), where a large fraction of the input features are pure noise. Agents need to detect which features provide task-relevant information about the state of the environment. Consequently, we propose a new method termed Automatic Noise Filtering (ANF), which uses the principles of dynamic sparse training in synergy with various deep reinforcement learning algorithms. The sparse input layer learns to focus its connectivity on task-relevant features, such that ANF-SAC and ANF-TD3 outperform standard SAC and TD3 by a large margin, while using substantially fewer weights. Furthermore, we devise a transfer learning setting for ENEs, by permuting all features of the environment after 1M timesteps to simulate the fact that other information sources can become relevant as the world evolves. Again, ANF surpasses the baselines in final performance and sample complexity. Our code is available at https://github.com/bramgrooten/automatic-noise-filtering
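
A minimal sketch, under the assumption that the sparse input layer is updated with a SET-style prune-and-regrow step (drop the weakest active input connections, regrow the same number at random); this is the mechanism by which connectivity can drift towards informative inputs, while the actual ANF agents wrap it around SAC/TD3 training, which is omitted here:

    import numpy as np

    rng = np.random.default_rng(4)

    def prune_and_regrow(W, mask, drop_fraction=0.2):
        """One SET-style topology update for a sparse input layer: remove the
        weakest active connections and regrow the same number at random
        inactive positions (new weights start near zero)."""
        active = np.argwhere(mask)
        k = int(drop_fraction * len(active))
        weakest = active[np.argsort(np.abs(W[mask]))[:k]]
        mask[weakest[:, 0], weakest[:, 1]] = False
        W[weakest[:, 0], weakest[:, 1]] = 0.0
        inactive = np.argwhere(~mask)
        grown = inactive[rng.choice(len(inactive), size=k, replace=False)]
        mask[grown[:, 0], grown[:, 1]] = True
        W[grown[:, 0], grown[:, 1]] = rng.normal(scale=0.01, size=k)
        return W, mask

    n_in, n_hidden = 200, 64                       # many of the 200 inputs may be pure noise
    mask = rng.random((n_in, n_hidden)) < 0.05     # start with ~5% of input connections
    W = rng.normal(size=(n_in, n_hidden)) * mask
    W, mask = prune_and_regrow(W, mask)            # applied periodically during RL training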
Quick and Robust Feature Selection: the Strength of Energy-efficient Sparse Training for Autoencoders
Major complications arise from the recent increase in the amount of
high-dimensional data, including high computational costs and memory
requirements. Feature selection, which identifies the most relevant and
informative attributes of a dataset, has been introduced as a solution to this
problem. Most of the existing feature selection methods are computationally
inefficient; inefficient algorithms lead to high energy consumption, which is
not desirable for devices with limited computational and energy resources. In
this paper, a novel and flexible method for unsupervised feature selection is
proposed. This method, named QuickSelection, introduces the strength of the
neuron in sparse neural networks as a criterion to measure the feature
importance. This criterion, blended with sparsely connected denoising
autoencoders trained with the sparse evolutionary training procedure, derives
the importance of all input features simultaneously. We implement
QuickSelection in a purely sparse manner as opposed to the typical approach of
using a binary mask over connections to simulate sparsity. This results in a
considerable speedup and memory reduction. When tested on several
benchmark datasets, including five low-dimensional and three high-dimensional
datasets, the proposed method achieves the best trade-off among
classification and clustering accuracy, running time, and maximum memory usage,
among widely used approaches for feature selection. In addition, our proposed
method requires the least amount of energy among the state-of-the-art
autoencoder-based feature selection methods. Comment: 29 pages.
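
The neuron-strength criterion can be sketched as follows: the importance of an input feature is the summed magnitude of its surviving connections in the sparse first layer of the autoencoder. The weight matrix below is random, standing in for one obtained with sparse evolutionary training, and the selection size is arbitrary:

    import numpy as np

    rng = np.random.default_rng(5)

    def feature_strength(W_in):
        """Strength of each input neuron: summed magnitude of its surviving
        connections in the sparse first layer of the autoencoder."""
        return np.abs(W_in).sum(axis=1)

    n_features, n_hidden = 500, 100
    mask = rng.random((n_features, n_hidden)) < 0.02    # ~2% of connections kept
    W = rng.normal(size=(n_features, n_hidden)) * mask  # stand-in for a SET-trained layer

    strength = feature_strength(W)
    selected = np.argsort(strength)[::-1][:50]           # keep the 50 strongest features
    print("selected feature indices:", selected[:10])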