Overcoming Catastrophic Interference by Conceptors
Catastrophic interference has been a major roadblock in the research of
continual learning. Here we propose a variant of the back-propagation
algorithm, "conceptor-aided back-prop" (CAB), in which gradients are shielded
by conceptors against degradation of previously learned tasks. Conceptors have
their origin in reservoir computing, where they have been previously shown to
overcome catastrophic forgetting. CAB extends these results to deep feedforward
networks. On the disjoint MNIST task CAB outperforms two other methods for
coping with catastrophic interference that have recently been proposed in the
deep learning field.
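A minimal sketch of the gradient-shielding idea, assuming a single layer, an illustrative aperture value, and made-up function names (this is not the paper's full CAB rule, which shields both the forward signal and the backpropagated error):

```python
import numpy as np

def conceptor(X, alpha=10.0):
    """Conceptor of layer activations X (n_samples x d): C = R (R + alpha^-2 I)^-1,
    where R is the activation correlation matrix."""
    n, d = X.shape
    R = X.T @ X / n
    return R @ np.linalg.inv(R + (alpha ** -2) * np.eye(d))

def shield_gradient(grad_W, C_prev):
    """Project a weight gradient onto directions not claimed by previous tasks.
    F = I - C_prev is the 'free' subspace; the orientation chosen here
    (gradient columns indexed by layer inputs) is illustrative."""
    F = np.eye(C_prev.shape[0]) - C_prev
    return grad_W @ F

# Toy usage: activations stored from a previous task, then a shielded update.
rng = np.random.default_rng(0)
X_prev = rng.standard_normal((256, 8))      # layer inputs seen during task 1
C = conceptor(X_prev)
grad = rng.standard_normal((4, 8))          # raw gradient w.r.t. a 4x8 weight matrix
update = -0.01 * shield_gradient(grad, C)   # step that avoids task-1 directions
```

The projection through I - C confines the update to directions that the earlier task's activations did not occupy, which is the sense in which the gradient is "shielded".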
Progress & Compress: A scalable framework for continual learning
We introduce a conceptually simple and scalable framework for continual
learning domains where tasks are learned sequentially. Our method uses a constant
number of parameters and is designed to preserve performance on
previously encountered tasks while accelerating learning progress on subsequent
problems. This is achieved by training a network with two components: A
knowledge base, capable of solving previously encountered problems, which is
connected to an active column that is employed to efficiently learn the current
task. After learning a new task, the active column is distilled into the
knowledge base, taking care to protect any previously acquired skills. This
cycle of active learning (progression) followed by consolidation (compression)
requires no architecture growth, no access to or storing of previous data or
tasks, and no task-specific parameters. We demonstrate the progress & compress
approach on sequential classification of handwritten alphabets as well as two
reinforcement learning domains: Atari games and 3D maze navigation.
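A rough sketch of the compress (distillation) step, with a plain quadratic penalty standing in for the online EWC protection used in the paper; the module sizes, loss weights, and names such as knowledge_base and active_column are illustrative:

```python
import torch
import torch.nn.functional as F

knowledge_base = torch.nn.Linear(16, 4)   # stands in for the knowledge-base network
active_column = torch.nn.Linear(16, 4)    # freshly trained on the current task

# Anchor parameters and importance weights from earlier consolidations
# (EWC-like; the paper uses an online variant).
anchors = {n: p.detach().clone() for n, p in knowledge_base.named_parameters()}
importance = {n: torch.ones_like(p) for n, p in knowledge_base.named_parameters()}

optimizer = torch.optim.SGD(knowledge_base.parameters(), lr=1e-2)
for _ in range(100):
    x = torch.randn(32, 16)                       # inputs from the current task
    with torch.no_grad():
        teacher = F.log_softmax(active_column(x), dim=-1)
    student = F.log_softmax(knowledge_base(x), dim=-1)
    distill = F.kl_div(student, teacher, log_target=True, reduction="batchmean")
    protect = sum((importance[n] * (p - anchors[n]) ** 2).sum()
                  for n, p in knowledge_base.named_parameters())
    loss = distill + 1e-2 * protect               # compress while protecting old skills
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```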
Disturbance-immune Weight Sharing for Neural Architecture Search
Neural architecture search (NAS) has gained increasing attention in the
community of architecture design. One key factor behind this success is the
training efficiency created by the weight sharing (WS) technique.
However, WS-based NAS methods often suffer from a performance disturbance (PD)
issue. That is, the training of subsequent architectures inevitably disturbs
the performance of previously trained architectures due to the partially shared
weights. This leads to inaccurate performance estimation for the previous
architectures, which makes it hard to learn a good search strategy. To
alleviate the performance disturbance issue, we propose a new
disturbance-immune update strategy for model updating. Specifically, to
preserve the knowledge learned by previous architectures, we constrain the
training of subsequent architectures in an orthogonal space via orthogonal
gradient descent. Equipped with this strategy, we propose a novel
disturbance-immune training scheme for NAS. We theoretically analyze the
effectiveness of our strategy in alleviating the PD risk. Extensive experiments
on CIFAR-10 and ImageNet verify the superiority of our method.
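The disturbance-immune update can be sketched with generic orthogonal gradient descent; how the basis of previous directions is accumulated below, and the step size, are illustrative assumptions rather than the paper's exact scheme:

```python
import numpy as np

def orthogonalize(grad, basis):
    """Project grad onto the complement of the span of previous update directions.
    basis: list of unit vectors (flattened gradients) from earlier architectures."""
    g = grad.copy()
    for b in basis:
        g -= (g @ b) * b
    return g

rng = np.random.default_rng(1)
basis = []
for arch in range(3):                       # sequentially trained sub-architectures
    g = rng.standard_normal(10)             # gradient on the shared weights
    g_proj = orthogonalize(g, basis)        # update that avoids disturbing earlier archs
    shared_update = -0.1 * g_proj
    basis.append(g_proj / (np.linalg.norm(g_proj) + 1e-12))  # remember this direction
```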
Continual learning with hypernetworks
Artificial neural networks suffer from catastrophic forgetting when they are
sequentially trained on multiple tasks. To overcome this problem, we present a
novel approach based on task-conditioned hypernetworks, i.e., networks that
generate the weights of a target model based on task identity. Continual
learning (CL) is less difficult for this class of models thanks to a simple key
feature: instead of recalling the input-output relations of all previously seen
data, task-conditioned hypernetworks only require rehearsing task-specific
weight realizations, which can be maintained in memory using a simple
regularizer. Besides achieving state-of-the-art performance on standard CL
benchmarks, additional experiments on long task sequences reveal that
task-conditioned hypernetworks display a very large capacity to retain previous
memories. Notably, such long memory lifetimes are achieved in a compressive
regime, when the number of trainable hypernetwork weights is comparable to or
smaller than the target network size. We provide insight into the structure of
low-dimensional task embedding spaces (the input space of the hypernetwork) and
show that task-conditioned hypernetworks demonstrate transfer learning.
Finally, forward information transfer is further supported by empirical results
on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
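A minimal sketch of a task-conditioned hypernetwork with the output-space regularizer described above; the layer sizes, the toy linear target network, and the regularization weight are assumptions:

```python
import torch

emb_dim, target_params = 8, 16 * 4 + 4      # weights + bias of a 16->4 linear target

hypernet = torch.nn.Sequential(
    torch.nn.Linear(emb_dim, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, target_params),
)
task_embeddings = torch.nn.Embedding(2, emb_dim)   # one learnable embedding per task

def target_forward(x, flat_w):
    """Run the target model using weights generated by the hypernetwork."""
    W = flat_w[: 16 * 4].view(4, 16)
    b = flat_w[16 * 4:]
    return x @ W.t() + b

# Before training task 1, snapshot the weights the hypernet generates for task 0.
with torch.no_grad():
    w_task0_old = hypernet(task_embeddings(torch.tensor(0)))

params = list(hypernet.parameters()) + list(task_embeddings.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
for _ in range(50):
    w1 = hypernet(task_embeddings(torch.tensor(1)))
    task_loss = torch.nn.functional.cross_entropy(target_forward(x, w1), y)
    w0 = hypernet(task_embeddings(torch.tensor(0)))
    rehearse = ((w0 - w_task0_old) ** 2).mean()    # rehearse the task-0 weight realization
    loss = task_loss + 1e-2 * rehearse
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Only the generated weights for old tasks are rehearsed, not the old data, which is what keeps the memory cost independent of the number of stored examples.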
Overcoming Multi-Model Forgetting
We identify a phenomenon, which we refer to as multi-model forgetting, that
occurs when sequentially training multiple deep networks with partially-shared
parameters; the performance of previously-trained models degrades as one
optimizes a subsequent one, due to the overwriting of shared parameters. To
overcome this, we introduce a statistically-justified weight plasticity loss
that regularizes the learning of a model's shared parameters according to their
importance for the previous models, and demonstrate its effectiveness when
training two models sequentially and for neural architecture search. Adding
weight plasticity in neural architecture search preserves the best models to
the end of the search and yields improved results in both natural language
processing and computer vision tasks.
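A sketch of a weight-plasticity-style penalty on the shared parameters; the uniform importance estimate and the loss weight below are placeholders for the paper's statistically derived quantities:

```python
import torch

shared = torch.nn.Linear(8, 8)              # parameters shared by models A and B
head_b = torch.nn.Linear(8, 2)              # parameters private to model B

theta_a = {n: p.detach().clone() for n, p in shared.named_parameters()}
omega = {n: torch.ones_like(p) for n, p in shared.named_parameters()}  # importance for model A

opt = torch.optim.SGD(list(shared.parameters()) + list(head_b.parameters()), lr=1e-2)
x, y = torch.randn(64, 8), torch.randint(0, 2, (64,))
for _ in range(100):
    logits = head_b(torch.relu(shared(x)))
    task_loss = torch.nn.functional.cross_entropy(logits, y)
    plasticity = sum((omega[n] * (p - theta_a[n]) ** 2).sum()
                     for n, p in shared.named_parameters())
    loss = task_loss + 0.1 * plasticity      # keep important shared weights near model A's values
    opt.zero_grad()
    loss.backward()
    opt.step()
```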
Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization
Humans and most animals can learn new tasks without forgetting old ones.
However, training artificial neural networks (ANNs) on new tasks typically
causes them to forget previously learned tasks. This phenomenon is the result of
"catastrophic forgetting", in which training an ANN disrupts connection weights
that were important for solving previous tasks, degrading task performance.
Several recent studies have proposed methods to stabilize connection weights of
ANNs that are deemed most important for solving a task, which helps alleviate
catastrophic forgetting. Here, drawing inspiration from algorithms that are
believed to be implemented in vivo, we propose a complementary method: adding a
context-dependent gating signal, such that only sparse, mostly non-overlapping
patterns of units are active for any one task. This method is easy to
implement, requires little computational overhead, and allows ANNs to maintain
high performance across large numbers of sequentially presented tasks when
combined with weight stabilization. This work provides another example of how
neuroscience-inspired algorithms can benefit ANN design and capability.
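A minimal sketch of context-dependent gating, with fixed random binary masks selecting a sparse subset of hidden units per task; the sparsity level and layer sizes are illustrative, and the synaptic stabilization the paper combines this with is omitted:

```python
import torch

hidden = 100
keep_frac = 0.2
# One fixed random binary mask per task; masks are mostly non-overlapping by chance.
gates = {t: (torch.rand(hidden) < keep_frac).float() for t in range(5)}

layer1 = torch.nn.Linear(784, hidden)
layer2 = torch.nn.Linear(hidden, 10)

def forward(x, task_id):
    h = torch.relu(layer1(x)) * gates[task_id]   # gate hidden units by task context
    return layer2(h)

logits = forward(torch.randn(32, 784), task_id=3)
```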
Physical reservoir computing -- An introductory perspective
Understanding the fundamental relationships between physics and its
information-processing capability has been an active research topic for many
years. Physical reservoir computing is a recently introduced framework that
allows one to exploit the complex dynamics of physical systems as
information-processing devices. This framework is particularly suited for edge
computing devices, in which information processing is incorporated at the edge
(e.g., into sensors) in a decentralized manner to reduce the adaptation delay
caused by data transmission overhead. This paper aims to illustrate the
potential of the framework using examples from soft robotics and to provide a
concise overview focusing on the basic motivations for introducing it, which
stem from a number of fields, including machine learning, nonlinear dynamical
systems, biological science, materials science, and physics.
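The reservoir computing principle behind this framework can be sketched with a simulated echo-state-style reservoir standing in for a physical system; all sizes, the spectral-radius scaling, and the one-step-ahead prediction task are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, T = 200, 1000
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))          # fixed input coupling
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # scale spectral radius below 1

u = np.sin(np.linspace(0, 20 * np.pi, T + 1))      # input stream
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W @ x + W_in[:, 0] * u[t])         # reservoir dynamics (never trained)
    states[t] = x

# Only the linear readout is trained, via ridge regression.
target = u[1: T + 1]                               # one-step-ahead prediction
ridge = 1e-6
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ target)
prediction = states @ W_out
```

In physical reservoir computing, the simulated state update above is replaced by measurements of a real dynamical substrate (for example, a soft robotic body), while the cheap linear readout remains the only trained component.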