29,048 research outputs found
Overcoming Multi-Model Forgetting
We identify a phenomenon, which we refer to as multi-model forgetting, that
occurs when sequentially training multiple deep networks with partially-shared
parameters; the performance of previously-trained models degrades as one
optimizes a subsequent one, due to the overwriting of shared parameters. To
overcome this, we introduce a statistically-justified weight plasticity loss
that regularizes the learning of a model's shared parameters according to their
importance for the previous models, and demonstrate its effectiveness when
training two models sequentially and for neural architecture search. Adding
weight plasticity in neural architecture search preserves the best models to
the end of the search and yields improved results in both natural language
processing and computer vision tasks.
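The abstract does not give the exact form of the weight plasticity loss, but an importance-weighted quadratic penalty on the shared parameters captures the idea it describes. A minimal sketch, assuming PyTorch; `anchor_params` (the shared parameters' values after training the previous model) and `importances` are hypothetical names:

```python
import torch

def weight_plasticity_loss(shared_params, anchor_params, importances, lam=1.0):
    """Importance-weighted quadratic penalty on the drift of shared
    parameters away from the values they held after the previous model
    was trained (sketch only; the paper's exact loss may differ)."""
    penalty = torch.zeros(())
    for p, p_anchor, omega in zip(shared_params, anchor_params, importances):
        penalty = penalty + (omega * (p - p_anchor.detach()) ** 2).sum()
    return lam * penalty
```

During training of the subsequent model, a term like this would simply be added to its task loss.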
Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting
Addressing catastrophic forgetting is one of the key challenges in continual
learning where machine learning systems are trained with sequential or
streaming tasks. Despite recent remarkable progress in state-of-the-art deep
learning, deep neural networks (DNNs) are still plagued with the catastrophic
forgetting problem. This paper presents a conceptually simple yet general and
effective framework for handling catastrophic forgetting in continual learning
with DNNs. The proposed method consists of two components: a neural structure
optimization component and a parameter learning and/or fine-tuning component.
By separating explicit neural structure learning from parameter estimation,
the proposed method not only evolves neural structures in an intuitively
meaningful way but also shows a strong ability to alleviate catastrophic
forgetting in experiments. Furthermore, the proposed method outperforms all
other baselines on the permuted MNIST, split CIFAR-100, and Visual Domain
Decathlon datasets in the continual learning setting.
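The abstract describes the two components only at a high level; one way to picture the structure-optimization step is a per-layer choice among reusing, adapting, or replacing each layer of the learned backbone. A minimal sketch in PyTorch, where `choose_option` and `make_new` stand in for the paper's structure-learning component and are hypothetical:

```python
import copy
import torch.nn as nn

def grow_for_new_task(backbone, choose_option, make_new):
    """Assemble a network for a new task, layer by layer.
    choose_option(layer) -> 'reuse' | 'adapt' | 'new' (e.g. decided by the
    structure-optimization component); make_new(layer) builds a fresh layer
    with the same interface."""
    task_net = nn.ModuleList()
    for layer in backbone:
        option = choose_option(layer)
        if option == "reuse":                     # share old parameters, frozen
            layer.requires_grad_(False)
            task_net.append(layer)
        elif option == "adapt":                   # fine-tune a private copy
            task_net.append(copy.deepcopy(layer))
        else:                                     # allocate new parameters
            task_net.append(make_new(layer))
    return task_net
```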
Reducing catastrophic forgetting when evolving neural networks
A key stepping stone in the development of an artificial general
intelligence (a machine that can perform any task) is the production of
agents that can perform multiple tasks at once instead of just one.
Unfortunately, canonical methods are very prone to catastrophic forgetting
(CF): the overwriting of previous knowledge about a task when learning a
new task. Recent efforts have
developed techniques for overcoming CF in learning systems, but no attempt has
been made to apply these new techniques to evolutionary systems. This research
presents a novel technique, weight protection, for reducing CF in evolutionary
systems by adapting a method from learning systems. It is used in conjunction
with other evolutionary approaches for overcoming CF and is shown to be
effective at alleviating CF when applied to a suite of reinforcement learning
tasks. It is speculated that this work indicates the potential for a wider
application of existing learning-based approaches to evolutionary systems,
and that evolutionary techniques may be competitive with or better than
learning systems when it comes to reducing CF.
Comment: 14 pages, 5 figures
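The abstract does not define weight protection precisely, but adapting a regularization-style penalty from learning systems into an evolutionary fitness function might look like the following sketch (NumPy assumed; `anchor` and `importance` are hypothetical names for previously consolidated weights and their importance estimates):

```python
import numpy as np

def protected_fitness(genome, task_fitness, anchor, importance, lam=0.1):
    """Evolutionary fitness with a weight-protection term: candidate
    genomes whose weights drift from values important to earlier tasks
    score lower, so selection favors individuals that retain them."""
    drift = float(np.sum(importance * (genome - anchor) ** 2))
    return task_fitness(genome) - lam * drift
```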
Single-Net Continual Learning with Progressive Segmented Training (PST)
There is an increasing need for continual learning in dynamic systems such
as self-driving vehicles, surveillance drones, and robotic systems. Such
a system requires learning from the data stream, training the model to preserve
previous information and adapt to a new task, and generating a single-headed
vector for future inference. Different from previous approaches with dynamic
structures, this work focuses on a single network and model segmentation to
prevent catastrophic forgetting. Leveraging the redundant capacity of a single
network, model parameters for each task are separated into two groups: an
important group, which is frozen to preserve current knowledge, and a
secondary group, which is saved (not pruned) for future learning. A fixed-size memory
containing a small amount of previously seen data is further adopted to assist
the training. Without additional regularization, the simple yet effective
approach of PST successfully incorporates multiple tasks and achieves
state-of-the-art accuracy in the single-head evaluation on the CIFAR-10 and
CIFAR-100 datasets. Moreover, the segmented training significantly improves
computation efficiency in continual learning.
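The abstract leaves the segmentation criterion unspecified, but the freezing mechanism itself can be sketched as gradient masking over the "important" parameter group. A minimal PyTorch illustration, with `frozen_masks` as a hypothetical mapping produced by the segmentation step:

```python
import torch

def apply_segmentation_mask(model, frozen_masks):
    """Zero the gradients of the frozen (important) parameter group so a
    new task only updates the secondary group. frozen_masks maps parameter
    name -> boolean tensor (True = frozen). Call after loss.backward()."""
    for name, p in model.named_parameters():
        if p.grad is not None and name in frozen_masks:
            p.grad[frozen_masks[name]] = 0.0
```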
A Multi-Task Learning Framework for Overcoming the Catastrophic Forgetting in Automatic Speech Recognition
Recently, data-driven Automatic Speech Recognition (ASR) systems have
achieved state-of-the-art results, and transfer learning (e.g., fine-tuning
or retraining) is often used to adapt those existing systems to a target
domain. In these processes, however, the system parameters may deviate too
far from the previously learned parameters, making it difficult for the
training process to learn knowledge from the target domain while not
forgetting knowledge from the previous learning process; this is known as
catastrophic forgetting (CF). In this paper, we attempt to solve the CF
problem with lifelong learning and propose a novel multi-task learning
(MTL) training framework for ASR. It treats preserving original knowledge
and learning new knowledge as two independent tasks. On the one hand, we
constrain the new parameters not to deviate too far from the original
parameters and penalize the new system for forgetting original knowledge.
On the other hand, we push the new system to acquire new knowledge quickly.
An MTL mechanism is then employed to balance the two tasks. We applied our
method to an End2End ASR task and obtained the best performance on both the
target and original datasets.
Comment: Submitted to Interspeech 2019
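The paper's exact objective is not given in the abstract; a plausible two-term sketch that penalizes deviation from the original ASR parameters while weighting it against the new-domain loss (PyTorch; `alpha` and `lam` are hypothetical balancing knobs) could be:

```python
def mtl_asr_loss(new_task_loss, params, old_params, alpha=0.5, lam=1.0):
    """Two-task objective sketch: learn the target domain quickly while
    penalizing deviation from the previously learned ASR parameters."""
    retention = sum(((p - p_old.detach()) ** 2).sum()
                    for p, p_old in zip(params, old_params))
    return alpha * new_task_loss + (1 - alpha) * lam * retention
```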
Label Mapping Neural Networks with Response Consolidation for Class Incremental Learning
Class incremental learning refers to a special multi-class classification
task in which the number of classes is not fixed but grows with the
continual arrival of new data. Existing research has mainly focused on
solving the catastrophic forgetting problem in class incremental learning.
To this end, however, existing models still require the old classes to be
cached in an auxiliary data structure or model, which is inefficient in
space or time. In this paper, we discuss for the first time the difficulty
of learning without support of the old classes in class incremental
learning, which we call the softmax suppression problem. To address these
challenges, we develop a new model named Label Mapping with Response
Consolidation (LMRC), which does not need to access the old classes at all.
We propose the Label Mapping algorithm, combined with a multi-head neural
network, to mitigate the softmax suppression problem, and propose the
Response Consolidation method to overcome the catastrophic forgetting
problem. Experimental results on benchmark datasets show that our proposed
method achieves much better performance than related methods in different
scenarios.
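The Label Mapping algorithm itself is not detailed in the abstract, but the multi-head idea it is combined with can be sketched: each increment of classes gets its own output head, so old-class logits are not pushed down by a single growing softmax. A hypothetical PyTorch illustration:

```python
import torch
import torch.nn as nn

class MultiHeadClassifier(nn.Module):
    """Sketch: each increment of classes gets its own linear head, so
    logits of old classes are not suppressed when new classes arrive."""
    def __init__(self, feat_dim):
        super().__init__()
        self.heads = nn.ModuleList()
        self.feat_dim = feat_dim

    def add_classes(self, n_new):
        self.heads.append(nn.Linear(self.feat_dim, n_new))

    def forward(self, feats):
        # Concatenate per-head logits; each head was trained only on its
        # own class group.
        return torch.cat([head(feats) for head in self.heads], dim=1)
```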
Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines
Continual learning has received a great deal of attention recently with
several approaches being proposed. However, evaluations involve a diverse set
of scenarios, making meaningful comparison difficult. This work provides a
systematic categorization of the scenarios and evaluates them within a
consistent framework including strong baselines and state-of-the-art methods.
The results clarify the relative difficulty of the scenarios and show that
simple baselines (Adagrad, L2 regularization, and naive rehearsal
strategies) can, surprisingly, achieve performance similar to current
mainstream methods. We conclude with several suggestions for creating
harder evaluation scenarios and with future research directions. The code
is available at https://github.com/GT-RIPL/Continual-Learning-Benchmark
Comment: Continual Learning Workshop, 32nd Conference on Neural Information
Processing Systems (NIPS 2018)
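Of the simple baselines named, naive rehearsal is easy to make concrete: keep a small fixed-size buffer of past examples (e.g., via reservoir sampling) and mix them into each new-task batch. A minimal sketch, with hypothetical names:

```python
import random

class RehearsalBuffer:
    """Naive rehearsal baseline: a small reservoir of past examples to
    be mixed into each new-task training batch."""
    def __init__(self, capacity=500):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def add(self, example):
        # Reservoir sampling keeps a uniform sample of everything seen.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))
```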
Don't forget, there is more than forgetting: new metrics for Continual Learning
Continual learning consists of algorithms that learn from a stream of
data/tasks continuously and adaptively through time, enabling the
incremental development of ever more complex knowledge and skills. The lack
of consensus in evaluating continual learning algorithms, and the almost
exclusive focus on forgetting, motivate us to propose a more comprehensive
set of implementation-independent metrics accounting for several factors we
believe have practical implications worth considering in the deployment of
real AI systems that learn continually: accuracy or performance over time,
backward and forward knowledge transfer, memory overhead, and computational
efficiency. Drawing inspiration from standard Multi-Attribute Value Theory
(MAVT), we further propose to fuse these metrics into a single score for
ranking purposes, and we evaluate our proposal with five continual learning
strategies on the iCIFAR-100 continual learning benchmark.
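The abstract does not give the fusion rule, but in the MAVT spirit a weighted sum of normalized per-criterion scores is the simplest instance. A sketch with hypothetical metric names and weights:

```python
def cl_score(metrics, weights):
    """Fuse per-criterion continual-learning metrics (each normalized to
    [0, 1], higher is better) into one ranking score via a weighted sum,
    in the spirit of Multi-Attribute Value Theory."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * metrics[k] for k in weights)

# Hypothetical example: accuracy, backward/forward transfer, memory and
# compute overhead (the last two inverted so that higher is better).
score = cl_score(
    {"acc": 0.72, "bwt": 0.60, "fwt": 0.55, "mem": 0.80, "comp": 0.65},
    {"acc": 0.4, "bwt": 0.15, "fwt": 0.15, "mem": 0.15, "comp": 0.15},
)
```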
Keep and Learn: Continual Learning by Constraining the Latent Space for Knowledge Preservation in Neural Networks
Data is one of the most important factors in machine learning. However,
even when high-quality data exist, there are situations in which access to
them is restricted. For example, access to medical data from outside an
institution is strictly limited due to privacy concerns. In such cases, a
model must be learned sequentially, using only the data accessible at the
corresponding stage. In this work, we propose a new method for preserving
learned knowledge by modeling the high-level feature space and the output
space to be mutually informative, and by constraining feature vectors to
lie in the modeled space during training. The proposed method is easy to
implement, as it amounts to adding a reconstruction loss to the objective
function. We evaluate the proposed method on CIFAR-10/100 and a chest X-ray
dataset, and show benefits in terms of knowledge preservation compared to
previous approaches.
Comment: accepted for presentation at MICCAI 2018
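As a rough illustration of "simply adding a reconstruction loss", the sketch below attaches a small decoder that maps the output back toward the high-level feature space; the paper's exact construction may differ, and all names here are hypothetical:

```python
import torch.nn.functional as F

def keep_and_learn_loss(logits, labels, feats, decoder, lam=0.1):
    """Classification loss plus a reconstruction term: a small decoder
    maps the output back to the high-level feature space, encouraging the
    two spaces to stay mutually informative and constraining features to
    remain in the modeled region during sequential training (sketch)."""
    task = F.cross_entropy(logits, labels)
    recon = F.mse_loss(decoder(logits), feats)
    return task + lam * recon
```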
Attention-Based Structural-Plasticity
Catastrophic forgetting/interference is a critical problem for lifelong
learning machines: it prevents agents from maintaining previously learned
knowledge while learning new tasks. Neural networks in particular suffer
greatly from the catastrophic forgetting phenomenon. Recently, there have
been several efforts toward overcoming catastrophic forgetting in neural
networks. Here, we propose a biologically inspired method for overcoming
catastrophic forgetting. Specifically, we define an attention-based
selective plasticity of synapses, based on the cholinergic neuromodulatory
system in the brain. We define synaptic importance parameters in addition
to synaptic weights, and use Hebbian learning in parallel with the
backpropagation algorithm to learn synaptic importances in an online and
seamless manner. We test the proposed method on benchmark tasks, including
the Permuted MNIST and Split MNIST problems, and show competitive
performance compared to state-of-the-art methods.
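The abstract only outlines the mechanism, but an online Hebbian update of per-synapse importance, gated by an attention signal, might look like this sketch (PyTorch; `gate` stands in for the neuromodulatory attention signal and is hypothetical):

```python
import torch

@torch.no_grad()
def hebbian_importance_update(importance, pre_act, post_act, gate, eta=0.01):
    """Online Hebbian update of per-synapse importance: synapses whose
    pre- and post-activations co-fire on attended (gated) inputs grow in
    importance, alongside the usual backprop weight updates (sketch)."""
    # Batch-averaged outer product of activations, scaled by attention.
    hebb = torch.einsum('bi,bj->ij', post_act, pre_act) / pre_act.shape[0]
    importance.mul_(1 - eta).add_(eta * gate * hebb.abs())
    return importance
```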