29,048 research outputs found
Overcoming Multi-Model Forgetting
We identify a phenomenon, which we refer to as multi-model forgetting, that
occurs when sequentially training multiple deep networks with partially-shared
parameters; the performance of previously-trained models degrades as one
optimizes a subsequent one, due to the overwriting of shared parameters. To
overcome this, we introduce a statistically-justified weight plasticity loss
that regularizes the learning of a model's shared parameters according to their
importance for the previous models, and demonstrate its effectiveness when
training two models sequentially and for neural architecture search. Adding
weight plasticity in neural architecture search preserves the best models to
the end of the search and yields improved results in both natural language
processing and computer vision tasks.
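The abstract does not give the exact form of the weight plasticity loss, but an importance-weighted quadratic penalty on the shared parameters captures the idea it describes. A minimal sketch, assuming PyTorch; `anchor_params` (the shared parameters' values after training the previous model) and `importances` are hypothetical names:

```python
import torch

def weight_plasticity_loss(shared_params, anchor_params, importances, lam=1.0):
    """Importance-weighted quadratic penalty on the drift of shared
    parameters away from the values they held after the previous model
    was trained (sketch only; the paper's exact loss may differ)."""
    penalty = torch.zeros(())
    for p, p_anchor, omega in zip(shared_params, anchor_params, importances):
        penalty = penalty + (omega * (p - p_anchor.detach()) ** 2).sum()
    return lam * penalty
```

During training of the subsequent model, a term like this would simply be added to its task loss.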
Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting
Addressing catastrophic forgetting is one of the key challenges in continual
learning where machine learning systems are trained with sequential or
streaming tasks. Despite recent remarkable progress in state-of-the-art deep
learning, deep neural networks (DNNs) are still plagued with the catastrophic
forgetting problem. This paper presents a conceptually simple yet general and
effective framework for handling catastrophic forgetting in continual learning
with DNNs. The proposed method consists of two components: a neural structure
optimization component and a parameter learning and/or fine-tuning component.
By separating explicit neural structure learning from parameter estimation,
the proposed method not only evolves neural structures in an intuitively
meaningful way but also shows a strong ability to alleviate catastrophic
forgetting in experiments. Furthermore, the proposed method outperforms all
other baselines on the permuted MNIST, split CIFAR-100, and Visual Domain
Decathlon datasets in the continual learning setting.
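The abstract describes the two components only at a high level; one way to picture the structure-optimization step is a per-layer choice among reusing, adapting, or replacing each layer of the learned backbone. A minimal sketch in PyTorch, where `choose_option` and `make_new` stand in for the paper's structure-learning component and are hypothetical:

```python
import copy
import torch.nn as nn

def grow_for_new_task(backbone, choose_option, make_new):
    """Assemble a network for a new task, layer by layer.
    choose_option(layer) -> 'reuse' | 'adapt' | 'new' (e.g. decided by the
    structure-optimization component); make_new(layer) builds a fresh layer
    with the same interface."""
    task_net = nn.ModuleList()
    for layer in backbone:
        option = choose_option(layer)
        if option == "reuse":                     # share old parameters, frozen
            layer.requires_grad_(False)
            task_net.append(layer)
        elif option == "adapt":                   # fine-tune a private copy
            task_net.append(copy.deepcopy(layer))
        else:                                     # allocate new parameters
            task_net.append(make_new(layer))
    return task_net
```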
Reducing catastrophic forgetting when evolving neural networks
A key stepping stone in the development of an artificial general
intelligence (a machine that can perform any task) is the production of
agents that can perform multiple tasks at once instead of just one.
Unfortunately, canonical methods are very prone to catastrophic forgetting
(CF): the overwriting of previous knowledge about a task when learning a
new task. Recent efforts have
developed techniques for overcoming CF in learning systems, but no attempt has
been made to apply these new techniques to evolutionary systems. This research
presents a novel technique, weight protection, for reducing CF in evolutionary
systems by adapting a method from learning systems. It is used in conjunction
with other evolutionary approaches for overcoming CF and is shown to be
effective at alleviating CF when applied to a suite of reinforcement learning
tasks. It is speculated that this work indicates the potential for a wider
application of existing learning-based approaches to evolutionary systems,
and that evolutionary techniques may be competitive with or better than
learning systems when it comes to reducing CF.
Comment: 14 pages, 5 figures
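The abstract does not define weight protection precisely, but adapting a regularization-style penalty from learning systems into an evolutionary fitness function might look like the following sketch (NumPy assumed; `anchor` and `importance` are hypothetical names for previously consolidated weights and their importance estimates):

```python
import numpy as np

def protected_fitness(genome, task_fitness, anchor, importance, lam=0.1):
    """Evolutionary fitness with a weight-protection term: candidate
    genomes whose weights drift from values important to earlier tasks
    score lower, so selection favors individuals that retain them."""
    drift = float(np.sum(importance * (genome - anchor) ** 2))
    return task_fitness(genome) - lam * drift
```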
Single-Net Continual Learning with Progressive Segmented Training (PST)
There is an increasing need for continual learning in dynamic systems such
as self-driving vehicles, surveillance drones, and robotic systems. Such
a system requires learning from the data stream, training the model to preserve
previous information and adapt to a new task, and generating a single-headed
vector for future inference. Different from previous approaches with dynamic
structures, this work focuses on a single network and model segmentation to
prevent catastrophic forgetting. Leveraging the redundant capacity of a single
network, model parameters for each task are separated into two groups: an
important group, which is frozen to preserve current knowledge, and a
secondary group, which is saved (not pruned) for future learning. A fixed-size memory
containing a small amount of previously seen data is further adopted to assist
the training. Without additional regularization, the simple yet effective
approach of PST successfully incorporates multiple tasks and achieves
state-of-the-art accuracy in the single-head evaluation on the CIFAR-10 and
CIFAR-100 datasets. Moreover, the segmented training significantly improves
computation efficiency in continual learning.
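The abstract leaves the segmentation criterion unspecified, but the freezing mechanism itself can be sketched as gradient masking over the "important" parameter group. A minimal PyTorch illustration, with `frozen_masks` as a hypothetical mapping produced by the segmentation step:

```python
import torch

def apply_segmentation_mask(model, frozen_masks):
    """Zero the gradients of the frozen (important) parameter group so a
    new task only updates the secondary group. frozen_masks maps parameter
    name -> boolean tensor (True = frozen). Call after loss.backward()."""
    for name, p in model.named_parameters():
        if p.grad is not None and name in frozen_masks:
            p.grad[frozen_masks[name]] = 0.0
```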
A Multi-Task Learning Framework for Overcoming the Catastrophic Forgetting in Automatic Speech Recognition
Recently, data-driven Automatic Speech Recognition (ASR) systems have
achieved state-of-the-art results, and transfer learning (e.g., fine-tuning
or retraining) is often used to adapt those existing systems to a target
domain. In these processes, however, the system parameters may deviate too
far from the previously learned parameters, making it difficult for the
training process to learn knowledge from the target domain while not
forgetting knowledge from the previous learning process; this is known as
catastrophic forgetting (CF). In this paper, we attempt to solve the CF
problem with lifelong learning and propose a novel multi-task learning
(MTL) training framework for ASR. It treats preserving original knowledge
and learning new knowledge as two independent tasks. On the one hand, we
constrain the new parameters not to deviate too far from the original
parameters and penalize the new system for forgetting original knowledge.
On the other hand, we push the new system to acquire new knowledge quickly.
An MTL mechanism is then employed to balance the two tasks. We applied our
method to an End2End ASR task and obtained the best performance on both the
target and original datasets.
Comment: Submitted to Interspeech 2019
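The paper's exact objective is not given in the abstract; a plausible two-term sketch that penalizes deviation from the original ASR parameters while weighting it against the new-domain loss (PyTorch; `alpha` and `lam` are hypothetical balancing knobs) could be:

```python
def mtl_asr_loss(new_task_loss, params, old_params, alpha=0.5, lam=1.0):
    """Two-task objective sketch: learn the target domain quickly while
    penalizing deviation from the previously learned ASR parameters."""
    retention = sum(((p - p_old.detach()) ** 2).sum()
                    for p, p_old in zip(params, old_params))
    return alpha * new_task_loss + (1 - alpha) * lam * retention
```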
Label Mapping Neural Networks with Response Consolidation for Class Incremental Learning
Class incremental learning refers to a special multi-class classification
task in which the number of classes is not fixed but grows with the
continual arrival of new data. Existing research has mainly focused on
solving the catastrophic forgetting problem in class incremental learning.
To this end, however, existing models still require the old classes to be
cached in an auxiliary data structure or model, which is inefficient in
space or time. In this paper, we discuss for the first time the difficulty
of learning without support of the old classes in class incremental
learning, which we call the softmax suppression problem. To address these
challenges, we develop a new model named Label Mapping with Response
Consolidation (LMRC), which does not need to access the old classes at all.
We propose the Label Mapping algorithm, combined with a multi-head neural
network, to mitigate the softmax suppression problem, and propose the
Response Consolidation method to overcome the catastrophic forgetting
problem. Experimental results on benchmark datasets show that our proposed
method achieves much better performance than related methods in different
scenarios.
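The Label Mapping algorithm itself is not detailed in the abstract, but the multi-head idea it is combined with can be sketched: each increment of classes gets its own output head, so old-class logits are not pushed down by a single growing softmax. A hypothetical PyTorch illustration:

```python
import torch
import torch.nn as nn

class MultiHeadClassifier(nn.Module):
    """Sketch: each increment of classes gets its own linear head, so
    logits of old classes are not suppressed when new classes arrive."""
    def __init__(self, feat_dim):
        super().__init__()
        self.heads = nn.ModuleList()
        self.feat_dim = feat_dim

    def add_classes(self, n_new):
        self.heads.append(nn.Linear(self.feat_dim, n_new))

    def forward(self, feats):
        # Concatenate per-head logits; each head was trained only on its
        # own class group.
        return torch.cat([head(feats) for head in self.heads], dim=1)
```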
Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines
Continual learning has received a great deal of attention recently with
several approaches being proposed. However, evaluations involve a diverse set
of scenarios, making meaningful comparison difficult. This work provides a
systematic categorization of the scenarios and evaluates them within a
consistent framework including strong baselines and state-of-the-art methods.
The results clarify the relative difficulty of the scenarios and show that
simple baselines (Adagrad, L2 regularization, and naive rehearsal
strategies) can, surprisingly, achieve performance similar to current
mainstream methods. We conclude with several suggestions for creating
harder evaluation scenarios and with future research directions. The code
is available at https://github.com/GT-RIPL/Continual-Learning-Benchmark
Comment: Continual Learning Workshop, 32nd Conference on Neural Information
Processing Systems (NIPS 2018)
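Of the simple baselines named, naive rehearsal is easy to make concrete: keep a small fixed-size buffer of past examples (e.g., via reservoir sampling) and mix them into each new-task batch. A minimal sketch, with hypothetical names:

```python
import random

class RehearsalBuffer:
    """Naive rehearsal baseline: a small reservoir of past examples to
    be mixed into each new-task training batch."""
    def __init__(self, capacity=500):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def add(self, example):
        # Reservoir sampling keeps a uniform sample of everything seen.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))
```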
Don't forget, there is more than forgetting: new metrics for Continual Learning
Continual learning consists of algorithms that learn from a stream of
data/tasks continuously and adaptively through time, enabling the
incremental development of ever more complex knowledge and skills. The lack
of consensus in evaluating continual learning algorithms, and the almost
exclusive focus on forgetting, motivate us to propose a more comprehensive
set of implementation-independent metrics accounting for several factors we
believe have practical implications worth considering in the deployment of
real AI systems that learn continually: accuracy or performance over time,
backward and forward knowledge transfer, memory overhead, and computational
efficiency. Drawing inspiration from standard Multi-Attribute Value Theory
(MAVT), we further propose to fuse these metrics into a single score for
ranking purposes, and we evaluate our proposal with five continual learning
strategies on the iCIFAR-100 continual learning benchmark.
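The abstract does not give the fusion rule, but in the MAVT spirit a weighted sum of normalized per-criterion scores is the simplest instance. A sketch with hypothetical metric names and weights:

```python
def cl_score(metrics, weights):
    """Fuse per-criterion continual-learning metrics (each normalized to
    [0, 1], higher is better) into one ranking score via a weighted sum,
    in the spirit of Multi-Attribute Value Theory."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * metrics[k] for k in weights)

# Hypothetical example: accuracy, backward/forward transfer, memory and
# compute overhead (the last two inverted so that higher is better).
score = cl_score(
    {"acc": 0.72, "bwt": 0.60, "fwt": 0.55, "mem": 0.80, "comp": 0.65},
    {"acc": 0.4, "bwt": 0.15, "fwt": 0.15, "mem": 0.15, "comp": 0.15},
)
```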
Keep and Learn: Continual Learning by Constraining the Latent Space for Knowledge Preservation in Neural Networks
Data is one of the most important factors in machine learning. However,
even when high-quality data exist, there are situations in which access to
them is restricted. For example, access to medical data from outside an
institution is strictly limited due to privacy concerns. In such cases, a
model must be learned sequentially, using only the data accessible at the
corresponding stage. In this work, we propose a new method for preserving
learned knowledge by modeling the high-level feature space and the output
space to be mutually informative, and by constraining feature vectors to
lie in the modeled space during training. The proposed method is easy to
implement, as it amounts to adding a reconstruction loss to the objective
function. We evaluate the proposed method on CIFAR-10/100 and a chest X-ray
dataset, and show benefits in terms of knowledge preservation compared to
previous approaches.
Comment: accepted for presentation at MICCAI 2018
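As a rough illustration of "simply adding a reconstruction loss", the sketch below attaches a small decoder that maps the output back toward the high-level feature space; the paper's exact construction may differ, and all names here are hypothetical:

```python
import torch.nn.functional as F

def keep_and_learn_loss(logits, labels, feats, decoder, lam=0.1):
    """Classification loss plus a reconstruction term: a small decoder
    maps the output back to the high-level feature space, encouraging the
    two spaces to stay mutually informative and constraining features to
    remain in the modeled region during sequential training (sketch)."""
    task = F.cross_entropy(logits, labels)
    recon = F.mse_loss(decoder(logits), feats)
    return task + lam * recon
```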
Attention-Based Structural-Plasticity
Catastrophic forgetting/interference is a critical problem for lifelong
learning machines: it prevents agents from maintaining previously learned
knowledge while learning new tasks. Neural networks in particular suffer
greatly from the catastrophic forgetting phenomenon. Recently, there have
been several efforts toward overcoming catastrophic forgetting in neural
networks. Here, we propose a biologically inspired method for overcoming
catastrophic forgetting. Specifically, we define an attention-based
selective plasticity of synapses, based on the cholinergic neuromodulatory
system in the brain. We define synaptic importance parameters in addition
to synaptic weights, and use Hebbian learning in parallel with the
backpropagation algorithm to learn synaptic importances in an online and
seamless manner. We test the proposed method on benchmark tasks, including
the Permuted MNIST and Split MNIST problems, and show competitive
performance compared to state-of-the-art methods.
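The abstract only outlines the mechanism, but an online Hebbian update of per-synapse importance, gated by an attention signal, might look like this sketch (PyTorch; `gate` stands in for the neuromodulatory attention signal and is hypothetical):

```python
import torch

@torch.no_grad()
def hebbian_importance_update(importance, pre_act, post_act, gate, eta=0.01):
    """Online Hebbian update of per-synapse importance: synapses whose
    pre- and post-activations co-fire on attended (gated) inputs grow in
    importance, alongside the usual backprop weight updates (sketch)."""
    # Batch-averaged outer product of activations, scaled by attention.
    hebb = torch.einsum('bi,bj->ij', post_act, pre_act) / pre_act.shape[0]
    importance.mul_(1 - eta).add_(eta * gate * hebb.abs())
    return importance
```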