Overcoming Catastrophic Interference by Conceptors
Catastrophic interference has been a major roadblock in the research of
continual learning. Here we propose a variant of the back-propagation
algorithm, "conceptor-aided back-prop" (CAB), in which gradients are shielded
by conceptors against degradation of previously learned tasks. Conceptors have
their origin in reservoir computing, where they have been previously shown to
overcome catastrophic forgetting. CAB extends these results to deep feedforward
networks. On the disjoint MNIST task CAB outperforms two other methods for
coping with catastrophic interference that have recently been proposed in the
deep learning field.
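A minimal sketch of the gradient-shielding idea, assuming a single layer, an illustrative aperture value, and made-up function names (this is not the paper's full CAB rule, which shields both the forward signal and the backpropagated error):

```python
import numpy as np

def conceptor(X, alpha=10.0):
    """Conceptor of layer activations X (n_samples x d): C = R (R + alpha^-2 I)^-1,
    where R is the activation correlation matrix."""
    n, d = X.shape
    R = X.T @ X / n
    return R @ np.linalg.inv(R + (alpha ** -2) * np.eye(d))

def shield_gradient(grad_W, C_prev):
    """Project a weight gradient onto directions not claimed by previous tasks.
    F = I - C_prev is the 'free' subspace; the orientation chosen here
    (gradient columns indexed by layer inputs) is illustrative."""
    F = np.eye(C_prev.shape[0]) - C_prev
    return grad_W @ F

# Toy usage: activations stored from a previous task, then a shielded update.
rng = np.random.default_rng(0)
X_prev = rng.standard_normal((256, 8))      # layer inputs seen during task 1
C = conceptor(X_prev)
grad = rng.standard_normal((4, 8))          # raw gradient w.r.t. a 4x8 weight matrix
update = -0.01 * shield_gradient(grad, C)   # step that avoids task-1 directions
```

The projection through I - C confines the update to directions that the earlier task's activations did not occupy, which is the sense in which the gradient is "shielded".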
Progress & Compress: A scalable framework for continual learning
We introduce a conceptually simple and scalable framework for continual
learning domains where tasks are learned sequentially. Our method uses a constant
number of parameters and is designed to preserve performance on
previously encountered tasks while accelerating learning progress on subsequent
problems. This is achieved by training a network with two components: A
knowledge base, capable of solving previously encountered problems, which is
connected to an active column that is employed to efficiently learn the current
task. After learning a new task, the active column is distilled into the
knowledge base, taking care to protect any previously acquired skills. This
cycle of active learning (progression) followed by consolidation (compression)
requires no architecture growth, no access to or storing of previous data or
tasks, and no task-specific parameters. We demonstrate the progress & compress
approach on sequential classification of handwritten alphabets as well as two
reinforcement learning domains: Atari games and 3D maze navigation.
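A rough sketch of the compress (distillation) step, with a plain quadratic penalty standing in for the online EWC protection used in the paper; the module sizes, loss weights, and names such as knowledge_base and active_column are illustrative:

```python
import torch
import torch.nn.functional as F

knowledge_base = torch.nn.Linear(16, 4)   # stands in for the knowledge-base network
active_column = torch.nn.Linear(16, 4)    # freshly trained on the current task

# Anchor parameters and importance weights from earlier consolidations
# (EWC-like; the paper uses an online variant).
anchors = {n: p.detach().clone() for n, p in knowledge_base.named_parameters()}
importance = {n: torch.ones_like(p) for n, p in knowledge_base.named_parameters()}

optimizer = torch.optim.SGD(knowledge_base.parameters(), lr=1e-2)
for _ in range(100):
    x = torch.randn(32, 16)                       # inputs from the current task
    with torch.no_grad():
        teacher = F.log_softmax(active_column(x), dim=-1)
    student = F.log_softmax(knowledge_base(x), dim=-1)
    distill = F.kl_div(student, teacher, log_target=True, reduction="batchmean")
    protect = sum((importance[n] * (p - anchors[n]) ** 2).sum()
                  for n, p in knowledge_base.named_parameters())
    loss = distill + 1e-2 * protect               # compress while protecting old skills
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```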
Disturbance-immune Weight Sharing for Neural Architecture Search
Neural architecture search (NAS) has gained increasing attention in the
community of architecture design. One key factor behind this success is the
training efficiency created by the weight sharing (WS) technique.
However, WS-based NAS methods often suffer from a performance disturbance (PD)
issue. That is, the training of subsequent architectures inevitably disturbs
the performance of previously trained architectures due to the partially shared
weights. This leads to inaccurate performance estimation for the previous
architectures, which makes it hard to learn a good search strategy. To
alleviate the performance disturbance issue, we propose a new
disturbance-immune update strategy for model updating. Specifically, to
preserve the knowledge learned by previous architectures, we constrain the
training of subsequent architectures in an orthogonal space via orthogonal
gradient descent. Equipped with this strategy, we propose a novel
disturbance-immune training scheme for NAS. We theoretically analyze the
effectiveness of our strategy in alleviating the PD risk. Extensive experiments
on CIFAR-10 and ImageNet verify the superiority of our method.
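The disturbance-immune update can be sketched with generic orthogonal gradient descent; how the basis of previous directions is accumulated below, and the step size, are illustrative assumptions rather than the paper's exact scheme:

```python
import numpy as np

def orthogonalize(grad, basis):
    """Project grad onto the complement of the span of previous update directions.
    basis: list of unit vectors (flattened gradients) from earlier architectures."""
    g = grad.copy()
    for b in basis:
        g -= (g @ b) * b
    return g

rng = np.random.default_rng(1)
basis = []
for arch in range(3):                       # sequentially trained sub-architectures
    g = rng.standard_normal(10)             # gradient on the shared weights
    g_proj = orthogonalize(g, basis)        # update that avoids disturbing earlier archs
    shared_update = -0.1 * g_proj
    basis.append(g_proj / (np.linalg.norm(g_proj) + 1e-12))  # remember this direction
```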
Continual learning with hypernetworks
Artificial neural networks suffer from catastrophic forgetting when they are
sequentially trained on multiple tasks. To overcome this problem, we present a
novel approach based on task-conditioned hypernetworks, i.e., networks that
generate the weights of a target model based on task identity. Continual
learning (CL) is less difficult for this class of models thanks to a simple key
feature: instead of recalling the input-output relations of all previously seen
data, task-conditioned hypernetworks only require rehearsing task-specific
weight realizations, which can be maintained in memory using a simple
regularizer. Besides achieving state-of-the-art performance on standard CL
benchmarks, additional experiments on long task sequences reveal that
task-conditioned hypernetworks display a very large capacity to retain previous
memories. Notably, such long memory lifetimes are achieved in a compressive
regime, when the number of trainable hypernetwork weights is comparable to or
smaller than the target network size. We provide insight into the structure of
low-dimensional task embedding spaces (the input space of the hypernetwork) and
show that task-conditioned hypernetworks demonstrate transfer learning.
Finally, forward information transfer is further supported by empirical results
on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
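A minimal sketch of a task-conditioned hypernetwork with the output-space regularizer described above; the layer sizes, the toy linear target network, and the regularization weight are assumptions:

```python
import torch

emb_dim, target_params = 8, 16 * 4 + 4      # weights + bias of a 16->4 linear target

hypernet = torch.nn.Sequential(
    torch.nn.Linear(emb_dim, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, target_params),
)
task_embeddings = torch.nn.Embedding(2, emb_dim)   # one learnable embedding per task

def target_forward(x, flat_w):
    """Run the target model using weights generated by the hypernetwork."""
    W = flat_w[: 16 * 4].view(4, 16)
    b = flat_w[16 * 4:]
    return x @ W.t() + b

# Before training task 1, snapshot the weights the hypernet generates for task 0.
with torch.no_grad():
    w_task0_old = hypernet(task_embeddings(torch.tensor(0)))

params = list(hypernet.parameters()) + list(task_embeddings.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
for _ in range(50):
    w1 = hypernet(task_embeddings(torch.tensor(1)))
    task_loss = torch.nn.functional.cross_entropy(target_forward(x, w1), y)
    w0 = hypernet(task_embeddings(torch.tensor(0)))
    rehearse = ((w0 - w_task0_old) ** 2).mean()    # rehearse the task-0 weight realization
    loss = task_loss + 1e-2 * rehearse
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Only the generated weights for old tasks are rehearsed, not the old data, which is what keeps the memory cost independent of the number of stored examples.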
Overcoming Multi-Model Forgetting
We identify a phenomenon, which we refer to as multi-model forgetting, that
occurs when sequentially training multiple deep networks with partially-shared
parameters; the performance of previously-trained models degrades as one
optimizes a subsequent one, due to the overwriting of shared parameters. To
overcome this, we introduce a statistically-justified weight plasticity loss
that regularizes the learning of a model's shared parameters according to their
importance for the previous models, and demonstrate its effectiveness when
training two models sequentially and for neural architecture search. Adding
weight plasticity in neural architecture search preserves the best models to
the end of the search and yields improved results in both natural language
processing and computer vision tasks.
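A sketch of a weight-plasticity-style penalty on the shared parameters; the uniform importance estimate and the loss weight below are placeholders for the paper's statistically derived quantities:

```python
import torch

shared = torch.nn.Linear(8, 8)              # parameters shared by models A and B
head_b = torch.nn.Linear(8, 2)              # parameters private to model B

theta_a = {n: p.detach().clone() for n, p in shared.named_parameters()}
omega = {n: torch.ones_like(p) for n, p in shared.named_parameters()}  # importance for model A

opt = torch.optim.SGD(list(shared.parameters()) + list(head_b.parameters()), lr=1e-2)
x, y = torch.randn(64, 8), torch.randint(0, 2, (64,))
for _ in range(100):
    logits = head_b(torch.relu(shared(x)))
    task_loss = torch.nn.functional.cross_entropy(logits, y)
    plasticity = sum((omega[n] * (p - theta_a[n]) ** 2).sum()
                     for n, p in shared.named_parameters())
    loss = task_loss + 0.1 * plasticity      # keep important shared weights near model A's values
    opt.zero_grad()
    loss.backward()
    opt.step()
```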
Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization
Humans and most animals can learn new tasks without forgetting old ones.
However, training artificial neural networks (ANNs) on new tasks typically
causes them to forget previously learned tasks. This phenomenon is the result of
"catastrophic forgetting", in which training an ANN disrupts connection weights
that were important for solving previous tasks, degrading task performance.
Several recent studies have proposed methods to stabilize connection weights of
ANNs that are deemed most important for solving a task, which helps alleviate
catastrophic forgetting. Here, drawing inspiration from algorithms that are
believed to be implemented in vivo, we propose a complementary method: adding a
context-dependent gating signal, such that only sparse, mostly non-overlapping
patterns of units are active for any one task. This method is easy to
implement, requires little computational overhead, and allows ANNs to maintain
high performance across large numbers of sequentially presented tasks when
combined with weight stabilization. This work provides another example of how
neuroscience-inspired algorithms can benefit ANN design and capability.
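A minimal sketch of context-dependent gating, with fixed random binary masks selecting a sparse subset of hidden units per task; the sparsity level and layer sizes are illustrative, and the synaptic stabilization the paper combines this with is omitted:

```python
import torch

hidden = 100
keep_frac = 0.2
# One fixed random binary mask per task; masks are mostly non-overlapping by chance.
gates = {t: (torch.rand(hidden) < keep_frac).float() for t in range(5)}

layer1 = torch.nn.Linear(784, hidden)
layer2 = torch.nn.Linear(hidden, 10)

def forward(x, task_id):
    h = torch.relu(layer1(x)) * gates[task_id]   # gate hidden units by task context
    return layer2(h)

logits = forward(torch.randn(32, 784), task_id=3)
```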
Physical reservoir computing -- An introductory perspective
Understanding the fundamental relationships between physics and its
information-processing capability has been an active research topic for many
years. Physical reservoir computing is a recently introduced framework that
allows one to exploit the complex dynamics of physical systems as
information-processing devices. This framework is particularly suited for edge
computing devices, in which information processing is incorporated at the edge
(e.g., into sensors) in a decentralized manner to reduce the adaptation delay
caused by data transmission overhead. This paper aims to illustrate the
potential of the framework using examples from soft robotics and to provide a
concise overview focusing on the basic motivations for introducing it, which
stem from a number of fields, including machine learning, nonlinear dynamical
systems, biological science, materials science, and physics.
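The reservoir computing principle behind this framework can be sketched with a simulated echo-state-style reservoir standing in for a physical system; all sizes, the spectral-radius scaling, and the one-step-ahead prediction task are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, T = 200, 1000
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))          # fixed input coupling
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # scale spectral radius below 1

u = np.sin(np.linspace(0, 20 * np.pi, T + 1))      # input stream
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W @ x + W_in[:, 0] * u[t])         # reservoir dynamics (never trained)
    states[t] = x

# Only the linear readout is trained, via ridge regression.
target = u[1: T + 1]                               # one-step-ahead prediction
ridge = 1e-6
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ target)
prediction = states @ W_out
```

In physical reservoir computing, the simulated state update above is replaced by measurements of a real dynamical substrate (for example, a soft robotic body), while the cheap linear readout remains the only trained component.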