Continual Learning with Gated Incremental Memories for sequential data processing
The ability to learn in dynamic, nonstationary environments without
forgetting previous knowledge, also known as Continual Learning (CL), is a key
enabler for scalable and trustworthy deployments of adaptive solutions. While
the importance of continual learning is widely acknowledged in machine vision
and reinforcement learning problems, it remains largely under-investigated for
sequence processing tasks. This work proposes a Recurrent Neural Network (RNN)
model for CL that is able to deal with concept drift in the input distribution
without forgetting previously acquired knowledge. We also implement and test a
popular CL approach, Elastic Weight Consolidation (EWC), on top of two
different types of RNNs. Finally, we compare the performance of our enhanced
architecture against EWC and RNNs on a set of standard CL benchmarks, adapted
to the sequential data processing scenario. Results show the superior
performance of our architecture and highlight the need for special solutions
designed to address CL in RNNs.
Comment: Accepted as a conference paper at the 2020 International Joint Conference on Neural Networks (IJCNN 2020). Part of the 2020 IEEE World Congress on Computational Intelligence (IEEE WCCI 2020).
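Since the abstract evaluates EWC on top of RNNs, a minimal sketch of the quadratic EWC penalty may help; the `fisher` and `old_params` dictionaries and the `lam` coefficient are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1000.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_i*)^2.

    `fisher` and `old_params` map parameter names to the diagonal Fisher
    estimate and to the weights snapshotted after the previous task.
    """
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:  # only penalize parameters relevant to earlier tasks
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Training on a new task then adds the penalty to the usual task loss:
#   loss = task_loss + ewc_penalty(rnn, fisher, old_params)
```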
Conditional Channel Gated Networks for Task-Aware Continual Learning
Convolutional Neural Networks experience catastrophic forgetting when
optimized on a sequence of learning problems: as they meet the objective of the
current training examples, their performance on previous tasks drops
drastically. In this work, we introduce a novel framework to tackle this
problem with conditional computation. We equip each convolutional layer with
task-specific gating modules, selecting which filters to apply to the given
input. This way, we achieve two appealing properties. Firstly, the execution
patterns of the gates allow us to identify and protect important filters,
ensuring no loss in the model's performance on previously learned tasks.
Secondly, by using a sparsity objective, we can promote the selection of a
limited set of kernels, allowing the model to retain sufficient capacity to
digest new tasks. Existing solutions require, at test time, awareness of the
task to which each example belongs. This knowledge, however, may not be available in many
practical scenarios. Therefore, we additionally introduce a task classifier
that predicts the task label of each example, to deal with settings in which a
task oracle is not available. We validate our proposal on four continual
learning datasets. Results show that our model consistently outperforms
existing methods both in the presence and the absence of a task oracle.
Notably, on Split SVHN and Imagenet-50 datasets, our model yields up to 23.98%
and 17.42% improvement in accuracy w.r.t. competing methods.
Comment: CVPR 2020 (oral).
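As a rough illustration of the per-layer, task-conditioned channel gating described above, here is a hedged PyTorch sketch; the sigmoid relaxation and the simple sparsity term are simplifying assumptions (the paper learns binary gates), and all names are hypothetical.

```python
import torch
import torch.nn as nn

class TaskGatedConv(nn.Module):
    """Conv layer whose output channels are masked by a per-task gate.

    Hypothetical sketch: the paper trains discrete gates with a sparsity
    objective; a sigmoid relaxation keeps this example simple.
    """
    def __init__(self, in_ch, out_ch, num_tasks):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # One learnable gate-logit vector per task, over output channels.
        self.gate_logits = nn.Parameter(torch.zeros(num_tasks, out_ch))

    def forward(self, x, task_id):
        h = self.conv(x)
        g = torch.sigmoid(self.gate_logits[task_id])  # soft channel mask in (0, 1)
        return h * g.view(1, -1, 1, 1)                # near-zero gates disable filters

def sparsity_penalty(layer, task_id):
    # Encourages each task to select a limited set of kernels.
    return torch.sigmoid(layer.gate_logits[task_id]).mean()
```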
Towards Real-World Data Streams for Deep Continual Learning
Continual Learning deals with Artificial Intelligence agents striving to learn from a never-ending
stream of data. Recently, Deep Continual Learning has focused on the design of new strategies to
endow Artificial Neural Networks with the ability to learn continuously without forgetting previous
knowledge. In fact, the learning process of any Artificial Neural Network model is well known to
lack sufficient stability to preserve existing knowledge when learning new information. This
phenomenon, called catastrophic forgetting or simply forgetting, is considered one of the main
obstacles for the design of effective Continual Learning agents. However, existing strategies designed
to mitigate forgetting have been evaluated on a restricted set of Continual Learning scenarios. The
most used one is, by far, the Class-Incremental scenario applied to object detection tasks. Even
though it drove interest in Continual Learning, Class-Incremental scenarios strongly constrain the
properties of the data stream, thus limiting their ability to model real-world environments.
The core of this thesis concerns the introduction of three Continual Learning data streams, whose
design is centered around specific properties of real-world environments. First, we propose the
Class-Incremental with Repetition scenario, which builds a data stream including both the introduction
of new concepts and the repetition of previous ones. Repetition is naturally present in many
environments and constitutes an important source of information. Second, we formalize the
Continual Pre-Training scenario, which leverages a data stream of unstructured knowledge to keep
a pre-trained model updated over time. One important objective of this scenario is to study how to
continuously build general, robust representations that do not strongly depend on the specific task
to be solved. This is a fundamental property of real-world agents, which build cross-task knowledge
and then adapt it to specific needs. Third, we study Continual Learning scenarios where data
streams are composed of temporally correlated data. Temporal correlation is ubiquitous and lies
at the foundation of most environments we, as humans, experience during our lives. We leveraged
Recurrent Neural Networks as our main model, due to their intrinsic ability to model temporal
correlations. We discovered that, when applied to recurrent models, Continual Learning strategies
behave in an unexpected manner. This highlights the limits of the current experimental validation,
mostly focused on Computer Vision tasks.
Ultimately, the introduction of new data streams contributed to deepening our understanding of
how Artificial Neural Networks learn continuously. We discovered that forgetting strongly depends
on the properties of the data stream, and we observed large changes from one data stream to
another. Moreover, when forgetting is mild, we were able to mitigate it effectively with simple
strategies, or even without any specific ones. Loosening the focus on forgetting allowed us to turn our
attention to other interesting problems, outlined in this thesis, like (i) separation between continual
representation learning and quick adaptation to novel tasks, (ii) robustness to unbalanced data
streams and (iii) ability to continuously learn temporal correlations. These objectives currently
defy existing strategies and will likely represent the next challenge for Continual Learning research.
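To make the Class-Incremental with Repetition idea concrete, a small sketch of how such a stream could be assembled follows; the function name and the per-experience counts are illustrative, not the thesis' exact protocol.

```python
import random

def cir_stream(classes, n_experiences, new_per_exp=2, rep_per_exp=2, seed=0):
    """Builds a toy Class-Incremental with Repetition stream: each experience
    introduces new classes and repeats a random subset of already-seen ones."""
    rng = random.Random(seed)
    unseen, seen, stream = list(classes), [], []
    for _ in range(n_experiences):
        new = [unseen.pop(0) for _ in range(min(new_per_exp, len(unseen)))]
        rep = rng.sample(seen, min(rep_per_exp, len(seen)))
        seen += new
        stream.append(sorted(new + rep))
    return stream

# Each experience mixes new class ids with repeated earlier ones,
# e.g. cir_stream(range(10), n_experiences=5).
```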
LIRA: Lifelong Image Restoration from Unknown Blended Distortions
Most existing image restoration networks are designed in a disposable way and
catastrophically forget previously learned distortions when trained on a new
distortion removal task. To alleviate this problem, we introduce the novel
problem of lifelong image restoration for blended distortions. We first design
a base fork-join model in which multiple pre-trained expert models, each
specializing in an individual distortion removal task, work cooperatively and
adaptively to handle blended distortions. When the input is degraded by a new
distortion, inspired by adult neurogenesis in the human memory system, we develop a neural growing
strategy where the previously trained model can incorporate a new expert branch
and continually accumulate new knowledge without interfering with learned
knowledge. Experimental results show that the proposed approach can not only
achieve state-of-the-art performance on blended distortion removal tasks in
both PSNR and SSIM metrics, but also maintain old expertise while learning new
restoration tasks.
Comment: Accepted at ECCV 2020.
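The fork-join growth strategy can be sketched as follows: freeze existing expert branches, then append a new one for the incoming distortion. The layer shapes, the averaging fusion, and the residual output are assumptions for illustration, not LIRA's actual architecture.

```python
import torch.nn as nn

class ForkJoinRestorer(nn.Module):
    """Fork-join sketch: frozen expert branches fused by a shared join head."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.experts = nn.ModuleList()               # one branch per distortion
        self.join = nn.Conv2d(feat_ch, 3, kernel_size=3, padding=1)
        self.feat_ch = feat_ch

    def add_expert(self):
        # Freeze existing branches so training the new one cannot interfere
        # with previously learned knowledge, then grow a fresh branch.
        for p in self.experts.parameters():
            p.requires_grad = False
        self.experts.append(nn.Sequential(
            nn.Conv2d(3, self.feat_ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(self.feat_ch, self.feat_ch, kernel_size=3, padding=1)))

    def forward(self, x):
        feats = [expert(x) for expert in self.experts]
        return x + self.join(sum(feats) / len(feats))  # residual restoration

model = ForkJoinRestorer()
model.add_expert()  # expert for the first distortion type
model.add_expert()  # grow a new branch when a new distortion appears
```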
Dynamic Mathematics for Automated Machine Learning Techniques
Machine Learning and Neural Networks have been gaining popularity and are widely considered the driving force of the Fourth Industrial Revolution. However, modern machine learning techniques such as backpropagation training were firmly established in 1986, and computer vision was revolutionised in 2012 with the introduction of AlexNet. Given all these accomplishments, why are neural networks still not an integral part of our society? "Because they are difficult to implement in practice." "I'd like to use machine learning, but I can't invest much time." The concept of Automated Machine Learning (AutoML) was first proposed by Professor Frank Hutter of the University of Freiburg. Machine learning is not simple; it requires a practitioner to have a thorough understanding of the attributes of their data and the components their model entails. AutoML is the effort to automate all tedious aspects of machine learning to form a clean data analysis pipeline.
This thesis is our effort to develop and understand ways to automate machine learning. Specifically, we focused on Recurrent Neural Networks (RNNs), Meta-Learning, and Continual Learning. We studied continual learning to enable a network to sequentially acquire skills in a dynamic environment; we studied meta-learning to understand how a network can be configured efficiently; and we studied RNNs to understand the consequences of consecutive actions. Our RNN study focused on mathematical interpretability. We described a large variety of RNNs as one mathematical class to understand their core network mechanism. This enabled us to extend meta-learning beyond network configuration to network pruning and continual learning. It also provided insights into how a single network should be consecutively configured, and led us to the creation of a simple generic patch that is compatible with several existing continual learning archetypes. This patch enhanced the robustness of continual learning techniques and allowed them to generalise better to data.
By and large, this thesis presented a series of extensions to make AutoML simple, efficient, and robust. More importantly, all of our methods are motivated by mathematical understanding through the lens of dynamical systems. Thus, we also increased the interpretability of AutoML concepts.
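The thesis describes many RNN variants as one mathematical class; as a point of reference only (the standard Elman update, not the thesis' unified formulation), the basic recurrence looks like this:

```python
import torch

def elman_step(x_t, h_prev, W_x, W_h, b):
    """One step of the classic Elman RNN: h_t = tanh(W_x x_t + W_h h_prev + b).

    Gated variants (LSTM, GRU) add multiplicative gates around this update,
    which is the kind of shared structure a unified mathematical class captures.
    """
    return torch.tanh(x_t @ W_x + h_prev @ W_h + b)

# Example shapes: x_t (batch, in_dim), W_x (in_dim, hid), W_h (hid, hid), b (hid,)
```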