Continual Learning with Gated Incremental Memories for sequential data processing
The ability to learn in dynamic, nonstationary environments without
forgetting previous knowledge, also known as Continual Learning (CL), is a key
enabler for scalable and trustworthy deployments of adaptive solutions. While
the importance of continual learning is widely acknowledged in machine vision
and reinforcement learning problems, it remains largely under-investigated for
sequence processing tasks. This work proposes a Recurrent Neural Network (RNN)
model for CL that is able to deal with concept drift in the input distribution
without forgetting previously acquired knowledge. We also implement and test a
popular CL approach, Elastic Weight Consolidation (EWC), on top of two
different types of RNNs. Finally, we compare the performance of our enhanced
architecture against EWC and RNNs on a set of standard CL benchmarks, adapted
to the sequential data processing scenario. Results show the superior
performance of our architecture and highlight the need for special solutions
designed to address CL in RNNs.
Comment: Accepted as a conference paper at the 2020 International Joint Conference on Neural Networks (IJCNN 2020). Part of the 2020 IEEE World Congress on Computational Intelligence (IEEE WCCI 2020).
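Since the abstract evaluates EWC on top of RNNs, a minimal sketch of the quadratic EWC penalty may help; the `fisher` and `old_params` dictionaries and the `lam` coefficient are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1000.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_i*)^2.

    `fisher` and `old_params` map parameter names to the diagonal Fisher
    estimate and to the weights snapshotted after the previous task.
    """
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:  # only penalize parameters relevant to earlier tasks
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Training on a new task then adds the penalty to the usual task loss:
#   loss = task_loss + ewc_penalty(rnn, fisher, old_params)
```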
Conditional Channel Gated Networks for Task-Aware Continual Learning
Convolutional Neural Networks experience catastrophic forgetting when
optimized on a sequence of learning problems: as they meet the objective of the
current training examples, their performance on previous tasks drops
drastically. In this work, we introduce a novel framework to tackle this
problem with conditional computation. We equip each convolutional layer with
task-specific gating modules, selecting which filters to apply to the given
input. This way, we achieve two appealing properties. Firstly, the execution
patterns of the gates allow us to identify and protect important filters,
ensuring no loss in the model's performance on previously learned tasks.
Secondly, by using a sparsity objective, we can promote the selection of a
limited set of kernels, allowing the model to retain sufficient capacity to
digest new tasks. Existing solutions require, at test time, awareness of the
task to which each example belongs. This knowledge, however, may not be available in many
practical scenarios. Therefore, we additionally introduce a task classifier
that predicts the task label of each example, to deal with settings in which a
task oracle is not available. We validate our proposal on four continual
learning datasets. Results show that our model consistently outperforms
existing methods both in the presence and the absence of a task oracle.
Notably, on Split SVHN and Imagenet-50 datasets, our model yields up to 23.98%
and 17.42% improvement in accuracy w.r.t. competing methods.
Comment: CVPR 2020 (oral).
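As a rough illustration of the per-layer, task-conditioned channel gating described above, here is a hedged PyTorch sketch; the sigmoid relaxation and the simple sparsity term are simplifying assumptions (the paper learns binary gates), and all names are hypothetical.

```python
import torch
import torch.nn as nn

class TaskGatedConv(nn.Module):
    """Conv layer whose output channels are masked by a per-task gate.

    Hypothetical sketch: the paper trains discrete gates with a sparsity
    objective; a sigmoid relaxation keeps this example simple.
    """
    def __init__(self, in_ch, out_ch, num_tasks):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # One learnable gate-logit vector per task, over output channels.
        self.gate_logits = nn.Parameter(torch.zeros(num_tasks, out_ch))

    def forward(self, x, task_id):
        h = self.conv(x)
        g = torch.sigmoid(self.gate_logits[task_id])  # soft channel mask in (0, 1)
        return h * g.view(1, -1, 1, 1)                # near-zero gates disable filters

def sparsity_penalty(layer, task_id):
    # Encourages each task to select a limited set of kernels.
    return torch.sigmoid(layer.gate_logits[task_id]).mean()
```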
Towards Real-World Data Streams for Deep Continual Learning
Continual Learning deals with Artificial Intelligence agents striving to learn from a never-ending
stream of data. Recently, Deep Continual Learning has focused on the design of new strategies to
endow Artificial Neural Networks with the ability to learn continuously without forgetting previous
knowledge. In fact, the learning process of any Artificial Neural Network model is well known to
lack sufficient stability to preserve existing knowledge when learning new information. This
phenomenon, called catastrophic forgetting or simply forgetting, is considered one of the main
obstacles for the design of effective Continual Learning agents. However, existing strategies designed
to mitigate forgetting have been evaluated on a restricted set of Continual Learning scenarios. The
most used one is, by far, the Class-Incremental scenario applied to object detection tasks. Even
though it drove interest in Continual Learning, Class-Incremental scenarios strongly constrain the
properties of the data stream, thus limiting their ability to model real-world environments.
The core of this thesis concerns the introduction of three Continual Learning data streams, whose
design is centered around specific properties of real-world environments. First, we propose the
Class-Incremental with Repetition scenario, which builds a data stream including both the introduction
of new concepts and the repetition of previous ones. Repetition is naturally present in many
environments and constitutes an important source of information. Second, we formalize the
Continual Pre-Training scenario, which leverages a data stream of unstructured knowledge to keep
a pre-trained model updated over time. One important objective of this scenario is to study how to
continuously build general, robust representations that do not strongly depend on the specific task
to be solved. This is a fundamental property of real-world agents, which build cross-task knowledge
and then adapt it to specific needs. Third, we study Continual Learning scenarios where data
streams are composed of temporally correlated data. Temporal correlation is ubiquitous and lies
at the foundation of most environments we, as humans, experience during our lives. We leveraged
Recurrent Neural Networks as our main model, due to their intrinsic ability to model temporal
correlations. We discovered that, when applied to recurrent models, Continual Learning strategies
behave in an unexpected manner. This highlights the limits of the current experimental validation,
mostly focused on Computer Vision tasks.
Ultimately, the introduction of new data streams contributed to deepening our understanding of
how Artificial Neural Networks learn continuously. We discovered that forgetting strongly depends
on the properties of the data stream, and we observed large changes from one data stream to
another. Moreover, when forgetting is mild, we were able to mitigate it effectively with simple
strategies, or even without any specific ones. Loosening the focus on forgetting allowed us to turn our
attention to other interesting problems, outlined in this thesis, like (i) separation between continual
representation learning and quick adaptation to novel tasks, (ii) robustness to unbalanced data
streams and (iii) ability to continuously learn temporal correlations. These objectives currently
defy existing strategies and will likely represent the next challenge for Continual Learning research.
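To make the Class-Incremental with Repetition idea concrete, a small sketch of how such a stream could be assembled follows; the function name and the per-experience counts are illustrative, not the thesis' exact protocol.

```python
import random

def cir_stream(classes, n_experiences, new_per_exp=2, rep_per_exp=2, seed=0):
    """Builds a toy Class-Incremental with Repetition stream: each experience
    introduces new classes and repeats a random subset of already-seen ones."""
    rng = random.Random(seed)
    unseen, seen, stream = list(classes), [], []
    for _ in range(n_experiences):
        new = [unseen.pop(0) for _ in range(min(new_per_exp, len(unseen)))]
        rep = rng.sample(seen, min(rep_per_exp, len(seen)))
        seen += new
        stream.append(sorted(new + rep))
    return stream

# Each experience mixes new class ids with repeated earlier ones,
# e.g. cir_stream(range(10), n_experiences=5).
```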
LIRA: Lifelong Image Restoration from Unknown Blended Distortions
Most existing image restoration networks are designed in a disposable way and
catastrophically forget previously learned distortions when trained on a new
distortion removal task. To alleviate this problem, we introduce the novel
problem of lifelong image restoration for blended distortions. We first design
a base fork-join model in which multiple pre-trained expert models, each
specializing in an individual distortion removal task, work cooperatively and
adaptively to handle blended distortions. When the input is degraded by a new
distortion, inspired by adult neurogenesis in the human memory system, we develop a neural growing
strategy where the previously trained model can incorporate a new expert branch
and continually accumulate new knowledge without interfering with learned
knowledge. Experimental results show that the proposed approach can not only
achieve state-of-the-art performance on blended distortion removal tasks in
both PSNR and SSIM metrics, but also maintain old expertise while learning new
restoration tasks.
Comment: Accepted at ECCV 2020.
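The fork-join growth strategy can be sketched as follows: freeze existing expert branches, then append a new one for the incoming distortion. The layer shapes, the averaging fusion, and the residual output are assumptions for illustration, not LIRA's actual architecture.

```python
import torch.nn as nn

class ForkJoinRestorer(nn.Module):
    """Fork-join sketch: frozen expert branches fused by a shared join head."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.experts = nn.ModuleList()               # one branch per distortion
        self.join = nn.Conv2d(feat_ch, 3, kernel_size=3, padding=1)
        self.feat_ch = feat_ch

    def add_expert(self):
        # Freeze existing branches so training the new one cannot interfere
        # with previously learned knowledge, then grow a fresh branch.
        for p in self.experts.parameters():
            p.requires_grad = False
        self.experts.append(nn.Sequential(
            nn.Conv2d(3, self.feat_ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(self.feat_ch, self.feat_ch, kernel_size=3, padding=1)))

    def forward(self, x):
        feats = [expert(x) for expert in self.experts]
        return x + self.join(sum(feats) / len(feats))  # residual restoration

model = ForkJoinRestorer()
model.add_expert()  # expert for the first distortion type
model.add_expert()  # grow a new branch when a new distortion appears
```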
Dynamic Mathematics for Automated Machine Learning Techniques
Machine Learning and Neural Networks have been gaining popularity and are widely considered the driving force of the Fourth Industrial Revolution. However, modern machine learning techniques such as backpropagation training were firmly established in 1986, and computer vision was revolutionised in 2012 with the introduction of AlexNet. Given all these accomplishments, why are neural networks still not an integral part of our society? "Because they are difficult to implement in practice." "I'd like to use machine learning, but I can't invest much time." The concept of Automated Machine Learning (AutoML) was first proposed by Professor Frank Hutter of the University of Freiburg. Machine learning is not simple; it requires a practitioner to have a thorough understanding of the attributes of their data and the components their model entails. AutoML is the effort to automate all tedious aspects of machine learning to form a clean data analysis pipeline.
This thesis is our effort to develop and understand ways to automate machine learning. Specifically, we focused on Recurrent Neural Networks (RNNs), Meta-Learning, and Continual Learning. We studied continual learning to enable a network to sequentially acquire skills in a dynamic environment; we studied meta-learning to understand how a network can be configured efficiently; and we studied RNNs to understand the consequences of consecutive actions. Our RNN study focused on mathematical interpretability. We described a large variety of RNNs as one mathematical class to understand their core network mechanism. This enabled us to extend meta-learning beyond network configuration to network pruning and continual learning. It also provided insights into how a single network should be consecutively configured, and led us to the creation of a simple generic patch that is compatible with several existing continual learning archetypes. This patch enhanced the robustness of continual learning techniques and allowed them to generalise better to data.
By and large, this thesis presented a series of extensions to make AutoML simple, efficient, and robust. More importantly, all of our methods are motivated by mathematical understanding through the lens of dynamical systems. Thus, we also increased the interpretability of AutoML concepts.
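The thesis describes many RNN variants as one mathematical class; as a point of reference only (the standard Elman update, not the thesis' unified formulation), the basic recurrence looks like this:

```python
import torch

def elman_step(x_t, h_prev, W_x, W_h, b):
    """One step of the classic Elman RNN: h_t = tanh(W_x x_t + W_h h_prev + b).

    Gated variants (LSTM, GRU) add multiplicative gates around this update,
    which is the kind of shared structure a unified mathematical class captures.
    """
    return torch.tanh(x_t @ W_x + h_prev @ W_h + b)

# Example shapes: x_t (batch, in_dim), W_x (in_dim, hid), W_h (hid, hid), b (hid,)
```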