Continual learning with direction-constrained optimization
This paper studies a new design of the optimization algorithm for training
deep learning models with a fixed architecture of the classification network in
a continual learning framework, where the training data is non-stationary and
the non-stationarity is imposed by a sequence of distinct tasks. This setting
implies the existence of a manifold of network parameters that correspond to
good performance of the network on all tasks. Our algorithm is derived from the
geometrical properties of this manifold. We first analyze a deep model trained
on only one learning task in isolation and identify a region in network
parameter space, where the model performance is close to the recovered optimum.
We provide empirical evidence that this region resembles a cone that expands
along the convergence direction. We study the principal directions of the
trajectory of the optimizer after convergence and show that traveling along a
few top principal directions can quickly bring the parameters outside the cone,
but this is not the case for the remaining directions. We argue that
catastrophic forgetting in a continual learning setting can be alleviated when
the parameters are constrained to stay within the intersection of the plausible
cones of the individual tasks encountered so far during training.
Enforcing this is equivalent to preventing the parameters from moving along the
top principal directions of convergence corresponding to the past tasks. For
each task we introduce a new linear autoencoder to approximate its
corresponding top forbidden principal directions. They are then incorporated
into the loss function in the form of a regularization term for the purpose of
learning the coming tasks without forgetting. We empirically demonstrate that
our algorithm performs favorably compared to other state-of-the-art
regularization-based continual learning methods, including EWC and SI.
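The regularization idea in the abstract above can be illustrated with a minimal sketch. It assumes the forbidden directions are the top principal directions of optimizer snapshots collected after convergence, and it uses plain power iteration as a stand-in for the linear autoencoder that the paper trains. The function names, the `lam` weight, and the quadratic form of the penalty are illustrative assumptions, not the authors' implementation.

```python
import math


def top_direction(snapshots, iters=200):
    """Estimate the top principal direction of parameter snapshots.

    Assumption: power iteration on the centered snapshots replaces the
    linear autoencoder described in the abstract.
    """
    n, d = len(snapshots), len(snapshots[0])
    mean = [sum(s[j] for s in snapshots) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in snapshots]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # One power-iteration step: w = X^T (X v), then renormalize.
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def direction_penalty(theta, theta_star, directions, lam=1.0):
    """Penalize displacement from theta_star along forbidden directions.

    Hypothetical form: lam * sum_k (v_k . (theta - theta_star))^2.
    """
    delta = [a - b for a, b in zip(theta, theta_star)]
    return lam * sum(
        sum(vj * dj for vj, dj in zip(v, delta)) ** 2 for v in directions
    )


# Snapshots that vary mostly along the first coordinate.
snapshots = [[-2.0, 0.1], [-1.0, -0.1], [0.0, 0.1], [1.0, -0.1], [2.0, 0.1]]
forbidden = [top_direction(snapshots)]
along = direction_penalty([1.0, 0.0], [0.0, 0.0], forbidden)
across = direction_penalty([0.0, 1.0], [0.0, 0.0], forbidden)
print(along, across)
```

In this toy case, moving along the dominant trajectory direction is penalized while moving orthogonally to it is almost free. In a training loop, a term of this kind would be added to the loss of each new task.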
Neural Weight Search for Scalable Task Incremental Learning
Task incremental learning aims to enable a system to maintain its performance
on previously learned tasks while learning new tasks, solving the problem of
catastrophic forgetting. One promising approach is to build an individual
network or sub-network for future tasks. However, this leads to ever-growing
memory, because extra weights are saved for each new task, and how to address
this issue has remained an open problem in task incremental learning. In this paper, we
introduce a novel Neural Weight Search technique that designs a fixed search
space where the optimal combinations of frozen weights can be searched to build
new models for novel tasks in an end-to-end manner, resulting in scalable and
controllable memory growth. Extensive experiments on two benchmarks, i.e.,
Split-CIFAR-100 and CUB-to-Sketches, show that our method achieves state-of-the-art
performance with respect to both average inference accuracy and total memory
cost.
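The memory argument in the abstract above can be made concrete with a rough sketch. It assumes a fixed pool of frozen weight vectors shared by all tasks, so that each task stores only integer indices into the pool. The nearest-neighbour assignment below is a simple stand-in for the end-to-end search described in the paper, and all names are hypothetical.

```python
def nearest_index(kernel, pool):
    """Index of the pool entry closest to `kernel` in squared L2 distance."""
    return min(
        range(len(pool)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(kernel, pool[i])),
    )


def encode_task(task_kernels, pool):
    """Represent a task's layer as indices into the frozen pool.

    Assumption: nearest-neighbour lookup replaces the paper's search.
    """
    return [nearest_index(k, pool) for k in task_kernels]


def build_layer(indices, pool):
    """Rebuild a task-specific layer from the shared pool and stored indices."""
    return [pool[i] for i in indices]


# Fixed search space of frozen weights, shared across tasks.
pool = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
# Hypothetical kernels that a new task would ideally use.
task_kernels = [[0.9, 1.2], [1.8, 0.1], [0.1, -0.1]]

indices = encode_task(task_kernels, pool)
layer = build_layer(indices, pool)
print(indices, layer)
```

Under this assumption, the per-task cost is one integer per kernel rather than a full copy of the weights, so memory grows in a controlled way as tasks are added.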
TAME: Task Agnostic Continual Learning using Multiple Experts
The goal of lifelong learning is to continuously learn from non-stationary
distributions, where the non-stationarity is typically imposed by a sequence of
distinct tasks. Prior works have mostly considered idealistic settings, where
the identity of tasks is known at least at training. In this paper we focus on
a fundamentally harder, so-called task-agnostic setting where the task
identities are not known and the learning machine needs to infer them from the
observations. Our algorithm, which we call TAME (Task-Agnostic continual
learning using Multiple Experts), automatically detects the shift in data
distributions and switches between task expert networks in an online manner. At
training, the strategy for switching between tasks hinges on the simple
observation that each newly arriving task produces a statistically
significant deviation in the value of the loss function, which
marks the onset of this new task. At inference, the switching between experts
is governed by the selector network that forwards the test sample to its
relevant expert network. The selector network is trained on a small subset of
data drawn uniformly at random. We control the growth of the task expert
networks as well as the selector network by employing online pruning. Our
experimental results show the efficacy of our approach on benchmark continual
learning data sets, outperforming the previous task-agnostic methods and even
the techniques that admit task identities at both training and testing, while
at the same time using a comparable model size.
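The training-time switching rule described in the abstract above can be sketched as a simple online detector. It assumes a task shift is flagged whenever the current loss exceeds the running mean of recent losses by a multiple of their standard deviation. The window size, threshold, warm-up length, and class name are illustrative assumptions rather than the settings used by TAME.

```python
import statistics
from collections import deque


class LossShiftDetector:
    """Flag a task shift when the loss jumps well above its recent values."""

    def __init__(self, window=20, z_threshold=3.0, warmup=5, min_std=1e-3):
        self.history = deque(maxlen=window)  # recent losses of the current task
        self.z_threshold = z_threshold
        self.warmup = warmup                 # losses needed before testing
        self.min_std = min_std               # avoids a zero-width threshold

    def update(self, loss):
        """Record `loss`; return True if it marks the onset of a new task."""
        shifted = False
        if len(self.history) >= self.warmup:
            mean = statistics.mean(self.history)
            std = max(statistics.pstdev(self.history), self.min_std)
            shifted = loss > mean + self.z_threshold * std
        if shifted:
            # Start fresh statistics for the newly detected task.
            self.history.clear()
        self.history.append(loss)
        return shifted


# The loss decreases on one task, then jumps when the data distribution changes.
losses = [1.0, 0.9, 0.8, 0.75, 0.7, 0.68, 0.65, 2.5, 2.0, 1.5]
detector = LossShiftDetector()
shift_steps = [i for i, loss in enumerate(losses) if detector.update(loss)]
print(shift_steps)
```

In a full system, a flagged step would trigger the creation of, or a switch to, another expert network. That part is omitted here.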