Continual learning with direction-constrained optimization
This paper studies a new design of the optimization algorithm for training
deep learning models with a fixed architecture of the classification network in
a continual learning framework, where the training data is non-stationary and
the non-stationarity is imposed by a sequence of distinct tasks. This setting
implies the existence of a manifold of network parameters that correspond to
good performance of the network on all tasks. Our algorithm is derived from the
geometrical properties of this manifold. We first analyze a deep model trained
on only one learning task in isolation and identify a region in network
parameter space where the model's performance is close to the recovered optimum.
We provide empirical evidence that this region resembles a cone that expands
along the convergence direction. We study the principal directions of the
trajectory of the optimizer after convergence and show that traveling along a
few top principal directions can quickly bring the parameters outside the cone,
but this is not the case for the remaining directions. We argue that
catastrophic forgetting in a continual learning setting can be alleviated when
the parameters are constrained to stay within the intersection of the plausible
cones of the individual tasks encountered so far during training.
Enforcing this is equivalent to preventing the parameters from moving along the
top principal directions of convergence corresponding to the past tasks. For
each task we introduce a new linear autoencoder to approximate its
corresponding top forbidden principal directions. They are then incorporated
into the loss function in the form of a regularization term for the purpose of
learning the coming tasks without forgetting. We empirically demonstrate that
our algorithm performs favorably compared to other state-of-the-art
regularization-based continual learning methods, including EWC and SI.
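The regularization term described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `direction_penalty` and the toy dimensions are hypothetical, and the "forbidden" directions are given directly as an orthonormal matrix rather than approximated by the paper's per-task linear autoencoder. It only shows the core idea: displacement from a past task's optimum is penalized along the top principal directions of that task's convergence trajectory, while movement orthogonal to them is free.

```python
import numpy as np

def direction_penalty(theta, theta_star, top_dirs, strength=1.0):
    """Penalty on the displacement of current parameters `theta` from a
    past task's optimum `theta_star`, measured only along that task's
    "forbidden" top principal directions (rows of `top_dirs`, assumed
    orthonormal). Hypothetical helper illustrating the regularizer."""
    delta = theta - theta_star
    proj = top_dirs @ delta  # components along the forbidden directions
    return strength * float(proj @ proj)

# Toy setup: a 4-d parameter space where the first two axes are "forbidden".
theta_star = np.zeros(4)
forbidden = np.eye(4)[:2]

# Moving along an allowed direction incurs no penalty;
# moving along a forbidden one is penalized quadratically.
free_move = direction_penalty(np.array([0., 0., 3., 0.]), theta_star, forbidden)
bad_move = direction_penalty(np.array([2., 0., 0., 0.]), theta_star, forbidden)
```

In training, this penalty (summed over all past tasks) would be added to the task loss, which is what constrains the parameters to the intersection of the tasks' cones.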
Online Lifelong Generalized Zero-Shot Learning
Methods proposed in the literature for zero-shot learning (ZSL) are typically
suitable for offline learning and cannot continually learn from sequential
streaming data. The sequential data comes in the form of tasks during training.
Recently, a few attempts have been made to handle this issue and develop
continual ZSL (CZSL) methods. However, these CZSL methods require clear
task-boundary information between tasks during training, which is rarely
available in practice. This paper proposes a task-free (i.e., task-agnostic)
CZSL method, which does not require any task information during continual
learning. The proposed task-free CZSL method employs a variational autoencoder
(VAE) for performing ZSL. To develop the CZSL method, we combine the concept of
experience replay with knowledge distillation and regularization. Here,
knowledge distillation is performed using the training samples' dark knowledge,
which helps to overcome catastrophic forgetting. Task-free learning is further
enabled by a short-term memory. Finally, a
classifier is trained on the synthetic features generated at the latent space
of the VAE. Moreover, the experiments are conducted in a challenging and
practical ZSL setup, i.e., generalized ZSL (GZSL). These experiments are
conducted for two kinds of single-head continual learning settings: (i) the mild
setting, where task boundaries are known during training but not during testing;
and (ii) the strict setting, where task boundaries are known neither during
training nor during testing. Experimental results on five benchmark datasets
demonstrate the validity of the approach for CZSL.
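The distillation component above can be sketched in a few lines of NumPy. This is a generic illustration of distilling "dark knowledge", not the paper's code: the function names `softmax` and `distill_loss` and the temperature value are assumptions. A frozen copy of the model from earlier training (the teacher) produces temperature-softened class probabilities, and the current model (the student) is penalized for diverging from them, which discourages forgetting.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T exposes more dark knowledge."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student's softened distribution against the
    teacher's softened ("dark knowledge") distribution. Minimized when
    the student matches the teacher exactly."""
    p_teacher = softmax(teacher_logits, T)
    log_q_student = np.log(softmax(student_logits, T) + 1e-12)
    return float(-(p_teacher * log_q_student).sum())

# A student that matches the teacher pays less than one that contradicts it.
matched = distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
contradicting = distill_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In a replay-based CZSL setup, this loss would be computed on samples drawn from the short-term memory alongside the VAE and classification objectives.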