Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning
Federated continual learning (FCL) learns incremental tasks over time from
confidential datasets distributed across clients. This paper focuses on
rehearsal-free FCL, which has severe forgetting issues when learning new tasks
due to the lack of access to historical task data. To address this issue, we
propose Fed-CPrompt based on prompt learning techniques to obtain task-specific
prompts in a communication-efficient way. Fed-CPrompt introduces two key
components, asynchronous prompt learning and contrastive continual loss, to
handle asynchronous task arrival and heterogeneous data distributions in FCL,
respectively. Extensive experiments demonstrate the effectiveness of
Fed-CPrompt in achieving state-of-the-art rehearsal-free FCL performance.
Comment: Accepted by FL-ICML 2023
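As an illustrative aside (not the authors' implementation), the contrastive ingredient can be sketched as pushing the current task's learnable prompt away from the frozen prompts of earlier tasks. All names, shapes, and the temperature below are placeholders; the actual Fed-CPrompt loss also coordinates prompts across clients.

```python
import torch
import torch.nn.functional as F

def contrastive_continual_loss(new_prompt, old_prompts, temperature=0.5):
    """Penalize similarity between the current task's prompt and the
    frozen prompts of earlier tasks, keeping task knowledge separated.

    new_prompt:  (d,) learnable prompt for the current task
    old_prompts: (T, d) frozen prompts from previous tasks
    """
    z = F.normalize(new_prompt, dim=0)       # unit-norm current prompt
    z_old = F.normalize(old_prompts, dim=1)  # unit-norm past prompts
    sims = z_old @ z / temperature           # (T,) cosine similarities
    # log-sum-exp gives a smooth maximum over the similarities.
    return torch.logsumexp(sims, dim=0)

# Example: 3 earlier tasks, 64-dimensional prompts.
old = torch.randn(3, 64)                     # frozen
new = torch.randn(64, requires_grad=True)    # being learned
contrastive_continual_loss(new, old).backward()
```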
Learning an evolved mixture model for task-free continual learning
Recently, continual learning (CL) has gained significant interest because it
enables deep learning models to acquire new knowledge without forgetting
previously learnt information. However, most existing works require knowing the
task identities and boundaries, which is unrealistic in practice. In
this paper, we address a more challenging and realistic setting in CL, namely
the Task-Free Continual Learning (TFCL) in which a model is trained on
non-stationary data streams with no explicit task information. To address TFCL,
we introduce an evolved mixture model whose network architecture is dynamically
expanded to adapt to the data distribution shift. We implement this expansion
mechanism by evaluating the probability distance between the knowledge stored
in each mixture model component and the current memory buffer using the
Hilbert-Schmidt Independence Criterion (HSIC). We further introduce two simple dropout
mechanisms to selectively remove stored examples in order to avoid memory
overload while preserving memory diversity. Empirical results demonstrate that
the proposed approach achieves excellent performance.
Comment: Accepted by the 29th IEEE International Conference on Image Processing (ICIP 2022)
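For reference, the HSIC used in the expansion decision has a simple biased empirical estimator, tr(KHLH)/(n-1)^2. The sketch below is a generic implementation with RBF kernels, not the paper's code; the stand-in tensors and any threshold applied on top of the score are hypothetical.

```python
import torch

def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for a batch x of shape (n, d)."""
    d2 = torch.cdist(x, x).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2 with centering
    matrix H = I - (1/n) 11^T; x and y must hold n samples each."""
    n = x.shape[0]
    k, l = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    h = torch.eye(n) - torch.full((n, n), 1.0 / n)
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2

# Stand-ins: features held by one mixture component vs. the memory buffer.
component = torch.randn(64, 16)
buffer = component + 0.05 * torch.randn(64, 16)
print(hsic(component, buffer).item())  # high dependence -> no expansion
```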
Online Lifelong Generalized Zero-Shot Learning
Methods proposed in the literature for zero-shot learning (ZSL) are typically
suitable for offline learning and cannot continually learn from sequential
streaming data. The sequential data comes in the form of tasks during training.
Recently, a few attempts have been made to handle this issue and develop
continual ZSL (CZSL) methods. However, these CZSL methods require clear
task-boundary information between the tasks during training, which is not
practically possible. This paper proposes a task-free (i.e., task-agnostic)
CZSL method, which does not require any task information during continual
learning. The proposed task-free CZSL method employs a variational autoencoder
(VAE) for performing ZSL. To develop the CZSL method, we combine the concept of
experience replay with knowledge distillation and regularization. Here,
knowledge distillation is performed using the training samples' dark knowledge,
which essentially helps overcome the catastrophic forgetting issue. Further,
task-free learning is enabled using a short-term memory. Finally, a
classifier is trained on the synthetic features generated at the latent space
of the VAE. Moreover, the experiments are conducted in a challenging and
practical ZSL setup, i.e., generalized ZSL (GZSL). These experiments are
conducted for two kinds of single-head continual learning settings: (i) the mild
setting, where task boundaries are known only during training but not during
testing; and (ii) the strict setting, where task boundaries are known neither
during training nor during testing. Experimental results on five benchmark
datasets demonstrate the validity of the approach for CZSL.
Continual Learning, Fast and Slow
According to the Complementary Learning Systems (CLS)
theory~\cite{mcclelland1995there} in neuroscience, humans do effective
\emph{continual learning} through two complementary systems: a fast learning
system centered on the hippocampus for rapid learning of specific, individual
experiences; and a slow learning system located in the neocortex for
the gradual acquisition of structured knowledge about the environment.
Motivated by this theory, we propose \emph{DualNets} (for Dual Networks), a
general continual learning framework comprising a fast learning system for
supervised learning of pattern-separated representation from specific tasks and
a slow learning system for representation learning of task-agnostic general
representation via Self-Supervised Learning (SSL). DualNets can seamlessly
incorporate both representation types into a holistic framework to facilitate
better continual learning in deep neural networks. Via extensive experiments,
we demonstrate the promising results of DualNets on a wide range of continual
learning protocols, ranging from the standard offline, task-aware setting to
the challenging online, task-free scenario. Notably, on the
CTrL~\cite{veniat2020efficient} benchmark that has unrelated tasks with vastly
different visual images, DualNets can achieve competitive performance with
existing state-of-the-art dynamic architecture
strategies~\cite{ostapenko2021continual}. Furthermore, we conduct comprehensive
ablation studies to validate DualNets' efficacy, robustness, and scalability.
Code will be made available at \url{https://github.com/phquang/DualNet}.
Comment: arXiv admin note: substantial text overlap with arXiv:2110.00175
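A minimal structural sketch of the fast/slow split follows; the layer sizes are placeholders, and the real DualNets operates on convolutional features, trains the slow weights with a self-supervised objective, and updates the fast weights with the supervised loss.

```python
import torch
import torch.nn as nn

class DualNet(nn.Module):
    """Toy fast/slow architecture: a slow network provides general,
    task-agnostic features; a small fast network adapts them per task."""
    def __init__(self, in_dim=784, feat_dim=256, n_classes=10):
        super().__init__()
        # Slow learner: representation trained with self-supervision.
        self.slow = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                  nn.Linear(512, feat_dim))
        # Fast learner: lightweight supervised adaptation.
        self.fast = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        general = self.slow(x)                  # slow, general features
        adapted = self.fast(general) + general  # fast residual adaptation
        return self.classifier(adapted)

logits = DualNet()(torch.randn(2, 784))  # (2, 10)
```

The residual combination is one plausible way to fuse the two representation types into a single prediction path.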
On the Robustness, Generalization, and Forgetting of Shape-Texture Debiased Continual Learning
Tremendous progress has been made in continual learning to maintain good
performance on old tasks when learning new tasks by tackling the catastrophic
forgetting problem of neural networks. This paper advances continual learning
by further considering its out-of-distribution robustness, in response to the
vulnerability of continually trained models to distribution shifts (e.g., due
to data corruptions and domain shifts) at inference time. To this end, we propose
shape-texture debiased continual learning. The key idea is to learn
generalizable and robust representations for each task with shape-texture
debiased training. In order to transform standard continual learning to
shape-texture debiased continual learning, we propose shape-texture debiased
data generation and online shape-texture debiased self-distillation.
Experiments on six datasets demonstrate the benefits of our approach in
improving generalization and robustness, as well as reducing forgetting. Our
analysis on the flatness of the loss landscape explains the advantages.
Moreover, our approach can be easily combined with new advanced architectures
such as vision transformers, and applied to more challenging scenarios such as
exemplar-free continual learning.
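To make the debiasing idea concrete: given a cue-conflict image whose shape comes from one source image and whose texture comes from another (produced, e.g., by style transfer), training can supervise both labels at once. The sketch below is a hedged illustration with a placeholder mixing weight, not the paper's generation or self-distillation pipeline.

```python
import torch
import torch.nn.functional as F

def debiased_loss(logits, shape_label, texture_label, alpha=0.5):
    """Supervise a shape-texture cue-conflict image with both of its
    labels so the model cannot rely on texture cues alone."""
    loss_shape = F.cross_entropy(logits, shape_label)
    loss_texture = F.cross_entropy(logits, texture_label)
    return alpha * loss_shape + (1 - alpha) * loss_texture

# Example: a batch of 4 stylized images over 10 classes, where each image
# inherits its shape label and texture label from two different sources.
logits = torch.randn(4, 10, requires_grad=True)
shape_y = torch.randint(0, 10, (4,))
texture_y = torch.randint(0, 10, (4,))
debiased_loss(logits, shape_y, texture_y).backward()
```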
Sparse Distributed Memory is a Continual Learner
Continual learning is a problem for artificial neural networks that their
biological counterparts are adept at solving. Building on work using Sparse
Distributed Memory (SDM) to connect a core neural circuit with the powerful
Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is
a strong continual learner. We find that every component of our MLP variant
translated from biology is necessary for continual learning. Our solution is
also free from any memory replay or task information, and introduces novel
methods to train sparse networks that may be broadly applicable.
Comment: 9 pages. ICLR Acceptance
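One component such SDM-style MLP variants rely on is a winner-take-all Top-K activation, which yields sparse, distributed codes; the layer below is a generic sketch with hypothetical sizes, not the authors' model, which translates several further details from biology (e.g., removing biases).

```python
import torch
import torch.nn as nn

class TopKActivation(nn.Module):
    """Keep only the k largest pre-activations per example and zero the
    rest: a winner-take-all rule producing sparse distributed codes."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, x):
        topk = torch.topk(x, self.k, dim=-1)
        mask = torch.zeros_like(x).scatter_(-1, topk.indices, 1.0)
        return x * mask

# Hypothetical sparse MLP in the spirit of the SDM-derived variant.
model = nn.Sequential(
    nn.Linear(784, 1024, bias=False),  # bias-free hidden layer
    TopKActivation(k=32),              # only 32 of 1024 neurons fire
    nn.Linear(1024, 10),
)
out = model(torch.randn(5, 784))       # (5, 10)
```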
A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning
With the success of pretraining techniques in representation learning, a
number of continual learning methods based on pretrained models have been
proposed. Some of these methods design continual learning mechanisms on the
pre-trained representations and only allow minimum updates or even no updates
of the backbone models during the training of continual learning. In this
paper, we question whether the complexity of these models is needed to achieve
good performance by comparing them to a simple baseline that we designed. We
argue that the pretrained feature extractor itself can be strong enough to
achieve a competitive or even better continual learning performance on
Split-CIFAR100 and CORe50 benchmarks. To validate this, we construct a very
simple baseline that 1) uses the frozen pretrained model to extract image
features for every class encountered during the continual learning stage and
computes their corresponding mean features on the training data, and 2) predicts
the class of the input based on the nearest-neighbor distance between test
samples and the class mean features, i.e., a Nearest Mean Classifier (NMC). This
baseline is single-headed, exemplar-free, and can be task-free (by updating the
means continually). This baseline achieved 88.53% on 10-Split-CIFAR-100,
surpassing most state-of-the-art continual learning methods that are all
initialized using the same pretrained transformer model. We hope our baseline
may encourage future progress in designing learning systems that can
continually add quality to the learned representations even if they started
from some pretrained weights.
Comment: 6 pages, under review. Code available at https://github.com/Pauljanson002/pretrained-cl.git
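The baseline is simple enough to sketch in full. The version below is an independent reconstruction from the abstract, not the linked repository's code; `backbone` is a stand-in for any frozen pretrained feature extractor.

```python
import torch
import torch.nn as nn

class NearestMeanClassifier:
    """Store a running mean feature per class from a frozen backbone and
    classify test inputs by the nearest class mean (single-headed,
    exemplar-free, and task-free since means update continually)."""
    def __init__(self, backbone):
        self.backbone = backbone.eval()
        self.sums, self.counts = {}, {}

    @torch.no_grad()
    def update(self, x, y):
        for f, label in zip(self.backbone(x), y.tolist()):
            self.sums[label] = self.sums.get(label, 0) + f
            self.counts[label] = self.counts.get(label, 0) + 1

    @torch.no_grad()
    def predict(self, x):
        labels = sorted(self.sums)
        means = torch.stack([self.sums[c] / self.counts[c] for c in labels])
        dists = torch.cdist(self.backbone(x), means)  # Euclidean distances
        return torch.tensor([labels[i] for i in dists.argmin(dim=1).tolist()])

# Toy usage with an identity "backbone" on 2-D features:
nmc = NearestMeanClassifier(nn.Identity())
nmc.update(torch.tensor([[0., 0.], [4., 4.]]), torch.tensor([0, 1]))
print(nmc.predict(torch.tensor([[0.5, 0.5]])))  # tensor([0])
```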