Elastic Multi-Gradient Descent for Parallel Continual Learning
The goal of Continual Learning (CL) is to continuously learn from new data
streams and accomplish the corresponding tasks. Previously studied CL assumes
that data for different tasks arrive strictly one after another, and thus
amounts to Serial Continual Learning (SCL). This paper studies the novel
paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios,
where a diverse set of tasks is encountered at different time points. PCL
presents challenges because an unspecified number of tasks with varying
learning progress must be trained in parallel, making it difficult to guarantee
effective model updates for all encountered tasks. In our previous conference work, we
focused on measuring and reducing the discrepancy among gradients in a
multi-objective optimization problem, which, however, may still contain
negative transfers in every model update. To address this issue, in the dynamic
multi-objective optimization problem, we introduce task-specific elastic
factors to adjust the descent direction towards the Pareto front. The proposed
method, called Elastic Multi-Gradient Descent (EMGD), ensures that each update
follows an appropriate Pareto descent direction, minimizing any negative impact
on previously learned tasks. To balance the training between old and new tasks,
we also propose a memory editing mechanism guided by the gradient computed
using EMGD. This editing process updates the stored data points, reducing
interference in the Pareto descent direction from previous tasks. Experiments
on public datasets validate the effectiveness of our EMGD in the PCL setting. Comment: Submitted to IEEE TPAMI.
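The abstract describes steering a shared update along a Pareto descent direction computed from per-task gradients, with task-specific elastic factors. Below is a minimal two-task NumPy sketch of that idea: a standard min-norm (MGDA-style) combination of gradients, computed after rescaling each gradient by an illustrative elastic factor. The two-task restriction, the factor values, and the function names are assumptions for illustration, not the paper's exact update rule.

```python
import numpy as np

def min_norm_two_task(g1, g2):
    """Min-norm convex combination of two task gradients (standard MGDA-style step)."""
    diff = g1 - g2
    denom = float(diff @ diff) + 1e-12
    alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2

def elastic_descent_direction(task_grads, elastic_factors):
    """Rescale each task gradient by a (hypothetical) elastic factor, then combine.

    Larger factors let that task pull the shared update more strongly, loosely
    mimicking the idea of adjusting the descent direction towards the Pareto front.
    """
    g1, g2 = (lam * g for lam, g in zip(elastic_factors, task_grads))
    return min_norm_two_task(g1, g2)

# Toy usage: conflicting gradients from an old and a new task.
g_old = np.array([1.0, 0.0])
g_new = np.array([-0.5, 1.0])
d = elastic_descent_direction([g_old, g_new], elastic_factors=[1.0, 0.7])
print(d, d @ g_old, d @ g_new)  # both inner products are non-negative here
```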
Learning an evolved mixture model for task-free continual learning
Recently, continual learning (CL) has gained significant interest because it
enables deep learning models to acquire new knowledge without forgetting
previously learnt information. However, most existing works require knowing the
task identities and boundaries, which is not realistic in a real context. In
this paper, we address a more challenging and realistic setting in CL, namely
the Task-Free Continual Learning (TFCL) in which a model is trained on
non-stationary data streams with no explicit task information. To address TFCL,
we introduce an evolved mixture model whose network architecture is dynamically
expanded to adapt to the data distribution shift. We implement this expansion
mechanism by evaluating the probability distance between the knowledge stored
in each mixture model component and the current memory buffer using the
Hilbert-Schmidt Independence Criterion (HSIC). We further introduce two simple dropout
mechanisms to selectively remove stored examples in order to avoid memory
overload while preserving memory diversity. Empirical results demonstrate that
the proposed approach achieves excellent performance. Comment: Accepted by the 29th IEEE International Conference on Image
Processing (ICIP 2022).
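Since the expansion criterion above hinges on an HSIC-based comparison between each mixture component and the memory buffer, here is a small NumPy sketch of the biased empirical HSIC estimator with RBF kernels. How the paper pairs samples, chooses kernels, and thresholds the statistic is not specified in the abstract, so the pairing, decision direction, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """RBF kernel Gram matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def empirical_hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC between two equally sized, index-paired sample sets."""
    n = X.shape[0]
    K, L = rbf_gram(X, sigma), rbf_gram(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

def needs_expansion(component_sample_sets, buffer_samples, threshold=1e-3):
    """Illustrative expansion check: expand when no component's stored samples
    (each set assumed the same size as the buffer) show enough dependence on it."""
    scores = [empirical_hsic(S, buffer_samples) for S in component_sample_sets]
    return max(scores) < threshold  # decision direction is an assumption
```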
Recent Advances of Continual Learning in Computer Vision: An Overview
In contrast to batch learning where all training data is available at once,
continual learning represents a family of methods that accumulate knowledge and
learn continuously with data available in sequential order. Similar to the
human learning process, which learns, fuses, and accumulates new knowledge
arriving at different time steps, continual learning is considered to have high
practical significance. Hence, continual learning has been studied
in various artificial intelligence tasks. In this paper, we present a
comprehensive review of the recent progress of continual learning in computer
vision. In particular, the works are grouped by their representative
techniques, including regularization, knowledge distillation, memory,
generative replay, parameter isolation, and a combination of the above
techniques. For each category of these techniques, both its characteristics and
applications in computer vision are presented. At the end of this overview,
several subareas, where continuous knowledge accumulation is potentially
helpful but continual learning has not yet been well studied, are discussed.
A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
Forgetting refers to the loss or deterioration of previously acquired
information or knowledge. While the existing surveys on forgetting have
primarily focused on continual learning, forgetting is a prevalent phenomenon
observed in various other research domains within deep learning. Forgetting
manifests in research fields such as generative models due to generator shifts,
and federated learning due to heterogeneous data distributions across clients.
Addressing forgetting encompasses several challenges, including balancing the
retention of old task knowledge with fast learning of new tasks, managing task
interference with conflicting goals, and preventing privacy leakage.
Moreover, most existing surveys on continual learning implicitly assume that
forgetting is always harmful. In contrast, our survey argues that forgetting is
a double-edged sword and can be beneficial and desirable in certain cases, such
as privacy-preserving scenarios. By exploring forgetting in a broader context,
we aim to present a more nuanced understanding of this phenomenon and highlight
its potential advantages. Through this comprehensive survey, we aspire to
uncover potential solutions by drawing upon ideas and approaches from various
fields that have dealt with forgetting. By examining forgetting beyond its
conventional boundaries, in future work, we hope to encourage the development
of novel strategies for mitigating, harnessing, or even embracing forgetting in
real applications. A comprehensive list of papers about forgetting in various
research fields is available at
\url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}
Online Lifelong Generalized Zero-Shot Learning
Methods proposed in the literature for zero-shot learning (ZSL) are typically
suitable for offline learning and cannot continually learn from sequential
streaming data. The sequential data comes in the form of tasks during training.
Recently, a few attempts have been made to handle this issue and develop
continual ZSL (CZSL) methods. However, these CZSL methods require clear
task-boundary information between the tasks during training, which is not
practically possible. This paper proposes a task-free (i.e., task-agnostic)
CZSL method, which does not require any task information during continual
learning. The proposed task-free CZSL method employs a variational autoencoder
(VAE) for performing ZSL. To develop the CZSL method, we combine the concept of
experience replay with knowledge distillation and regularization. Here,
knowledge distillation is performed using the training sample's dark knowledge,
which essentially helps overcome the catastrophic forgetting issue. Further,
task-free learning is enabled using a short-term memory. Finally, a
classifier is trained on the synthetic features generated at the latent space
of the VAE. Moreover, the experiments are conducted in a challenging and
practical ZSL setup, i.e., generalized ZSL (GZSL). These experiments are
conducted for two kinds of single-head continual learning settings: (i) the
mild setting, where the task boundary is known only during training but not
during testing; and (ii) the strict setting, where the task boundary is known
neither during training nor during testing. Experimental results on five
benchmark datasets exhibit the validity of the approach for CZSL.
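The abstract attributes the reduction in forgetting to knowledge distillation on the training samples' dark knowledge (soft outputs of an earlier model). A minimal NumPy sketch of such a distillation loss follows; the temperature, the teacher/student naming, and applying it to replayed samples are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dark_knowledge_distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened predictions.

    The teacher is a frozen copy of the model from before the current update;
    penalising drift of the current (student) predictions on replayed samples
    is one standard way to limit catastrophic forgetting.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(np.mean(kl) * T * T)
```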
Multi-View Class Incremental Learning
Multi-view learning (MVL) has gained great success in integrating information
from multiple perspectives of a dataset to improve downstream task performance.
To make MVL methods more practical in an open-ended environment, this paper
investigates a novel paradigm called multi-view class incremental learning
(MVCIL), where a single model incrementally classifies new classes from a
continual stream of views, requiring no access to earlier views of data.
However, MVCIL is challenged by the catastrophic forgetting of old information
and the interference with learning new concepts. To address this, we first
develop a randomization-based representation learning technique for feature
extraction that keeps each view in its own view-optimal working state, with the
multiple views belonging to a class presented sequentially; then, we integrate
them one by one in the orthogonality fusion subspace spanned by the extracted
features; finally, we introduce selective weight consolidation for
learning-without-forgetting decision-making when encountering new classes.
Extensive experiments on synthetic and real-world datasets validate the
effectiveness of our approach. Comment: 34 pages, 4 figures. Under review.
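One mechanism named above is integrating views one by one in an orthogonality fusion subspace spanned by the extracted features. A minimal Gram-Schmidt-style reading of that idea is sketched below; the paper's actual construction (how features are extracted, weighted, and consolidated) may well differ, so treat this purely as an illustration.

```python
import numpy as np

def orthogonal_fusion(basis, new_feats, tol=1e-8):
    """Add, for each new view's feature, only the component orthogonal to the
    subspace already spanned (Gram-Schmidt style), so earlier views' directions
    are left untouched while genuinely new directions are appended."""
    basis = [np.asarray(b, dtype=float) for b in basis]
    for v in new_feats:
        v = np.asarray(v, dtype=float).copy()
        for b in basis:
            v -= (v @ b) * b                 # remove the part already represented
        norm = np.linalg.norm(v)
        if norm > tol:                       # keep only genuinely new directions
            basis.append(v / norm)
    return basis
```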
PCR: Proxy-based Contrastive Replay for Online Class-Incremental Continual Learning
Online class-incremental continual learning is a specific task of continual
learning. It aims to continuously learn new classes from a data stream in which
each sample is seen only once, and it suffers from the catastrophic forgetting
issue, i.e., forgetting historical knowledge of old classes.
Existing replay-based methods effectively alleviate this issue by saving and
replaying part of old data in a proxy-based or contrastive-based replay manner.
Although these two replay manners are effective, the former tends to be biased
towards new classes due to class imbalance, and the latter is unstable and hard
to converge because of the limited number of samples. In this paper, we conduct
a comprehensive analysis of these two replay manners and find that they can be
complementary. Inspired by this finding, we propose a novel replay-based method
called proxy-based contrastive replay (PCR). The key operation is to replace
the contrastive samples of anchors with the corresponding proxies in the
contrastive loss. This alleviates catastrophic forgetting by effectively
addressing the imbalance issue, while also maintaining faster convergence of
the model. We conduct extensive experiments on three real-world
benchmark datasets, and empirical results consistently demonstrate the
superiority of PCR over various state-of-the-art methods. Comment: To appear in CVPR 2023. 10 pages, 8 figures, and 3 tables.
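The key PCR operation, as described, is contrasting each anchor against proxies rather than against other samples. A small NumPy sketch of an anchor-to-proxy contrastive loss is shown below; normalisation, temperature, and restricting the denominator to classes observed in the batch are assumptions made for illustration rather than the authors' exact loss.

```python
import numpy as np

def proxy_contrastive_loss(anchors, proxies, labels, classes_in_batch, temperature=0.1):
    """Pull each anchor towards its own class proxy and push it away from the
    proxies of the other classes appearing in the current batch."""
    A = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    P = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    logits = A @ P[classes_in_batch].T / temperature        # (batch, |classes_in_batch|)
    targets = np.array([classes_in_batch.index(int(y)) for y in labels])
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), targets].mean())

# Toy usage: 4 anchors in an 8-dim embedding space, proxies for 10 classes.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
proxies = rng.normal(size=(10, 8))
labels = [2, 5, 2, 7]
print(proxy_contrastive_loss(anchors, proxies, labels, classes_in_batch=[2, 5, 7]))
```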
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
Large pre-trained models decay over long-term deployment as input
distributions shift, user requirements change, or crucial knowledge gaps are
discovered. Recently, model editors have been proposed to modify a model's
behavior by adjusting its weights during deployment. However, when editing the
same model multiple times, these approaches quickly decay a model's performance
on upstream data and forget how to fix previous errors. We propose and study a
novel Lifelong Model Editing setting, where streaming errors are identified for
a deployed model and we update the model to correct its predictions without
influencing unrelated inputs, and without access to training edits, exogenous
datasets, or any upstream data for the edited model. To approach this problem,
we introduce General Retrieval Adaptors for Continual Editing, or GRACE, which
learns to cache a chosen layer's activations in an adaptive codebook as edits
stream in, leaving original model weights frozen. GRACE can thus edit models
thousands of times in a row using only streaming errors, while minimally
influencing unrelated inputs. Experimentally, we show that GRACE improves over
recent model editors and generalizes to unseen inputs. Our code is available at
https://www.github.com/thartvigsen/grace
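To make the codebook idea concrete, below is a highly simplified sketch of a key-value adaptor wrapped around one frozen layer: keys are cached activations of inputs that triggered edits, values are replacement activations, and a deferral radius decides when a query is close enough to reuse an edit. The class name, distance metric, and fixed radius are assumptions; the released implementation linked above is the authoritative reference.

```python
import numpy as np

class GraceLikeAdaptor:
    """Illustrative key-value adaptor around one frozen layer.

    Queries that fall outside every cached key's deferral radius pass through
    the frozen layer, so unrelated inputs are left mostly untouched. This is a
    simplified sketch of the idea, not the authors' implementation."""

    def __init__(self, frozen_layer, epsilon=0.5):
        self.frozen_layer = frozen_layer   # callable: activation -> activation
        self.epsilon = epsilon             # deferral radius
        self.keys, self.values = [], []

    def add_edit(self, activation, corrected_activation):
        self.keys.append(np.asarray(activation, dtype=float))
        self.values.append(np.asarray(corrected_activation, dtype=float))

    def __call__(self, activation):
        activation = np.asarray(activation, dtype=float)
        if self.keys:
            dists = [np.linalg.norm(activation - k) for k in self.keys]
            i = int(np.argmin(dists))
            if dists[i] < self.epsilon:
                return self.values[i]          # reuse the cached edit
        return self.frozen_layer(activation)   # defer to the original weights
```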
Continual Semantic Segmentation with Automatic Memory Sample Selection
Continual Semantic Segmentation (CSS) extends static semantic segmentation by
incrementally introducing new classes for training. To alleviate the
catastrophic forgetting issue in CSS, a memory buffer that stores a small
number of samples from the previous classes is constructed for replay. However,
existing methods select the memory samples either randomly or based on a
single-factor-driven handcrafted strategy, which is not guaranteed to be
optimal. In this work, we propose a novel memory sample selection mechanism
that selects informative samples for effective replay in a fully automatic way
by considering comprehensive factors including sample diversity and class
performance. Our mechanism regards the selection operation as a decision-making
process and learns an optimal selection policy that directly maximizes the
validation performance on a reward set. To facilitate the selection decision,
we design a novel state representation and a dual-stage action space. Our
extensive experiments on Pascal-VOC 2012 and ADE 20K datasets demonstrate the
effectiveness of our approach with state-of-the-art (SOTA) performance
achieved, outperforming the second-place method by 12.54% for the 6-stage setting
on Pascal-VOC 2012. Comment: Accepted to CVPR 2023.
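The selection mechanism above is cast as a decision process whose policy is trained to maximise performance on a reward set. A minimal REINFORCE-style sketch of one policy update is given below; the linear Bernoulli policy, the candidate features, and the reward definition are illustrative assumptions, not the paper's state representation or dual-stage action space.

```python
import numpy as np

def reinforce_selection_step(w, candidate_feats, evaluate_reward, lr=0.1, rng=None):
    """One REINFORCE update of a linear Bernoulli sample-selection policy.

    Each memory candidate is described by a feature vector (e.g. diversity and
    class-performance statistics); the policy samples an include/exclude action
    per candidate, and the reward is the validation performance obtained with
    the selected subset (supplied by `evaluate_reward`)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = candidate_feats @ w
    probs = 1.0 / (1.0 + np.exp(-logits))              # inclusion probabilities
    actions = (rng.random(len(probs)) < probs).astype(float)
    reward = evaluate_reward(actions)                   # e.g. accuracy on a reward set
    grad = candidate_feats.T @ (actions - probs)        # grad of sum_i log pi(a_i)
    return w + lr * reward * grad
```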