FedYolo: Augmenting Federated Learning with Pretrained Transformers
The growth and diversity of machine learning applications motivate a
rethinking of learning with mobile and edge devices. How can we address diverse
client goals and learn with scarce heterogeneous data? While federated learning
aims to address these issues, it has challenges hindering a unified solution.
Large transformer models have been shown to work across a variety of tasks,
achieving remarkable few-shot adaptation. This raises the question: Can clients
use a single general-purpose model, rather than custom models for each task,
while obeying device and network constraints? In this work, we investigate
pretrained transformers (PTF) to achieve these on-device learning goals and
thoroughly explore the roles of model size and modularity, where the latter
refers to adaptation through modules such as prompts or adapters. Focusing on
federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy
gaps between alternative approaches and improves heterogeneity robustness.
Scale allows clients to run more local SGD epochs, which can significantly
reduce the number of communication rounds. At the extreme, clients can achieve
respectable accuracy locally, highlighting the potential of fully-local
learning. (2) Modularity, by design, enables over 100× less communication
in bits. Surprisingly, it also boosts the generalization capability of local
adaptation methods and the robustness of smaller PTFs. Finally, it enables
clients to solve multiple unrelated tasks simultaneously using a single PTF,
whereas full updates are prone to catastrophic forgetting. These insights on
scale and modularity motivate a new federated learning approach we call "You
Only Load Once" (FedYolo): The clients load a full PTF model once and all
future updates are accomplished through communication-efficient modules with
limited catastrophic forgetting, where each task is assigned to its own module.
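To make the modular-update pattern concrete, here is a minimal sketch assuming a frozen pretrained backbone with one small residual adapter per task, so that only adapter weights ever cross the network. The class names, dimensions, and adapter design are illustrative assumptions, not the paper's exact modules.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck adapter: the only part a client ever uploads."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        # Residual form keeps the frozen backbone's features intact.
        return x + self.up(torch.relu(self.down(x)))

class ModularClient(nn.Module):
    """Loads the pretrained transformer once; adapts via per-task modules."""
    def __init__(self, backbone: nn.Module, dim: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # "load once", never updated
        self.adapters = nn.ModuleDict()      # task name -> adapter
        self.dim = dim

    def add_task(self, task: str):
        self.adapters[task] = Adapter(self.dim)

    def forward(self, x, task: str):
        return self.adapters[task](self.backbone(x))

    def upload(self, task: str):
        # Only adapter weights are communicated: a tiny fraction of the
        # bits of a full-model update, and because each task owns its own
        # module, updates for one task cannot overwrite another.
        return self.adapters[task].state_dict()
```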
A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
Forgetting refers to the loss or deterioration of previously acquired
information or knowledge. While the existing surveys on forgetting have
primarily focused on continual learning, forgetting is a prevalent phenomenon
observed in various other research domains within deep learning. Forgetting
manifests in research fields such as generative models due to generator shifts,
and federated learning due to heterogeneous data distributions across clients.
Addressing forgetting encompasses several challenges, including balancing the
retention of old task knowledge with fast learning of new tasks, managing task
interference under conflicting goals, and preventing privacy leakage.
Moreover, most existing surveys on continual learning implicitly assume that
forgetting is always harmful. In contrast, our survey argues that forgetting is
a double-edged sword and can be beneficial and desirable in certain cases, such
as privacy-preserving scenarios. By exploring forgetting in a broader context,
we aim to present a more nuanced understanding of this phenomenon and highlight
its potential advantages. Through this comprehensive survey, we aspire to
uncover potential solutions by drawing upon ideas and approaches from various
fields that have dealt with forgetting. By examining forgetting beyond its
conventional boundaries, we hope that future work will develop novel strategies
for mitigating, harnessing, or even embracing forgetting in real applications.
A comprehensive list of papers about forgetting in various
research fields is available at
https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning
Online Class Incremental Learning on Stochastic Blurry Task Boundary via Mask and Visual Prompt Tuning
Continual learning aims to learn a model from a continuous stream of data,
but it typically assumes a fixed amount of data and a fixed set of tasks with
clear task boundaries. In real-world scenarios, however, the amount of input
data and the number of tasks change stochastically rather than statically.
Although recently introduced incremental learning scenarios with blurry task
boundaries partially address these issues, they still do not fully reflect the
statistical properties of real-world situations because of their fixed ratio
of disjoint and blurry samples. In this paper, we propose a new Stochastic
incremental Blurry task boundary scenario, called Si-Blurry, which reflects the
stochastic properties of the real world. We find two major challenges in the
Si-Blurry scenario: (1) inter- and intra-task forgetting and (2) the class
imbalance problem. To alleviate them, we introduce Mask and Visual
Prompt tuning (MVP). In MVP, to address the inter- and intra-task forgetting
issues, we propose a novel instance-wise logit masking and contrastive visual
prompt tuning loss. Both help our model discern the classes to be learned in
the current batch, thereby consolidating previously acquired knowledge. In
addition, to alleviate the class imbalance problem, we introduce
a new gradient similarity-based focal loss and adaptive feature scaling to ease
overfitting to the major classes and underfitting to the minor classes.
Extensive experiments show that our proposed MVP significantly outperforms the
existing state-of-the-art methods in our challenging Si-Blurry scenario.
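The abstract names instance-wise logit masking without defining it; the sketch below assumes the mask restricts the loss to classes present in the current batch, which is one plausible reading of the mechanism rather than the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy restricted to the classes seen in the current batch.

    Masking out absent classes keeps gradients from disturbing the logits
    of previously learned (but currently unseen) classes, which is one way
    to curb inter-task interference.
    """
    mask = torch.full_like(logits, float("-inf"))
    present = labels.unique()          # classes appearing in this batch
    mask[:, present] = 0.0
    return F.cross_entropy(logits + mask, labels)

# Example: 4 samples, 10-class head, batch contains only classes {2, 7}.
logits = torch.randn(4, 10, requires_grad=True)
labels = torch.tensor([2, 7, 2, 7])
loss = masked_cross_entropy(logits, labels)
loss.backward()   # gradients flow only into the present classes' logits
```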
Susceptibility of Continual Learning Against Adversarial Attacks
Recent continual learning approaches have primarily focused on mitigating
catastrophic forgetting. Nevertheless, two critical areas have remained
relatively unexplored: 1) evaluating the robustness of proposed methods and 2)
ensuring the security of learned tasks. This paper investigates the
susceptibility of continually learned tasks, including current and previously
acquired tasks, to adversarial attacks. Specifically, we have observed that any
class belonging to any task can be easily targeted and misclassified as the
desired target class of any other task. Such susceptibility or vulnerability of
learned tasks to adversarial attacks raises profound concerns regarding data
integrity and privacy. To assess the robustness of continual learning
approaches, we consider methods from all three scenarios, i.e.,
task-incremental, domain-incremental, and class-incremental learning.
Specifically, we explore the robustness of three
regularization-based methods, three replay-based approaches, and one hybrid
technique that combines replay and exemplar approaches. We empirically
demonstrate that, in any continual learning setting, any class, whether
belonging to the current or previously learned tasks, is susceptible to
misclassification. Our observations identify potential limitations of continual
learning approaches against adversarial attacks and suggest that current
continual learning algorithms may not be suitable for deployment in
real-world settings.
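The abstract does not pin down an attack, so here is a minimal sketch of a targeted FGSM-style perturbation of the kind the study describes, pushing an input toward an attacker-chosen class from any task; `eps` and the [0, 1] input range are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(model, x, target_class: int, eps: float = 8 / 255):
    """Nudge x so a continually-trained classifier predicts target_class.

    Stepping *against* the gradient of the loss on the target label lowers
    that loss, steering the prediction toward the target class, regardless
    of which task that class was learned in.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    target = torch.full((x.size(0),), target_class, dtype=torch.long)
    loss = F.cross_entropy(model(x_adv), target)
    loss.backward()
    # Assumes inputs normalized to [0, 1].
    return (x_adv - eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```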
Adaptive Regularization for Class-Incremental Learning
Class-Incremental Learning updates a deep classifier with new categories
while maintaining accuracy on previously observed classes. Regularizing the
neural network weights is a common method to prevent forgetting previously
learned classes while learning novel ones. However, existing regularizers use a
constant magnitude throughout the learning sessions, which may not reflect the
varying levels of difficulty of the tasks encountered during incremental
learning. This study investigates the necessity of adaptive regularization in
Class-Incremental Learning, which dynamically adjusts the regularization
strength according to the complexity of the task at hand. We propose a Bayesian
Optimization-based approach to automatically determine the optimal
regularization magnitude for each learning task. Our experiments with two
regularizers on two datasets demonstrate the importance of adaptive
regularization for achieving accurate and less forgetful visual incremental
learning.
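As an illustration of the per-task search, here is a minimal sketch using scikit-optimize's Gaussian-process `gp_minimize` as the Bayesian optimizer. The objective is a smooth toy stand-in for the real train-then-validate loop so the sketch runs end-to-end; nothing here reproduces the paper's actual setup.

```python
from skopt import gp_minimize  # Gaussian-process Bayesian optimization

def validation_error(log_lambda: float) -> float:
    """Stand-in objective for illustration only: in the paper's setting this
    would train one incremental task with regularization strength
    10**log_lambda and return validation error over all seen classes.
    A smooth toy curve with a minimum near lambda = 10 is used here so the
    sketch runs end-to-end."""
    return 0.1 * (log_lambda - 1.0) ** 2 + 0.05

# One BO search per learning session, instead of a single constant
# regularization magnitude shared by every task.
result = gp_minimize(
    lambda p: validation_error(p[0]),
    dimensions=[(-3.0, 3.0)],   # search log10(lambda) in [1e-3, 1e3]
    n_calls=20,
    random_state=0,
)
best_lambda = 10 ** result.x[0]
print(f"selected regularization magnitude: {best_lambda:.3g}")
```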
Mitigation of Catastrophic Interference in Neural Networks and Ensembles using a Fixed Expansion Layer
Catastrophic forgetting (also known in the literature as catastrophic interference) is the phenomenon by which learning systems exhibit a severe, exponential loss of learned information when exposed to relatively small amounts of new training data. This loss of information is not caused by constraints on the resources available to the learning system, but rather by representational overlap within the learning system and by side-effects of the training methods used. Catastrophic forgetting in auto-associative pattern recognition is a well-studied attribute of most parameterized supervised learning systems. A variation of this phenomenon, in the context of feedforward neural networks, arises when non-stationary inputs lead to loss of previously learned mappings. The majority of the schemes proposed in the literature for mitigating catastrophic forgetting are not data-driven, but rather rely on storage of prior representations of the learning system. We introduce the Fixed Expansion Layer (FEL) feedforward neural network, which embeds an expansion layer that sparsely encodes the information contained within the hidden layer, in order to help mitigate forgetting of prior learned representations. The fixed expansion layer approach is generally applicable to feedforward neural networks, as demonstrated by the application of the FEL technique to a recurrent neural network algorithm built on top of a standard feedforward neural network. Additionally, we investigate a novel framework for training ensembles of FEL networks, based on exploiting an information-theoretic measure of diversity between FEL learners, to further control undesired plasticity. The proposed methodology is demonstrated on several tasks, clearly emphasizing its advantages over existing techniques. The proposed architecture can be applied to a range of computational intelligence tasks, including classification problems, regression problems, and system control.
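A minimal sketch of the fixed-expansion idea, assuming the expansion layer is a frozen random projection followed by winner-take-all sparsification; the paper's exact connectivity and sparse-coding scheme may well differ from this reading.

```python
import torch
import torch.nn as nn

class FixedExpansionNet(nn.Module):
    """Sketch of the FEL idea: a frozen, random expansion layer turns the
    hidden representation into a sparse code before the output layer.
    Sparse, mostly non-overlapping codes reduce the representational
    overlap between patterns that drives catastrophic interference.
    Connectivity and sparsity details here are illustrative."""
    def __init__(self, n_in, n_hidden, n_expand, n_out, k_active=32):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hidden)
        self.expand = nn.Linear(n_hidden, n_expand, bias=False)
        self.expand.weight.requires_grad = False   # fixed at initialization
        self.out = nn.Linear(n_expand, n_out)
        self.k = k_active                          # winners kept per sample

    def forward(self, x):
        h = torch.tanh(self.hidden(x))
        e = self.expand(h)
        # Winner-take-all sparsification: keep only the k largest units.
        topk = e.topk(self.k, dim=-1)
        sparse = torch.zeros_like(e).scatter(-1, topk.indices, topk.values)
        return self.out(sparse)
```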
Domain adaptation for neural machine translation
The development of deep learning techniques has allowed Neural Machine Translation (NMT) models to become extremely powerful, given sufficient training data and training time. However, such translation models struggle when translating text of a specific domain. A domain may consist of text on a well-defined topic, text of unknown provenance with an identifiable vocabulary distribution, or language with some other stylometric feature. While NMT models can achieve good translation performance on domain-specific data via simple tuning on a representative training corpus, such data-centric approaches have negative side-effects. These include over-fitting, brittleness, and 'catastrophic forgetting' of previous training examples.
In this thesis we instead explore more robust approaches to domain adaptation for NMT. We consider the case where a system is adapted to a specified domain of interest, but may also need to accommodate new language, or domain-mismatched sentences. We explore techniques relating to data selection and curriculum, model parameter adaptation procedure, and inference procedure. We show that iterative fine-tuning can achieve strong performance over multiple related domains, and that Elastic Weight Consolidation can be used to mitigate catastrophic forgetting in NMT domain adaptation across multiple sequential domains. We develop a robust variant of Minimum Risk Training which allows more beneficial use of small, highly domain-specific tuning sets than simple cross-entropy fine-tuning, and can mitigate exposure bias resulting from domain over-fitting. We extend Bayesian Interpolation inference schemes to Neural Machine Translation, allowing adaptive weighting of NMT ensembles to translate text from an unknown domain.
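Elastic Weight Consolidation, as used above for sequential domain adaptation, adds a quadratic penalty anchoring parameters to their values after training on the previous domain, weighted by diagonal Fisher estimates of each parameter's importance. A minimal, model-agnostic sketch of that penalty follows (the thesis applies it to NMT; this helper is illustrative):

```python
import torch

def ewc_penalty(model, old_params, fisher, lam: float = 1.0):
    """Quadratic EWC penalty added to the fine-tuning loss.

    old_params: parameter values after training on the previous domain
    fisher:     per-parameter diagonal Fisher estimates (importance)
    lam:        overall regularization strength
    """
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss

# During in-domain fine-tuning:
#   total_loss = cross_entropy + ewc_penalty(model, old_params, fisher)
# Parameters the Fisher marks as important to the previous domain resist
# drifting, mitigating catastrophic forgetting of that domain.
```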
Finally, we demonstrate the benefit of multi-domain adaptation approaches for other lines of NMT research. We show that NMT systems using multiple forms of data representation can benefit from multi-domain inference approaches. We also demonstrate a series of domain adaptation approaches to mitigating the effects of gender bias in machine translation.