Federated Class-Incremental Learning with Prompting
As Web technology continues to develop, it has become increasingly common to
use data stored on different clients. At the same time, federated learning has
received widespread attention due to its ability to protect data privacy when
letting models learn from data that is distributed across various clients.
However, most existing works assume that each client's data are fixed. In
real-world scenarios, such an assumption is most likely not true as data may be
continuously generated and new classes may also appear. To this end, we focus
on the practical and challenging federated class-incremental learning (FCIL)
problem. For FCIL, the local and global models may suffer from catastrophic
forgetting on old classes caused by the arrival of new classes and the data
distributions of clients are non-independent and identically distributed
(non-iid).
In this paper, we propose a novel method called Federated Class-Incremental
Learning with PrompTing (FCILPT). Given privacy and memory constraints, FCILPT
does not use a rehearsal-based buffer to keep exemplars of old data. We choose
to use prompts to ease the catastrophic forgetting of the old classes.
Specifically, we encode the task-relevant and task-irrelevant knowledge into
prompts, preserving the old and new knowledge of the local clients and solving
the problem of catastrophic forgetting. We first sort the task information in
the prompt pool in the local clients to align the task information on different
clients before global aggregation. This ensures that the same task's knowledge
is fully integrated, solving the problem of non-iid caused by the lack of
classes among different clients in the same incremental task. Experiments on
CIFAR-100, Mini-ImageNet, and Tiny-ImageNet demonstrate that FCILPT achieves
significant accuracy improvements over the state-of-the-art methods.
Adaptive Differential Privacy in Federated Learning: A Priority-Based Approach
Federated learning (FL), one of the novel branches of distributed machine
learning (ML), develops global models through a private procedure without
direct access to local datasets. However, access to model updates (e.g.
gradient updates in deep neural networks) transferred between clients and
servers can reveal sensitive information to adversaries. Differential privacy
(DP) offers a framework that gives a privacy guarantee by adding certain
amounts of noise to parameters. Although effective in terms of privacy, this
approach degrades model performance because of the injected noise. Hence, a
balance must always be struck between noise injection and the
sacrificed accuracy. To address this challenge, we propose adaptive noise
addition in FL which decides the value of injected noise based on features'
relative importance. Here, we first propose two effective methods for
prioritizing features in deep neural network models and then perturb models'
weights based on this information. Specifically, we investigate whether
the idea of adding more noise to less important parameters and less noise to
more important parameters can effectively maintain model accuracy while
preserving privacy. Our experiments confirm this hypothesis under some
conditions. The amount of noise injected, the proportion of parameters
involved, and the number of global iterations can significantly change the
output. While a careful choice of parameters by considering the properties of
datasets can improve privacy without intense loss of accuracy, a bad choice can
make the model performance worse.
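
The idea in the abstract above (more noise on less important parameters, less noise on more important ones) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how importance is measured or how noise is scaled, so the function names, the linear mapping, and the `base_sigma`, `min_factor`, and `max_factor` parameters are hypothetical. The sketch also does not calibrate noise to a privacy budget, so it gives no formal differential privacy guarantee.

```python
import random


def adaptive_noise_scales(importance, base_sigma=0.1, min_factor=0.5, max_factor=1.5):
    """Map importance scores to per-parameter noise standard deviations.

    Hypothetical linear rule: the most important parameter gets
    base_sigma * min_factor, the least important gets base_sigma * max_factor.
    """
    lo, hi = min(importance), max(importance)
    span = hi - lo
    scales = []
    for score in importance:
        # Relative importance in [0, 1]; fall back to 0.5 if all scores are equal.
        rel = (score - lo) / span if span > 0 else 0.5
        factor = max_factor - rel * (max_factor - min_factor)
        scales.append(base_sigma * factor)
    return scales


def perturb(weights, importance, rng=None, **kwargs):
    """Add zero-mean Gaussian noise to each weight using its adaptive scale."""
    rng = rng or random.Random()
    scales = adaptive_noise_scales(importance, **kwargs)
    return [w + rng.gauss(0.0, s) for w, s in zip(weights, scales)]
```

A call such as `perturb([0.2, -1.3, 0.7], importance=[0.0, 1.0, 0.5])` perturbs the second weight the least, because it carries the highest importance score.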
FedProf: Selective Federated Learning with Representation Profiling
Federated Learning (FL) has shown great potential as a privacy-preserving
solution to learning from decentralized data that are only accessible to end
devices (i.e., clients). In many scenarios, however, a large proportion of the
clients are probably in possession of low-quality data that are biased, noisy
or even irrelevant. As a result, they could significantly slow down the
convergence of the global model we aim to build and also compromise its
quality. In light of this, we propose FedProf, a novel algorithm for optimizing
FL under such circumstances without breaching data privacy. The key to our
approach is a data representation profiling and matching scheme that uses the
global model to dynamically profile data representations and allows for
low-cost, lightweight representation matching. Based on the scheme we
adaptively score each client and adjust its participation probability so as to
mitigate the impact of low-value clients on the training process. We have
conducted extensive experiments on public datasets using various FL settings.
The results show that FedProf effectively reduces the number of communication
rounds and overall time (up to 4.5x speedup) for the global model to converge
and provides an accuracy gain.
Comment: 23 pages (references and appendices included)
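
The scoring step described in the abstract above (score each client, then adjust its participation probability) can be sketched as follows. This is an illustrative assumption, not the FedProf algorithm itself: the abstract does not define the score, so the sketch takes a hypothetical per-client divergence value (lower means the client's representation profile matches the global one better) and turns it into a probability with an exponential weighting controlled by a hypothetical `alpha` parameter.

```python
import math
import random


def participation_probs(divergences, alpha=1.0):
    """Turn per-client divergence scores into selection probabilities.

    Assumed rule: weight = exp(-alpha * divergence), normalised to sum to 1,
    so clients with larger divergence are selected less often.
    """
    weights = [math.exp(-alpha * d) for d in divergences]
    total = sum(weights)
    return [w / total for w in weights]


def sample_clients(client_ids, probs, k, rng=None):
    """Sample up to k distinct clients, weighted by probs, without replacement."""
    rng = rng or random.Random()
    pool = list(zip(client_ids, probs))
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [w for _, w in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick)[0])
    return chosen
```

With divergences `[0.1, 0.1, 5.0]`, the third client receives a much smaller probability than the first two, which mirrors the stated goal of reducing the influence of low-value clients on training.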
Heterogeneous Federated Learning: State-of-the-art and Research Challenges
Federated learning (FL) has drawn increasing attention owing to its potential
use in large-scale industrial applications. Existing federated learning works
mainly focus on model homogeneous settings. However, practical federated
learning typically faces the heterogeneity of data distributions, model
architectures, network environments, and hardware devices among participant
clients. Heterogeneous Federated Learning (HFL) is much more challenging, and
corresponding solutions are diverse and complex. Therefore, a systematic survey
on this topic about the research challenges and state-of-the-art is essential.
In this survey, we first summarize the various research challenges in HFL
from five aspects: statistical heterogeneity, model heterogeneity,
communication heterogeneity, device heterogeneity, and additional challenges.
In addition, recent advances in HFL are reviewed and a new taxonomy of existing
HFL methods is proposed with an in-depth analysis of their pros and cons. We
classify existing methods from three different levels according to the HFL
procedure: data-level, model-level, and server-level. Finally, several critical
and promising future research directions in HFL are discussed, which may
facilitate further developments in this field. A periodically updated
collection on HFL is available at https://github.com/marswhu/HFL_Survey.
Comment: 42 pages, 11 figures, and 4 tables
A Snapshot of the Frontiers of Client Selection in Federated Learning
Federated learning (FL) has been proposed as a privacy-preserving approach in
distributed machine learning. A federated learning architecture consists of a
central server and a number of clients that have access to private, potentially
sensitive data. Clients are able to keep their data in their local machines and
only share their locally trained model's parameters with a central server that
manages the collaborative learning process. FL has delivered promising results
in real-life scenarios, such as healthcare, energy, and finance. However, when
the number of participating clients is large, the overhead of managing the
clients slows down the learning. Thus, client selection has been introduced as
a strategy to limit the number of communicating parties at every step of the
process. Since the early naïve random selection of clients, several client
selection methods have been proposed in the literature. Unfortunately, given
that this is an emergent field, there is a lack of a taxonomy of client
selection methods, making it hard to compare approaches. In this paper, we
propose a taxonomy of client selection in Federated Learning that enables us to
shed light on current progress in the field and identify potential areas of
future research in this promising area of machine learning.
Comment: 17 pages, 3 figures, 1 appendix, submitted to TML
Peer to Peer Information Retrieval: An Overview
Peer-to-peer technology is widely used for file sharing. In the past decade, a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.
Gradient Coreset for Federated Learning
Federated Learning (FL) is used to learn machine learning models with data
that is partitioned across multiple clients, including resource-constrained
edge devices. It is therefore important to devise solutions that are efficient
in terms of compute, communication, and energy consumption, while ensuring
compliance with the FL framework's privacy requirements. Conventional
approaches to these problems select a weighted subset of the training dataset,
known as a coreset, and learn by fitting models on it. Such coreset selection
approaches are also known to be robust to data noise. However, these approaches
rely on the overall statistics of the training data and are not easily
extendable to the FL setup.
In this paper, we propose an algorithm called Gradient based Coreset for
Robust and Efficient Federated Learning (GCFL) that selects a coreset at each
client, only every K communication rounds, and derives updates only from it,
assuming the availability of a small validation dataset at the server. We
demonstrate that our coreset selection technique is highly effective in
accounting for noise in clients' data. We conduct experiments using four
real-world datasets and show that GCFL (1) is more compute and energy efficient
than FL, (2) is robust to various kinds of noise in both the feature space and
labels, (3) preserves the privacy of the validation dataset, and (4) introduces
a small communication overhead but achieves significant gains in performance,
particularly in cases when the clients' data is noisy.
Comment: Accepted at WACV-2