3 research outputs found
Towards Federated Learning at Scale: System Design
Federated Learning is a distributed machine learning approach which enables
model training on a large corpus of decentralized data. We have built a
scalable production system for Federated Learning in the domain of mobile
devices, based on TensorFlow. In this paper, we describe the resulting
high-level design, sketch some of the challenges and their solutions, and touch
upon the open problems and future directions.
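
For context on the kind of training loop such a system orchestrates, below is a minimal, simulated Federated Averaging (FedAvg) round in plain Python. This is an illustrative sketch only, not the paper's TensorFlow-based production design; all names (local_sgd, fedavg_round, the toy gradient) are assumptions introduced here.

import numpy as np

def local_sgd(global_model, data, lr=0.1, epochs=1):
    """Simulate one client's local training starting from the global model."""
    model = global_model.copy()
    for _ in range(epochs):
        # Toy "gradient": pull the model toward the client's data mean.
        grad = model - data.mean(axis=0)
        model -= lr * grad
    return model

def fedavg_round(global_model, client_datasets):
    """One synchronous round: selected clients train locally, server averages."""
    updates = [local_sgd(global_model, data) for data in client_datasets]
    # Weight each client's model by its number of examples.
    weights = np.array([len(d) for d in client_datasets], dtype=float)
    weights /= weights.sum()
    return sum(w * m for w, m in zip(weights, updates))

# Example: 5 simulated clients, 10-dimensional model, 3 rounds.
rng = np.random.default_rng(0)
clients = [rng.normal(size=(20, 10)) for _ in range(5)]
model = np.zeros(10)
for _ in range(3):
    model = fedavg_round(model, clients)

The production system described in the paper wraps this basic round structure in device selection, secure communication, and failure handling at mobile-device scale.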
Papaya: Practical, Private, and Scalable Federated Learning
Cross-device Federated Learning (FL) is a distributed learning paradigm with
several challenges that differentiate it from traditional distributed learning;
variability in the system characteristics of each device and millions of
clients coordinating with a central server are primary among them. Most FL systems
described in the literature are synchronous - they perform a synchronized
aggregation of model updates from individual clients. Scaling synchronous FL is
challenging since increasing the number of clients training in parallel leads
to diminishing returns in training speed, analogous to large-batch training.
Moreover, stragglers hinder synchronous FL training. In this work, we outline a
production asynchronous FL system design. Our work tackles the aforementioned
issues, sketches some of the system design challenges and their solutions,
and touches upon principles that emerged from building a production FL system
for millions of clients. Empirically, we demonstrate that asynchronous FL
converges faster than synchronous FL when training across nearly one hundred
million devices. In particular, in high concurrency settings, asynchronous FL
is 5x faster and has nearly 8x less communication overhead than synchronous FL.
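
As a rough illustration of the asynchronous aggregation idea described above (not Papaya's actual protocol, whose details are in the paper), the server can apply each client delta as soon as it arrives, down-weighting stale updates. The polynomial staleness discount and the learning rate below are assumptions made for this sketch.

import numpy as np

def staleness_weight(staleness, a=0.5):
    """Polynomial staleness discount: older updates contribute less."""
    return (1.0 + staleness) ** (-a)

def apply_async_update(server_model, client_delta, client_round, server_round, server_lr=1.0):
    """Apply one client's delta immediately, scaled by how stale it is."""
    staleness = server_round - client_round
    return server_model + server_lr * staleness_weight(staleness) * client_delta

# Example: a delta computed against round 7 arrives while the server is at round 10.
model = np.zeros(4)
delta = np.ones(4)
model = apply_async_update(model, delta, client_round=7, server_round=10)

Because no round waits for every participant, stragglers delay nothing and concurrency can grow without the diminishing returns of synchronized aggregation.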
Federated Learning with Buffered Asynchronous Aggregation
Scalability and privacy are two critical concerns for cross-device federated
learning (FL) systems. In this work, we identify that synchronous FL, i.e., the
synchronized aggregation of client updates, cannot scale efficiently
beyond a few hundred clients training in parallel. It leads to diminishing
returns in model performance and training speed, analogous to large-batch
training. On the other hand, asynchronous aggregation of client updates in FL
(i.e., asynchronous FL) alleviates the scalability issue. However, aggregating
individual client updates is incompatible with Secure Aggregation, which could
result in an undesirable level of privacy for the system. To address these
concerns, we propose a novel buffered asynchronous aggregation method, FedBuff,
that is agnostic to the choice of optimizer, and combines the best properties
of synchronous and asynchronous FL. We empirically demonstrate that FedBuff is
3.3x more efficient than synchronous FL and up to 2.5x more efficient than
asynchronous FL, while being compatible with privacy-preserving technologies
such as Secure Aggregation and differential privacy. We provide theoretical
convergence guarantees in a smooth non-convex setting. Finally, we show that
under differentially private training, FedBuff can outperform FedAvgM at low
privacy settings and achieve the same utility for higher privacy settings.
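
A minimal sketch of the buffered asynchronous aggregation idea behind FedBuff, under assumed details: client deltas accumulate in a server-side buffer, and the model is updated only once the buffer holds K contributions, so individual updates are never applied on their own. The class name, buffer size, and plain averaging step are illustrative choices, not the paper's exact algorithm or hyperparameters.

import numpy as np

class BufferedAggregator:
    """Accumulates asynchronously arriving client deltas and applies them K at a time."""

    def __init__(self, model_dim, buffer_size=10, server_lr=1.0):
        self.model = np.zeros(model_dim)
        self._buffer_sum = np.zeros(model_dim)
        self._count = 0
        self.buffer_size = buffer_size
        self.server_lr = server_lr

    def receive(self, client_delta):
        # Only a running sum over the buffer is kept, so the server step
        # operates on an aggregate rather than on individual client updates.
        self._buffer_sum += client_delta
        self._count += 1
        if self._count >= self.buffer_size:
            # Server step: apply the averaged buffered update, then reset the buffer.
            self.model += self.server_lr * self._buffer_sum / self._count
            self._buffer_sum[:] = 0.0
            self._count = 0

# Example: 25 simulated client deltas trickle in; the model moves every 10th delta.
agg = BufferedAggregator(model_dim=4, buffer_size=10)
rng = np.random.default_rng(1)
for _ in range(25):
    agg.receive(rng.normal(size=4))

Updating only from the buffered aggregate is what lets this style of training remain compatible with aggregate-level protections such as Secure Aggregation while keeping the scalability benefits of asynchrony.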