Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
Deep learning at scale is dominated by communication time. Distributing
samples across nodes usually yields the best performance, but poses scaling
challenges due to global information dissemination and load imbalance across
uneven sample lengths. State-of-the-art decentralized optimizers mitigate the
problem, but require more iterations to achieve the same accuracy as their
globally-communicating counterparts. We present Wait-Avoiding Group Model
Averaging (WAGMA) SGD, a wait-avoiding stochastic optimizer that reduces global
communication via subgroup weight exchange. The key insight is a combination of
algorithmic changes to the averaging scheme and the use of a group allreduce
operation. We prove the convergence of WAGMA-SGD, and empirically show that it
retains convergence rates similar to Allreduce-SGD. For evaluation, we train
ResNet-50 on ImageNet; Transformer for machine translation; and deep
reinforcement learning for navigation at scale. Compared with state-of-the-art
decentralized SGD variants, WAGMA-SGD significantly improves training
throughput (e.g., 2.1x on 1,024 GPUs for reinforcement learning), and achieves
the fastest time-to-solution (e.g., the highest score using the shortest
training time for Transformer).
Comment: Published in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), vol. 32, no. 7, pp. 1725-1739, 1 July 2021
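The core mechanism, periodic weight averaging restricted to a subgroup of workers via a group allreduce, can be sketched roughly as follows. This is a minimal illustration assuming PyTorch's torch.distributed with fixed, disjoint groups; the paper's wait-avoiding (non-blocking) group allreduce and its group-membership schedule are not reproduced here.

```python
# Minimal sketch of subgroup model averaging after a local optimizer step.
# Assumes torch.distributed is already initialized (e.g., NCCL backend) and that
# every rank executes this code. Fixed, disjoint groups are an illustrative
# simplification; WAGMA-SGD's non-blocking (wait-avoiding) exchange is not shown.
import torch
import torch.distributed as dist

def build_groups(world_size: int, group_size: int):
    """Partition ranks into disjoint groups; every rank must create all groups."""
    return [
        dist.new_group(ranks=list(range(start, min(start + group_size, world_size))))
        for start in range(0, world_size, group_size)
    ]

def group_average(model: torch.nn.Module, group, group_size: int):
    """Average model parameters across the ranks of one subgroup only."""
    for p in model.parameters():
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM, group=group)
        p.data.div_(group_size)

# Inside the training loop (illustrative):
#   loss.backward(); optimizer.step()
#   my_group = groups[dist.get_rank() // group_size]
#   group_average(model, my_group, group_size)  # weights leave only the subgroup
```

Restricting each averaging step to a subgroup replaces the global allreduce with a much smaller collective, which is where the reduction in global communication comes from.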
LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning
Gradient-based distributed learning in Parameter Server (PS) computing
architectures is subject to random delays due to straggling worker nodes, as
well as to possible communication bottlenecks between PS and workers. Solutions
have recently been proposed to address these impairments separately, based on
the ideas of gradient coding, worker grouping, and adaptive worker selection.
This paper provides a unified analysis of these techniques in terms of
wall-clock time, communication, and computation complexity measures.
Furthermore, in order to combine the benefits of gradient coding and grouping
in terms of robustness to stragglers with the communication and computation
load gains of adaptive selection, novel strategies, named Lazily Aggregated
Gradient Coding (LAGC) and Grouped-LAG (G-LAG), are introduced. Analysis and
results show that G-LAG provides the best wall-clock time and communication
performance, while maintaining a low computational cost, for two representative
distributions of the computing times of the worker nodes.
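The lazy-aggregation rule underlying these strategies can be illustrated with a toy single-process simulation: a worker re-sends its gradient only when it has changed sufficiently since the last transmission, and the server otherwise reuses the stale copy. The least-squares model, the fixed threshold, and all constants below are illustrative assumptions; the gradient-coding and grouping components of LAGC and G-LAG are not shown.

```python
# Toy simulation of lazily aggregated gradients: skip communication when a
# worker's gradient has barely changed, letting the server reuse its stale copy.
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim, lr, thresh = 4, 10, 0.1, 1e-3

w = np.zeros(dim)                                             # model at the parameter server
last_sent = [np.zeros(dim) for _ in range(num_workers)]       # stale gradients cached server-side
data = [(rng.normal(size=(50, dim)), rng.normal(size=50)) for _ in range(num_workers)]

def local_grad(k, w):
    """Least-squares gradient on worker k's data shard (illustrative model)."""
    X, y = data[k]
    return X.T @ (X @ w - y) / len(y)

for step in range(100):
    agg = np.zeros(dim)
    for k in range(num_workers):
        g = local_grad(k, w)
        if np.linalg.norm(g - last_sent[k]) ** 2 > thresh:    # simplified lazy-communication test
            last_sent[k] = g                                  # "upload" a fresh gradient
        agg += last_sent[k]                                   # server aggregates freshest copies it holds
    w -= lr * agg / num_workers
```

In a real PS deployment the threshold test runs at the workers, so a skipped round saves uplink communication; LAGC and G-LAG combine this rule with coding or grouping for straggler tolerance.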
PPFL: A Personalized Federated Learning Framework for Heterogeneous Population
Personalization aims to characterize individual preferences and is widely
applied across many fields. However, conventional personalization methods operate
in a centralized manner and risk exposing raw data when pooling
individual information. In this paper, with privacy considerations, we develop
a flexible and interpretable personalized framework within the paradigm of
Federated Learning, called PPFL (Population Personalized Federated Learning).
PPFL leverages canonical models to capture the fundamental characteristics of a
heterogeneous population and membership vectors to express each client's
preferences over those characteristics. Heterogeneity is thus modeled as
clients' varying preferences for a shared set of characteristics, yielding
interpretable insight into individual clients that existing Personalized
Federated Learning (PFL) methods lack. Furthermore, we explore the relationship between our method and
three main branches of PFL methods: multi-task PFL, clustered FL, and
decoupling PFL, and demonstrate the advantages of PPFL. To solve PPFL (a
non-convex constrained optimization problem), we propose a novel random block
coordinate descent algorithm and establish its convergence properties. We conduct
experiments on both pathological and practical datasets, and the results
validate the effectiveness of PPFL.
Comment: 38 pages, 11 figures
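The modeling idea, each client as a preference-weighted combination of a few shared canonical models, can be sketched in a toy centralized form. Everything below (linear canonical models, squared loss, deterministic alternating block updates with a simplex projection, and all constants) is an illustrative assumption, not the paper's random block coordinate descent algorithm or its federated protocol.

```python
# Toy sketch: client k predicts with the mixture W @ pi_k, where the columns of W
# are shared canonical models and pi_k is client k's membership (preference) vector
# on the probability simplex. Updates alternate between the two variable blocks.
import numpy as np

rng = np.random.default_rng(1)
num_clients, dim, num_canonical, lr = 6, 8, 3, 0.05

W = rng.normal(size=(dim, num_canonical))                        # shared canonical models (columns)
Pi = np.full((num_clients, num_canonical), 1.0 / num_canonical)  # clients' membership vectors
data = [(rng.normal(size=(40, dim)), rng.normal(size=40)) for _ in range(num_clients)]

def project_simplex(v):
    """Euclidean projection onto the probability simplex {x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

for it in range(200):
    for k in range(num_clients):
        X, y = data[k]
        resid = X @ (W @ Pi[k]) - y
        # Block 1: client k's membership vector, kept on the simplex (interpretable preferences).
        grad_pi = (X @ W).T @ resid / len(y)
        Pi[k] = project_simplex(Pi[k] - lr * grad_pi)
        # Block 2: shared canonical models, updated with this client's gradient.
        W -= lr * np.outer(X.T @ resid / len(y), Pi[k])
    # In the federated setting, the W-update would be computed locally and aggregated by the server.
```

The membership vector Pi[k] is what carries the interpretability: its entries indicate how strongly client k leans toward each canonical behavior.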