LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning
Gradient-based distributed learning in Parameter Server (PS) computing
architectures is subject to random delays due to straggling worker nodes, as
well as to possible communication bottlenecks between PS and workers. Solutions
have been recently proposed to separately address these impairments based on
the ideas of gradient coding, worker grouping, and adaptive worker selection.
This paper provides a unified analysis of these techniques in terms of
wall-clock time, communication, and computation complexity measures.
Furthermore, in order to combine the benefits of gradient coding and grouping
in terms of robustness to stragglers with the communication and computation
load gains of adaptive selection, novel strategies, named Lazily Aggregated
Gradient Coding (LAGC) and Grouped-LAG (G-LAG), are introduced. Analysis and
results show that G-LAG provides the best wall-clock time and communication
performance, while maintaining a low computational cost, for two representative
distributions of the computing times of the worker nodes.
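To make the combination concrete, here is a minimal Python sketch of the two ingredients behind G-LAG (function names and the threshold rule are illustrative assumptions, not the paper's implementation): fractional-repetition grouping for straggler tolerance, plus a LAG-style rule that lets a worker skip an upload when its gradient has barely changed.

```python
import numpy as np

def make_groups(n_workers, s):
    """Fractional repetition: groups of s+1 workers share the same data
    partition, so each group tolerates up to s stragglers."""
    assert n_workers % (s + 1) == 0
    return [w // (s + 1) for w in range(n_workers)]

def should_upload(g_new, g_last_sent, thresh):
    """LAG-style laziness: transmit only if the fresh gradient differs
    enough from the copy the server already holds."""
    return float(np.sum((g_new - g_last_sent) ** 2)) > thresh

def aggregate(groups, received):
    """received: {worker_id: partial gradient} from the responders.
    Under fractional repetition, one response per group suffices."""
    per_group = {}
    for w, g in received.items():
        per_group.setdefault(groups[w], g)  # first responder per group
    assert len(per_group) == max(groups) + 1, "an entire group straggled"
    return sum(per_group.values())
```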
Gradient Coding with Dynamic Clustering for Straggler Mitigation
In distributed synchronous gradient descent (GD), the per-iteration
completion time is bottlenecked by the slowest \textit{straggling}
workers. To speed up GD iterations in the presence of
stragglers, coded distributed computation techniques are implemented by
assigning redundant computations to workers. In this paper, we propose a novel
gradient coding (GC) scheme that utilizes dynamic clustering, denoted by GC-DC,
to speed up the gradient calculation. Under time-correlated straggling
behavior, GC-DC aims at regulating the number of straggling workers in each
cluster based on the straggler behavior in the previous iteration. We
numerically show that GC-DC provides significant improvements in the average
completion time (of each iteration) with no increase in the communication load
compared to the original GC scheme.
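As a hedged sketch of the dynamic-clustering idea (the names and the round-robin heuristic are illustrative assumptions, not the paper's exact algorithm), one could reassign workers each iteration so that the previous iteration's stragglers are spread evenly across clusters:

```python
def recluster(worker_ids, straggled_last_iter, n_clusters):
    """Spread last iteration's stragglers evenly over clusters so no
    cluster is dominated by slow workers (illustrative heuristic)."""
    stragglers = [w for w in worker_ids if straggled_last_iter[w]]
    fast = [w for w in worker_ids if not straggled_last_iter[w]]
    clusters = [[] for _ in range(n_clusters)]
    # Deal out stragglers first, then fast workers, round-robin.
    for i, w in enumerate(stragglers + fast):
        clusters[i % n_clusters].append(w)
    return clusters

# Example: workers 0 and 1 straggled in the previous iteration.
print(recluster(list(range(6)),
                {0: True, 1: True, 2: False, 3: False, 4: False, 5: False},
                n_clusters=2))
# -> [[0, 2, 4], [1, 3, 5]]
```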
Iterative Sketching for Secure Coded Regression
In this work, we propose methods for speeding up linear regression
distributively while ensuring security. We leverage randomized sketching
techniques and improve straggler resilience in asynchronous systems.
Specifically, we apply a random orthonormal matrix and then subsample
\textit{blocks}, to simultaneously secure the information and reduce the
dimension of the regression problem. In our setup, the transformation
corresponds to an encoded encryption in an \textit{approximate gradient coding
scheme}, and the subsampling corresponds to the responses of the non-straggling
workers in a centralized coded computing network. This results in a
distributive \textit{iterative sketching} approach for an
$\ell_2$-subspace embedding, \textit{i.e.} a new sketch is considered at
each iteration. We also
focus on the special case of the \textit{Subsampled Randomized Hadamard
Transform}, which we generalize to block sampling, and discuss how it can
be modified in order to secure the data.
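A minimal numpy sketch of a block-sampled SRHT (assuming the row count is a power of two; the function name, the dense scipy Hadamard matrix, and uniform block sampling are simplifying assumptions, not the paper's construction):

```python
import numpy as np
from scipy.linalg import hadamard

def srht_block_sketch(A, b, block_size, keep_blocks, rng):
    """One iteration's sketch: random signs D, orthonormal Hadamard H,
    then keep a uniform subset of row blocks (the 'responding workers')."""
    n = A.shape[0]                       # must be a power of two
    D = rng.choice([-1.0, 1.0], size=n)  # random sign flips
    H = hadamard(n) / np.sqrt(n)         # orthonormal Hadamard matrix
    HA, Hb = H @ (D[:, None] * A), H @ (D * b)
    n_blocks = n // block_size
    sel = rng.choice(n_blocks, size=keep_blocks, replace=False)
    idx = np.concatenate([np.arange(k * block_size, (k + 1) * block_size)
                          for k in sel])
    scale = np.sqrt(n_blocks / keep_blocks)  # rescaling for an unbiased sketch
    return scale * HA[idx], scale * Hb[idx]

# Example: a fresh sketch (new signs, new block subset) per iteration.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((64, 4)), rng.standard_normal(64)
SA, Sb = srht_block_sketch(A, b, block_size=8, keep_blocks=4, rng=rng)
print(SA.shape)  # (32, 4)
```

Drawing a new sketch at every iteration and stepping on the sketched least-squares problem is what makes the approach \textit{iterative} sketching, as opposed to sketching once up front.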