Latency Analysis of Coded Computation Schemes over Wireless Networks
Large-scale distributed computing systems face two major bottlenecks that
limit their scalability: straggler delay caused by the variability of
computation times at different worker nodes and communication bottlenecks
caused by shuffling data across many nodes in the network. Recently, it has
been shown that codes can provide significant gains in overcoming these
bottlenecks. In particular, optimal coding schemes for minimizing latency in
distributed computation of linear functions and mitigating the effect of
stragglers were proposed for a wired network, where the workers can
simultaneously transmit messages to a master node without interference. In this
paper, we focus on the problem of coded computation over a wireless
master-worker setup with straggling workers, where only one worker can transmit
the result of its local computation back to the master at a time. We consider three
asymptotic regimes (determined by how the communication and computation times
are scaled with the number of workers) and precisely characterize the total
run-time of the distributed algorithm and optimum coding strategy in each
regime. In particular, for the regime of practical interest where the
computation and communication times of the distributed computing algorithm are
comparable, we show that the total run-time approaches a simple lower bound
that decouples computation and communication, and demonstrate that coded
schemes are times faster than uncoded schemes.
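
To make the wireless constraint concrete, the following is a minimal simulation sketch, not the paper's analysis. It assumes shifted-exponential computation times, a per-worker load of 1/k of the job, and a transmission time proportional to that load; the function names and all parameter values (`shift`, `rate`, `tx_per_unit`) are hypothetical choices for illustration. The only feature taken from the abstract is that the master needs results from some subset of workers and that only one worker can transmit at a time.

```python
import random


def total_run_time(finish_times, k, tx_time):
    """Time until the master has received k results over a shared channel.

    Workers transmit in the order they finish; only one transmission is
    active at a time, and each one occupies the channel for tx_time.
    """
    channel_free_at = 0.0
    for finish in sorted(finish_times)[:k]:
        # A worker must have finished AND the channel must be idle.
        start = max(finish, channel_free_at)
        channel_free_at = start + tx_time
    return channel_free_at


def simulate(n, k, shift=1.0, rate=1.0, tx_per_unit=0.05, trials=2000, seed=0):
    """Average run-time when each of n workers handles a 1/k share of the job.

    k == n is the uncoded split; k < n adds redundancy so that the master
    can ignore the n - k slowest workers. All parameters are assumptions.
    """
    rng = random.Random(seed)
    load = 1.0 / k  # fraction of the whole job assigned to each worker
    total = 0.0
    for _ in range(trials):
        # Shifted-exponential compute time, scaled by the per-worker load.
        finish = [load * (shift + rng.expovariate(rate)) for _ in range(n)]
        total += total_run_time(finish, k, tx_time=tx_per_unit * load)
    return total / trials


if __name__ == "__main__":
    n = 20
    print("uncoded (k = n):  ", simulate(n, n))
    print("coded   (k = n/2):", simulate(n, n // 2))
```

Varying `k` and `tx_per_unit` in this sketch shows how the balance between waiting for stragglers and serializing transmissions shifts; it does not reproduce the asymptotic regimes characterized in the paper.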
Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy
We consider a scenario involving computations over a massive dataset stored
in a distributed manner across multiple workers, which is at the core of distributed
learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework
to simultaneously provide (1) resiliency against stragglers that may prolong
computations; (2) security against Byzantine (or malicious) workers that
deliberately modify the computation for their benefit; and (3)
(information-theoretic) privacy of the dataset amidst possible collusion of
workers. LCC, which leverages the well-known Lagrange polynomial to create
computation redundancy in a novel coded form across workers, can be applied to
any computation scenario in which the function of interest is an arbitrary
multivariate polynomial of the input dataset, hence covering many computations
of interest in machine learning. LCC significantly generalizes prior works to
go beyond linear computations. It also enables secure and private computing in
distributed settings, improving the computation and communication efficiency of
the state-of-the-art. Furthermore, we prove the optimality of LCC by showing
that it achieves the optimal tradeoff between resiliency, security, and
privacy, i.e., in terms of tolerating the maximum number of stragglers and
adversaries, and providing data privacy against the maximum number of colluding
workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the
conventional uncoded implementation of distributed least-squares linear
regression by up to , and also achieves a
- speedup over the state-of-the-art straggler
mitigation strategies.
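
The encoding idea described above can be sketched as follows. This is a simplified illustration that covers straggler resiliency only (no Byzantine workers, no privacy padding), with scalar data over a prime field. The modulus `P`, the example polynomial `f`, the choice of evaluation points, and the function names are assumptions made for the example, not details from the paper. The data are interpolated by a Lagrange polynomial u(z), each worker evaluates f on one point of u, and the master interpolates f(u(z)), which has degree at most (K-1)·deg f, from any (K-1)·deg f + 1 results.

```python
P = 2_147_483_647  # prime modulus 2**31 - 1; the field choice is an assumption


def lagrange_eval(xs, ys, z, p=P):
    """Evaluate at z the unique polynomial through (xs[i], ys[i]) over GF(p)."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for l, xl in enumerate(xs):
            if l != j:
                num = num * (z - xl) % p
                den = den * (xj - xl) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        total = (total + yj * num * pow(den, -1, p)) % p
    return total


DEG_F = 2


def f(x, p=P):
    """Example degree-2 polynomial that every worker applies to its share."""
    return (x * x + 3 * x + 1) % p


def lcc_encode(data, n_workers, p=P):
    """Return interpolation points, worker points, and one coded share per worker."""
    k = len(data)
    betas = list(range(1, k + 1))                    # u(betas[j]) == data[j]
    alphas = list(range(k + 1, k + 1 + n_workers))   # worker i receives u(alphas[i])
    shares = [lagrange_eval(betas, data, a, p) for a in alphas]
    return betas, alphas, shares


def lcc_decode(betas, alphas, results, p=P):
    """results maps worker index -> f(share); stragglers are simply absent."""
    threshold = (len(betas) - 1) * DEG_F + 1
    if len(results) < threshold:
        raise ValueError("too few worker results to interpolate")
    idx = sorted(results)[:threshold]
    xs = [alphas[i] for i in idx]
    ys = [results[i] for i in idx]
    # Evaluating the interpolated f(u(z)) at betas[j] yields f(data[j]).
    return [lagrange_eval(xs, ys, b, p) for b in betas]


if __name__ == "__main__":
    data = [5, 11, 42]
    betas, alphas, shares = lcc_encode(data, n_workers=8)
    # Workers 0, 3 and 6 straggle and never report back.
    results = {i: f(s) for i, s in enumerate(shares) if i not in {0, 3, 6}}
    print(lcc_decode(betas, alphas, results) == [f(x) for x in data])
```

With K = 3 data points and a degree-2 function, five of the eight worker results suffice in this sketch, so up to three stragglers can be ignored.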
- …