Latency Analysis of Coded Computation Schemes over Wireless Networks

Authors: Pedarsani, Ramtin; Reisizadeh, Amirhossein
Publication date: 30/06/2017

Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay, caused by the variability of computation times at different worker nodes, and communication bottlenecks, caused by shuffling data across many nodes in the network. Recently, it has been shown that codes can provide significant gains in overcoming these bottlenecks. In particular, optimal coding schemes for minimizing latency in distributed computation of linear functions and mitigating the effect of stragglers were proposed for a wired network, where the workers can simultaneously transmit messages to a master node without interference. In this paper, we focus on the problem of coded computation over a wireless master-worker setup with straggling workers, where only one worker can transmit the result of its local computation back to the master at a time. We consider three asymptotic regimes (determined by how the communication and computation times scale with the number of workers) and precisely characterize the total run-time of the distributed algorithm and the optimum coding strategy in each regime. In particular, for the regime of practical interest where the computation and communication times of the distributed computing algorithm are comparable, we show that the total run-time approaches a simple lower bound that decouples computation and communication, and demonstrate that coded schemes are $\Theta(\log(n))$ times faster than uncoded schemes.
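To make the idea of coded computation of a linear function concrete, the following is a minimal sketch, not the scheme analyzed in the paper. It assumes three hypothetical workers and a simple parity code: the matrix is split into two row blocks, a third worker gets their sum, and the master can recover the full product from any two results, so one straggler can be ignored. The function names (`encode`, `decode`, `matvec`) and the example data are illustrative assumptions, and the wireless one-transmitter-at-a-time constraint is not modeled.

```python
from itertools import combinations


def matvec(rows, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A (even number of rows) into blocks A1, A2 and a parity block A1 + A2."""
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A @ x from any two worker results, keyed by worker index 0, 1 or 2."""
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # A2 x = parity - A1 x
    else:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]  # A1 x = parity - A2 x
    return y1 + y2


# Illustrative data (assumed, not from the paper).
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]

blocks = encode(A)
worker_outputs = {i: matvec(block, x) for i, block in enumerate(blocks)}
expected = matvec(A, x)

# Every pair of workers is enough; the third may straggle.
recovered = [
    decode({i: worker_outputs[i] for i in pair})
    for pair in combinations(range(3), 2)
]
```

In this toy setting each worker handles half of the rows instead of a third, which is the price paid for not having to wait for the slowest worker.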

arXiv.org e-Print Archive

Crossref

Communication-Computation Efficient Gradient Coding

Authors: Abbe, Emmanuel; Ye, Min
Publication date: 01/01/2018

This paper develops coding techniques to reduce the running time of distributed learning tasks. It characterizes the fundamental tradeoff in computing gradients (and, more generally, vector summations) in terms of three parameters: computation load, straggler tolerance, and communication cost. It further gives an explicit coding scheme that achieves the optimal tradeoff based on recursive polynomial constructions, coding both across data subsets and vector components. As a result, the proposed scheme makes it possible to minimize the running time for gradient computations. The scheme is implemented on Amazon EC2 clusters using Python with the mpi4py package. Results show that the proposed scheme maintains the same generalization error while reducing the running time by 32% compared to uncoded schemes and 23% compared to prior coded schemes focusing only on stragglers (Tandon et al., ICML 2017).
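As an illustration of coding across data subsets, here is a minimal sketch of a straggler-only gradient code in the style of the prior work the abstract compares against (three workers, one tolerated straggler). It does not implement the recursive polynomial construction or the coding across vector components described above. The coefficient tables `ENCODE` and `DECODE`, the function names, and the example gradients are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Row w holds the coefficients worker w applies to partial gradients g0, g1, g2.
# Each worker touches two of the three data subsets (computation load 2).
ENCODE = [
    [Fraction(1, 2), 1, 0],   # worker 0 sends g0/2 + g1
    [0, 1, -1],               # worker 1 sends g1 - g2
    [Fraction(1, 2), 0, 1],   # worker 2 sends g0/2 + g2
]

# For each surviving pair of workers, weights that combine their messages
# into g0 + g1 + g2.
DECODE = {
    (0, 1): (2, -1),
    (0, 2): (1, 1),
    (1, 2): (1, 2),
}


def worker_message(w, grads):
    """Linear combination of partial gradients sent by worker w."""
    dim = len(grads[0])
    return [sum(ENCODE[w][j] * grads[j][d] for j in range(3)) for d in range(dim)]


def recover_sum(messages):
    """Recover the full gradient sum from the messages of any two workers."""
    pair = tuple(sorted(messages))
    a, b = DECODE[pair]
    m0, m1 = messages[pair[0]], messages[pair[1]]
    return [a * u + b * v for u, v in zip(m0, m1)]


# Illustrative partial gradients, one per data subset (assumed values).
grads = [[1, 4], [2, -3], [5, 0]]
full_sum = [sum(col) for col in zip(*grads)]

messages = {w: worker_message(w, grads) for w in range(3)}
results = [
    recover_sum({w: messages[w] for w in pair})
    for pair in combinations(range(3), 2)
]
```

Each worker here still transmits a full-length vector; the tradeoff characterized in the abstract additionally accounts for reducing that communication cost.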

arXiv.org e-Print Archive

Princeton University Open Access Repository