178 research outputs found
Universally Decodable Matrices for Distributed Matrix-Vector Multiplication
Coded computation is an emerging research area that leverages concepts from
erasure coding to mitigate the effect of stragglers (slow nodes) in distributed
computation clusters, especially for matrix computation problems. In this work,
we present a class of distributed matrix-vector multiplication schemes that are
based on codes in the Rosenbloom-Tsfasman metric and universally decodable
matrices. Our schemes take into account the inherent computation order within a
worker node. In particular, they allow us to effectively leverage partial
computations performed by stragglers (a feature that many prior works lack). An
additional main contribution of our work is a companion matrix-based embedding
of these codes that allows us to obtain sparse and numerically stable schemes
for the problem at hand. Experimental results confirm the effectiveness of our
techniques.
Comment: 6 pages, 1 figure
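To make the straggler-mitigation idea concrete, here is a minimal numpy sketch of a classic MDS-coded matrix-vector multiply (a generic illustration, not the paper's Rosenbloom-Tsfasman/universally-decodable-matrix scheme): the master encodes row blocks of A with a Vandermonde matrix, each worker multiplies one coded block by x, and the product A @ x is recovered from any k of the n worker results.

```python
# Illustrative (n, k) MDS-coded matrix-vector multiplication.
# Straggler tolerance: any k of the n coded results suffice to decode.
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3                      # 5 workers, any 3 results suffice
A = rng.standard_normal((6, 4))  # 6 rows of A, split into k = 3 blocks
x = rng.standard_normal(4)

blocks = np.split(A, k)                   # k equal row blocks of A
evals = np.arange(1, n + 1)               # distinct evaluation points
G = np.vander(evals, k, increasing=True)  # n x k Vandermonde encoder

# Worker i computes (sum_j G[i, j] * blocks[j]) @ x on its coded block.
coded_results = [sum(G[i, j] * blocks[j] for j in range(k)) @ x
                 for i in range(n)]

# Suppose workers 1 and 3 straggle: decode from the remaining k results.
alive = [0, 2, 4]
Y = np.stack([coded_results[i] for i in alive])
decoded = np.linalg.solve(G[alive], Y)    # recover the k block products
assert np.allclose(decoded.reshape(-1), A @ x)
```

This toy version discards stragglers entirely; the scheme in the abstract goes further by also exploiting the *partial* computations stragglers complete, which the Vandermonde construction above does not capture.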
Hierarchical Coding for Distributed Computing
Coding for distributed computing supports low-latency computation by
relieving the burden of straggling workers. While most existing works assume a
simple master-worker model, we consider a hierarchical computational structure
consisting of groups of workers, motivated by the need to reflect the
architectures of real-world distributed computing systems. In this work, we
propose a hierarchical coding scheme for this model, as well as analyze its
decoding cost and expected computation time. Specifically, we first provide
upper and lower bounds on the expected computing time of the proposed scheme.
We also show that our scheme enables efficient parallel decoding, thus reducing
decoding costs by orders of magnitude over non-hierarchical schemes. When
considering both decoding cost and computing time, the proposed hierarchical
coding is shown to outperform existing schemes in many practical scenarios.
Comment: 7 pages, part of the paper is submitted to ISIT201
Latency Analysis of Coded Computation Schemes over Wireless Networks
Large-scale distributed computing systems face two major bottlenecks that
limit their scalability: straggler delay caused by the variability of
computation times at different worker nodes and communication bottlenecks
caused by shuffling data across many nodes in the network. Recently, it has
been shown that codes can provide significant gains in overcoming these
bottlenecks. In particular, optimal coding schemes for minimizing latency in
distributed computation of linear functions and mitigating the effect of
stragglers were proposed for a wired network, where the workers can
simultaneously transmit messages to a master node without interference. In this
paper, we focus on the problem of coded computation over a wireless
master-worker setup with straggling workers, where only one worker can transmit
the result of its local computation back to the master at a time. We consider 3
asymptotic regimes (determined by how the communication and computation times
are scaled with the number of workers) and precisely characterize the total
run-time of the distributed algorithm and optimum coding strategy in each
regime. In particular, for the regime of practical interest where the
computation and communication times of the distributed computing algorithm are
comparable, we show that the total run-time approaches a simple lower bound
that decouples computation and communication, and demonstrate that coded
schemes can be significantly faster than uncoded schemes.
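The wireless serialization effect can be seen in a toy Monte Carlo comparison (a simplified model with shifted-exponential compute times and fixed per-result transmission time, not the paper's exact analysis): uncoded, the master must collect all n partial results; with an (n, k) MDS code, only the fastest k workers matter, at the price of each coded task being n/k times larger.

```python
# Toy model: n workers, one-at-a-time transmissions of c seconds each.
import numpy as np

rng = np.random.default_rng(1)
n, k, c, trials = 20, 12, 0.02, 10_000

# Compute time per worker: deterministic shift + exponential straggling.
# Coded tasks are n/k times larger, scaling both shift and tail.
uncoded = rng.exponential(1.0, (trials, n)) + 1.0
coded = (rng.exponential(1.0, (trials, n)) + 1.0) * (n / k)

t_uncoded = uncoded.max(axis=1) + n * c             # wait for all n
t_coded = np.sort(coded, axis=1)[:, k - 1] + k * c  # wait for fastest k

print(t_uncoded.mean(), t_coded.mean())
```

In this toy model the coded scheme wins because the max of n exponential delays grows like log n, while the k-th order statistic stays bounded; the paper's regimes additionally account for how c and the compute distribution scale with n.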
OverSketch: Approximate Matrix Multiplication for the Cloud
We propose OverSketch, an approximate algorithm for distributed matrix
multiplication in serverless computing. OverSketch leverages ideas from matrix
sketching and high-performance computing to enable cost-efficient
multiplication that is resilient to faults and straggling nodes pervasive in
low-cost serverless architectures. We establish statistical guarantees on the
accuracy of OverSketch and empirically validate our results by solving a
large-scale linear program using interior-point methods and demonstrate a 34%
reduction in compute time on AWS Lambda.
Comment: Published in Proc. IEEE Big Data 2018. Updated version provides details of distributed sketching and highlights other advantages of OverSketch.
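A minimal numpy sketch of the underlying idea (generic random sign sketching in the spirit of OverSketch, not its exact blocked algorithm or error bounds): for a d x s sketch S with iid +-1/sqrt(s) entries, (A @ S)(S.T @ B) is an unbiased estimator of A @ B, and splitting the sketch into column blocks lets each worker contribute independently, so a straggling block can simply be dropped and the rest rescaled.

```python
# Sketched approximate matrix multiply with a droppable straggler block.
import numpy as np

rng = np.random.default_rng(2)
n, d, p = 40, 50, 30
s, b = 2500, 5                      # sketch size, split into b blocks

A = rng.standard_normal((n, d))
B = rng.standard_normal((d, p))

S = rng.choice([-1.0, 1.0], size=(d, s)) / np.sqrt(s)
sketch_blocks = np.split(S, b, axis=1)

# Each "worker" computes one block's contribution (A S_i)(S_i^T B).
contribs = [(A @ Si) @ (Si.T @ B) for Si in sketch_blocks]

# One worker straggles: drop its block and rescale to stay unbiased.
approx = sum(contribs[:-1]) * b / (b - 1)

err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
print(err)
```

The sketch size here is chosen generously so the error stays small on dense random inputs; the savings in OverSketch come from the blocked, serverless execution and from "over-sketching" with just enough redundant blocks to absorb faults and stragglers.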