Hierarchical Coding for Distributed Computing
Coding for distributed computing supports low-latency computation by
relieving the burden of straggling workers. While most existing works assume a
simple master-worker model, we consider a hierarchical computational structure
consisting of groups of workers, motivated by the need to reflect the
architectures of real-world distributed computing systems. In this work, we
propose a hierarchical coding scheme for this model and analyze its
decoding cost and expected computation time. Specifically, we first provide
upper and lower bounds on the expected computing time of the proposed scheme.
We also show that our scheme enables efficient parallel decoding, thus reducing
decoding costs by orders of magnitude over non-hierarchical schemes. When
considering both decoding cost and computing time, the proposed hierarchical
coding is shown to outperform existing schemes in many practical scenarios.
Comment: 7 pages, part of the paper is submitted to ISIT201
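As a rough illustration of the computing-time quantity discussed in this abstract, the sketch below evaluates the completion time of a two-level scheme from given worker finishing times. It assumes (this is not stated in the abstract) that each group finishes once any k2 of its workers are done, and that the master finishes once any k1 groups have reported; the function names, the optional `group_delays` argument, and the sample numbers are hypothetical.

```python
def kth_smallest(values, k):
    """Return the k-th smallest element (1-indexed) of a sequence."""
    return sorted(values)[k - 1]


def hierarchical_completion_time(worker_times, k1, k2, group_delays=None):
    """Completion time of a two-level coded job (illustrative model only).

    worker_times: one list of worker finishing times per group.
    k2: results needed inside each group (assumed inner code dimension).
    k1: groups needed at the master (assumed outer code dimension).
    group_delays: optional extra group-to-master delay per group.
    """
    if group_delays is None:
        group_delays = [0.0] * len(worker_times)
    # A group is done when its k2 fastest workers have finished.
    group_times = [
        kth_smallest(times, k2) + delay
        for times, delay in zip(worker_times, group_delays)
    ]
    # The master is done when the k1 fastest groups have reported.
    return kth_smallest(group_times, k1)


# Three groups of three workers; made-up finishing times in seconds.
times = [[1.0, 5.0, 2.0], [4.0, 4.5, 9.0], [0.5, 0.7, 8.0]]
coded = hierarchical_completion_time(times, k1=2, k2=2)
uncoded = max(max(group) for group in times)  # wait for every worker
```

With these sample numbers the coded job no longer waits for the slowest worker in each group or for the slowest group, which is the straggler relief the abstract refers to.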
Block-Diagonal and LT Codes for Distributed Computing With Straggling Servers
We propose two coded schemes for the distributed computing problem of
multiplying a matrix by a set of vectors. The first scheme is based on
partitioning the matrix into submatrices and applying maximum distance
separable (MDS) codes to each submatrix. For this scheme, we prove that up to a
given number of partitions the communication load and the computational delay
(not including the encoding and decoding delay) are identical to those of the
scheme recently proposed by Li et al., based on a single, long MDS code.
However, due to the use of shorter MDS codes, our scheme yields a significantly
lower overall computational delay when the delay incurred by encoding and
decoding is also considered. We further propose a second coded scheme based on
Luby Transform (LT) codes under inactivation decoding. Interestingly, LT codes
may reduce the delay over the partitioned scheme at the expense of an increased
communication load. We also consider distributed computing under a deadline and
show numerically that the proposed schemes outperform other schemes in the
literature, with the LT code-based scheme yielding the best performance for the
scenarios considered.
Comment: To appear in IEEE Transactions on Communication
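The MDS idea underlying this abstract can be sketched as follows: split the matrix into k row blocks, encode them into n coded blocks, and recover the product from any k worker results. This is a minimal sketch, not the authors' construction; it assumes a Vandermonde generator over the rationals (exact arithmetic via `Fraction`), and all function names are hypothetical. The partitioned scheme described above would apply such a code separately to each submatrix.

```python
from fractions import Fraction


def encode_blocks(blocks, n):
    """Encode k row blocks into n coded blocks with a Vandermonde generator."""
    k = len(blocks)
    rows, cols = len(blocks[0]), len(blocks[0][0])
    coded = []
    for i in range(n):
        alpha = Fraction(i + 1)  # distinct points: any k generator rows are invertible
        coeffs = [alpha ** j for j in range(k)]
        block = [
            [sum(c * b[r][col] for c, b in zip(coeffs, blocks)) for col in range(cols)]
            for r in range(rows)
        ]
        coded.append((coeffs, block))
    return coded


def matvec(matrix, x):
    """Worker task: multiply one coded block by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def solve(G, Y):
    """Solve G Z = Y by Gauss-Jordan elimination (G is k x k, Y is k x m)."""
    k = len(G)
    aug = [list(G[i]) + list(Y[i]) for i in range(k)]
    for c in range(k):
        p = next(r for r in range(c, k) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def decode(results, k):
    """Recover A x from the (coeffs, result) pairs of any k finished workers."""
    G = [coeffs for coeffs, _ in results[:k]]
    Y = [y for _, y in results[:k]]
    Z = solve(G, Y)  # Z[j] equals blocks[j] times x
    return [v for z in Z for v in z]


# Example: 4 x 2 matrix, k = 2 blocks, n = 4 workers; workers 0 and 2 straggle.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
coded = encode_blocks([A[:2], A[2:]], n=4)
finished = [(coeffs, matvec(block, x)) for coeffs, block in (coded[1], coded[3])]
product = decode(finished, k=2)
```

Decoding here costs one k x k linear solve, so shorter codes (smaller k per partition) make this step cheaper, which is consistent with the delay argument in the abstract.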
Latency Analysis of Coded Computation Schemes over Wireless Networks
Large-scale distributed computing systems face two major bottlenecks that
limit their scalability: straggler delay caused by the variability of
computation times at different worker nodes and communication bottlenecks
caused by shuffling data across many nodes in the network. Recently, it has
been shown that codes can provide significant gains in overcoming these
bottlenecks. In particular, optimal coding schemes for minimizing latency in
distributed computation of linear functions and mitigating the effect of
stragglers were proposed for a wired network, where the workers can
simultaneously transmit messages to a master node without interference. In this
paper, we focus on the problem of coded computation over a wireless
master-worker setup with straggling workers, where only one worker can transmit
the result of its local computation back to the master at a time. We consider 3
asymptotic regimes (determined by how the communication and computation times
are scaled with the number of workers) and precisely characterize the total
run-time of the distributed algorithm and optimum coding strategy in each
regime. In particular, for the regime of practical interest where the
computation and communication times of the distributed computing algorithm are
comparable, we show that the total run-time approaches a simple lower bound
that decouples computation and communication, and demonstrate that coded
schemes are $\Theta(\log(n))$ times faster than uncoded schemes.
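The wireless constraint in this abstract (only one worker transmits at a time) can be illustrated with a small run-time calculation. The sketch assumes, beyond what the abstract states, that workers transmit in order of completion, that every transmission takes the same fixed time, and that the master needs results from any k workers; the function name and the sample numbers are hypothetical.

```python
def wireless_runtime(compute_times, tx_time, k):
    """Time at which the master has received k results over a shared channel.

    compute_times: finishing time of each worker's local computation.
    tx_time: duration of one worker-to-master transmission (assumed constant).
    k: number of results the master needs (k = len(compute_times) if uncoded).
    """
    channel_free = 0.0
    for done in sorted(compute_times)[:k]:
        # A worker starts sending once it has finished and the channel is idle.
        start = max(done, channel_free)
        channel_free = start + tx_time
    return channel_free


# Four workers with one straggler; made-up times in seconds.
times = [3.0, 1.0, 2.0, 10.0]
coded_runtime = wireless_runtime(times, tx_time=1.5, k=3)    # any 3 of 4 suffice
uncoded_runtime = wireless_runtime(times, tx_time=1.5, k=4)  # must wait for all
```

In this toy model the coded run is limited by the serialized transmissions, while the uncoded run is limited by the straggler, showing how computation and communication both contribute to the total run-time.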
Distributed Computations with Layered Resolution
Modern computationally-heavy applications are often time-sensitive, demanding
distributed strategies to accelerate them. On the other hand, distributed
computing suffers from the bottleneck of slow workers in practice. Distributed
coded computing is an attractive solution that adds redundancy such that a
subset of distributed computations suffices to obtain the final result.
However, the final result is still either obtained within a desired time or
not, and for the latter, the resources that are spent are wasted. In this
paper, we introduce the novel concept of layered-resolution distributed coded
computations such that lower resolutions of the final result are obtained from
collective results of the workers -- at an earlier stage than the final result.
This innovation makes it possible to have more effective deadline-based
systems, since even if a computational job is terminated because of timing, an
approximated version of the final result can be released. Based on our
theoretical and empirical results, the average execution delay for the first
resolution is notably smaller than the one for the final resolution. Moreover,
the probability of meeting a deadline is one for the first resolution in a
setting where the final resolution exceeds the deadline almost all the time,
reducing the success rate of the systems with no layering.