
    Hierarchical Coding for Distributed Computing

    Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of groups of workers, motivated by the need to reflect the architectures of real-world distributed computing systems. In this work, we propose a hierarchical coding scheme for this model and analyze its decoding cost and expected computation time. Specifically, we first provide upper and lower bounds on the expected computation time of the proposed scheme. We also show that our scheme enables efficient parallel decoding, reducing the decoding cost by orders of magnitude compared with non-hierarchical schemes. When both decoding cost and computation time are considered, the proposed hierarchical coding outperforms existing schemes in many practical scenarios.
    Comment: 7 pages; part of the paper is submitted to ISIT201
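
    A minimal sketch of the group-wise idea described above, for distributed matrix-vector multiplication A @ x: the rows of A are split across worker groups, each group is protected by its own short MDS code, and decoding is done per group (so it can run in parallel). The real-valued Vandermonde code and all parameters below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def mds_generator(k, n):
    """Real (n, k) Vandermonde matrix; any k of its n rows are invertible."""
    return np.vander(np.arange(1, n + 1, dtype=float), k, increasing=True)

def encode_groups(A, num_groups, workers_per_group):
    """Split A row-wise into groups and MDS-encode each group independently."""
    blocks = np.array_split(A, num_groups, axis=0)
    coded = [mds_generator(b.shape[0], workers_per_group) @ b for b in blocks]
    return blocks, coded

def decode_group(outputs, worker_ids, k, n):
    """Per-group decoding from any k worker results (one small k x k solve)."""
    G = mds_generator(k, n)
    return np.linalg.solve(G[worker_ids, :], np.asarray(outputs))

# Toy run: 2 groups of 4 workers each, tolerating 1 straggler per group.
rng = np.random.default_rng(0)
A, x = rng.standard_normal((6, 5)), rng.standard_normal(5)
blocks, coded = encode_groups(A, num_groups=2, workers_per_group=4)
parts = []
for blk, C in zip(blocks, coded):
    k, n = blk.shape[0], C.shape[0]
    all_outputs = C @ x                              # every worker's result
    fastest = rng.choice(n, size=k, replace=False)   # any k non-stragglers
    parts.append(decode_group(all_outputs[fastest], fastest, k, n))
assert np.allclose(np.concatenate(parts), A @ x)
```

    Because each group is decoded from a small system of its own, the per-group decoders are independent, which is what allows the parallel decoding and the order-of-magnitude reduction in decoding cost claimed in the abstract.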

    Block-Diagonal and LT Codes for Distributed Computing With Straggling Servers

    We propose two coded schemes for the distributed computing problem of multiplying a matrix by a set of vectors. The first scheme is based on partitioning the matrix into submatrices and applying maximum distance separable (MDS) codes to each submatrix. For this scheme, we prove that, up to a given number of partitions, the communication load and the computational delay (not including the encoding and decoding delay) are identical to those of the scheme recently proposed by Li et al., which is based on a single, long MDS code. However, due to the use of shorter MDS codes, our scheme yields a significantly lower overall computational delay when the delay incurred by encoding and decoding is also considered. We further propose a second coded scheme based on Luby Transform (LT) codes under inactivation decoding. Interestingly, LT codes may reduce the delay over the partitioned scheme at the expense of an increased communication load. We also consider distributed computing under a deadline and show numerically that the proposed schemes outperform other schemes in the literature, with the LT code-based scheme yielding the best performance for the scenarios considered.
    Comment: To appear in IEEE Transactions on Communication
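
    A rough sketch of the partitioning idea behind the first scheme: applying a short (n/p, k/p) MDS code to each of p row blocks makes the overall generator block-diagonal, so decoding inverts p small systems instead of one large one. The real-valued Vandermonde code and the crude cubic operation count are assumptions for illustration only; the LT-code variant is not sketched here.

```python
import numpy as np
from scipy.linalg import block_diag

def short_mds(k, n):
    """Real (n, k) Vandermonde generator used here as a stand-in MDS code."""
    return np.vander(np.arange(1, n + 1, dtype=float), k, increasing=True)

k, n, p = 6, 9, 3                  # uncoded rows, coded rows, partitions
G_long = short_mds(k, n)           # one long (n, k) MDS code
G_block = block_diag(*[short_mds(k // p, n // p) for _ in range(p)])

print(G_long.shape, G_block.shape)                    # both (9, 6)
print("long-code decode   ~", k ** 3, "ops")          # one k x k solve
print("partitioned decode ~", p * (k // p) ** 3, "ops")  # p small solves
```

    Under this crude count the partitioned scheme saves roughly a factor p^2 in decoding work, which is the source of the lower overall delay once encoding and decoding are included.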

    Latency Analysis of Coded Computation Schemes over Wireless Networks

    Large-scale distributed computing systems face two major bottlenecks that limit their scalability: straggler delay caused by the variability of computation times at different worker nodes, and communication bottlenecks caused by shuffling data across many nodes in the network. Recently, it has been shown that codes can provide significant gains in overcoming these bottlenecks. In particular, optimal coding schemes for minimizing latency in the distributed computation of linear functions and mitigating the effect of stragglers were proposed for a wired network, where the workers can simultaneously transmit messages to a master node without interference. In this paper, we focus on the problem of coded computation over a wireless master-worker setup with straggling workers, where only one worker can transmit the result of its local computation back to the master at a time. We consider three asymptotic regimes (determined by how the communication and computation times scale with the number of workers) and precisely characterize the total run-time of the distributed algorithm and the optimum coding strategy in each regime. In particular, for the regime of practical interest where the computation and communication times of the distributed computing algorithm are comparable, we show that the total run-time approaches a simple lower bound that decouples computation and communication, and we demonstrate that coded schemes are Θ(log(n)) times faster than uncoded schemes.
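
    A Monte-Carlo sketch of why coding helps against stragglers: an uncoded job waits for the slowest of n workers, while an (n, k) MDS-coded job only waits for the k-th fastest. The shifted-exponential computation-time model is a common assumption in this literature, not the paper's wireless model, and communication time is ignored here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_run_times(n, k, trials=5_000, shift=1.0, rate=1.0):
    """Average completion time of uncoded vs. (n, k) MDS-coded execution."""
    t = shift + rng.exponential(1.0 / rate, size=(trials, n))
    uncoded = t.max(axis=1)                    # must hear from every worker
    coded = np.sort(t, axis=1)[:, k - 1]       # only the k fastest matter
    return uncoded.mean(), coded.mean()

for n in (10, 100, 1000):
    u, c = mean_run_times(n, k=n // 2)
    print(f"n={n:5d}  uncoded ≈ {u:.2f}  coded ≈ {c:.2f}  gain ≈ {u / c:.2f}x")
```

    Under this toy model the gain grows roughly logarithmically in n, consistent with the Θ(log(n)) separation stated in the abstract.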

    Distributed Computations with Layered Resolution

    Modern computationally heavy applications are often time-sensitive, demanding distributed strategies to accelerate them. Distributed computing, however, suffers in practice from the bottleneck of slow workers. Distributed coded computing is an attractive solution that adds redundancy so that a subset of the distributed computations suffices to obtain the final result. However, the final result is still either obtained within a desired time or not, and in the latter case the resources that were spent are wasted. In this paper, we introduce the novel concept of layered-resolution distributed coded computation, in which lower resolutions of the final result are obtained from the collective results of the workers at an earlier stage than the final result. This makes deadline-based systems more effective, since even if a computational job is terminated because of timing, an approximate version of the final result can be released. Based on our theoretical and empirical results, the average execution delay for the first resolution is notably smaller than that for the final resolution. Moreover, in a setting where the final resolution exceeds the deadline almost all the time, sharply reducing the success rate of systems with no layering, the probability of meeting the deadline is one for the first resolution.
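
    A minimal sketch of the layered-resolution idea for A @ x: a coarse layer (a low-precision copy of A) yields an approximate result early, and a refinement layer (the residual) upgrades it to the exact result later. In a coded system each layer would get its own code; the two-layer quantization split below is an assumed illustration, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
A, x = rng.standard_normal((8, 5)), rng.standard_normal(5)

A_coarse = np.round(A, 1)            # layer 1: low-resolution version of A
A_resid = A - A_coarse               # layer 2: refinement residual

y_coarse = A_coarse @ x              # released as soon as layer 1 decodes
y_exact = y_coarse + A_resid @ x     # exact result once layer 2 also decodes

print("layer-1 error:", np.linalg.norm(y_coarse - A @ x))
assert np.allclose(y_exact, A @ x)
```

    The point of the layering is that a job killed at the deadline can still release y_coarse, rather than nothing at all.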