Heterogeneous Coded Computation across Heterogeneous Workers
The coded distributed computing framework enables large-scale machine learning
(ML) models to be trained efficiently in a distributed manner, while mitigating
the straggler effect. In this work, we consider a multi-task assignment problem
in a coded distributed computing system, where multiple masters, each with a
different matrix multiplication task, assign computation tasks to workers with
heterogeneous computing capabilities. Both dedicated and probabilistic worker
assignment models are considered, with the objective of minimizing the average
completion time of all computations. For dedicated worker assignment, greedy
algorithms are proposed and the corresponding optimal load allocation is
derived based on the Lagrange multiplier method. For probabilistic assignment,
the successive convex approximation method is used to solve the non-convex
optimization problem. Simulation results show that the proposed algorithms
reduce the completion time by 80% over the uncoded scheme and by 49% over an
unbalanced coded scheme. Comment: Submitted for publication
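As a rough sketch of the speed-aware load allocation involved, the following assigns coded rows in proportion to worker speed. This is an illustrative proportional heuristic with made-up numbers, not the Lagrange-multiplier-based allocation derived in the paper:

```python
def allocate_loads(speeds, total_rows):
    """Split total_rows coded rows across workers in proportion to their
    speeds (illustrative heuristic, not the paper's optimal allocation).
    Rounding may shift a row or two for awkward speed ratios."""
    s = sum(speeds)
    return [round(total_rows * v / s) for v in speeds]

# Hypothetical relative computing rates of three heterogeneous workers.
speeds = [4.0, 2.0, 1.0]
loads = allocate_loads(speeds, 70)
print(loads)  # the fastest worker receives the largest share
```

A real scheme would additionally choose the coding redundancy so that the master needs only a subset of the coded results to recover each product.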
Near-Optimal Straggler Mitigation for Distributed Gradient Methods
Modern learning algorithms use gradient descent updates to train inferential
models that best explain data. Scaling these approaches to massive data sizes
requires proper distributed gradient descent schemes where distributed worker
nodes compute partial gradients based on their partial and local data sets, and
send the results to a master node where all the computations are aggregated
into a full gradient and the learning model is updated. However, a major
performance bottleneck that arises is that some of the worker nodes may run
slow. These nodes, a.k.a. stragglers, can significantly slow down computation, as
the slowest node may dictate the overall computational time. We propose a
distributed computing scheme, called Batched Coupon's Collector (BCC) to
alleviate the effect of stragglers in gradient methods. We prove that our BCC
scheme is robust to a near optimal number of random stragglers. We also
empirically demonstrate that our proposed BCC scheme reduces the run-time by up
to 85.4% over Amazon EC2 clusters when compared with other straggler mitigation
strategies. We also generalize the proposed BCC scheme to minimize the
completion time when implementing gradient descent-based algorithms over
heterogeneous worker nodes.
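The coupon-collector view behind BCC can be simulated directly. This is a toy model (one uniformly random batch per responding worker, made-up parameters), not the paper's exact scheme:

```python
import random

def workers_until_all_batches(num_batches, rng):
    """Coupon-collector view of BCC: each responding worker returns the
    partial gradient of one uniformly random data batch; count responses
    until every batch has been seen at least once."""
    seen, responses = set(), 0
    while len(seen) < num_batches:
        seen.add(rng.randrange(num_batches))
        responses += 1
    return responses

rng = random.Random(0)
trials = [workers_until_all_batches(10, rng) for _ in range(1000)]
avg = sum(trials) / len(trials)
print(avg)  # close to the coupon-collector mean 10 * H_10, roughly 29.3
```

The simulation shows why only a modest factor of extra responders is needed: the expected number of responses to collect all N batches grows as N ln N, not N^2.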
Combating Computational Heterogeneity in Large-Scale Distributed Computing via Work Exchange
Owing to data-intensive large-scale applications, distributed computation
systems have gained significant recent interest due to their ability to run
such tasks over a large number of commodity nodes in a time-efficient manner.
One of the major bottlenecks that adversely impacts time efficiency is the
computational heterogeneity of distributed nodes, which often limits the task
completion time to that of the slowest worker.
In this paper, we first present a lower bound on the expected computation
time based on the work-conservation principle. We then present our approach of
work exchange to combat the latency problem, in which faster workers can be
reassigned additional leftover computations that were originally assigned to
slower workers. We present two variations of the work exchange approach: a)
when the computational heterogeneity knowledge is known a priori; and b) when
heterogeneity is unknown and is estimated in an online manner to assign tasks
to distributed workers. As a baseline, we also present and analyze the use of
an optimized Maximum Distance Separable (MDS) coded distributed computation
scheme over heterogeneous nodes. Simulation results also compare the proposed
approach of work exchange, the baseline MDS coded scheme and the lower bound
obtained via work-conservation principle. We show that the work exchange scheme
achieves a computation time very close to the lower bound, with limited
coordination and communication overhead, even when knowledge of the
heterogeneity levels is not available.
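A minimal sketch of the rebalancing idea, assuming speeds are known a priori (variation (a)); the step below is a one-shot proportional reassignment, not the paper's exact exchange protocol:

```python
def completion_time(loads, speeds):
    """Makespan: the slowest worker's (load / speed) dominates."""
    return max(l / s for l, s in zip(loads, speeds))

def exchange(loads, speeds):
    """One illustrative work-exchange step: move leftover work from slow
    workers to fast ones by rebalancing the total load in proportion to
    the known speeds (hypothetical numbers, not the paper's algorithm)."""
    total, s = sum(loads), sum(speeds)
    return [total * v / s for v in speeds]

speeds = [3.0, 1.0]
naive = [50.0, 50.0]                  # equal split ignores heterogeneity
balanced = exchange(naive, speeds)
print(completion_time(naive, speeds), completion_time(balanced, speeds))
```

After the exchange both workers finish simultaneously, which is exactly the work-conservation condition behind the lower bound.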
Latency Analysis of Coded Computation Schemes over Wireless Networks
Large-scale distributed computing systems face two major bottlenecks that
limit their scalability: straggler delay caused by the variability of
computation times at different worker nodes and communication bottlenecks
caused by shuffling data across many nodes in the network. Recently, it has
been shown that codes can provide significant gains in overcoming these
bottlenecks. In particular, optimal coding schemes for minimizing latency in
distributed computation of linear functions and mitigating the effect of
stragglers were proposed for a wired network, where the workers can
simultaneously transmit messages to a master node without interference. In this
paper, we focus on the problem of coded computation over a wireless
master-worker setup with straggling workers, where only one worker can transmit
the result of its local computation back to the master at a time. We consider three
asymptotic regimes (determined by how the communication and computation times
are scaled with the number of workers) and precisely characterize the total
run-time of the distributed algorithm and optimum coding strategy in each
regime. In particular, for the regime of practical interest where the
computation and communication times of the distributed computing algorithm are
comparable, we show that the total run-time approaches a simple lower bound
that decouples computation and communication, and demonstrate that coded
schemes are significantly faster than uncoded schemes.
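The sequential-communication constraint can be captured in a toy runtime model: each worker computes a 1/k fraction of an MDS-coded job, the master waits for the k fastest, and the k results return one at a time over the shared link. The model and all numbers below are illustrative assumptions, not the paper's analysis:

```python
def total_runtime(k, comp_unit, comm_unit, delays):
    """Toy wireless model: per-worker finish time is its random delay plus
    a 1/k compute share; the master waits for the k-th fastest, then the k
    results are transmitted back sequentially (one worker at a time)."""
    finish = sorted(d + comp_unit / k for d in delays)
    return finish[k - 1] + k * comm_unit

delays = [0.1 * i for i in range(10)]   # hypothetical per-worker delays
best_k = min(range(1, 11),
             key=lambda k: total_runtime(k, 1.0, 0.05, delays))
print(best_k)  # a mid-sized k balances straggling against serial uplinks
```

Small k wastes computation waiting is short but each share is large; large k makes the serial return channel dominate. The optimum sits in between, which is the tension the asymptotic regimes formalize.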
Slack Squeeze Coded Computing for Adaptive Straggler Mitigation
While performing distributed computations in today's cloud-based platforms,
execution speed variations among compute nodes can significantly reduce the
performance and create bottlenecks like stragglers. Coded computation
techniques leverage coding theory to inject computational redundancy and
mitigate stragglers in distributed computations. In this paper, we propose a
dynamic workload distribution strategy for coded computation called Slack
Squeeze Coded Computation (S2C2). S2C2 squeezes the compute slack
(i.e., overhead) that is built into coded computing frameworks by
efficiently assigning work to all fast and slow nodes according to their
speeds, without needing to re-distribute data. We implement an LSTM-based
speed prediction algorithm to predict the speeds of compute nodes. We evaluate
S2C2 on linear algebraic algorithms, gradient descent, graph ranking, and
graph filtering algorithms. We demonstrate a 19% to 39% reduction in total
computation latency using S2C2 compared to job replication and coded
computation. We further show how S2C2 can be applied beyond matrix-vector
multiplication. Comment: 13 pages, SC 201
Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication
We consider a large-scale matrix multiplication problem where the computation
is carried out using a distributed system with a master node and multiple
worker nodes, where each worker can store parts of the input matrices. We
propose a computation strategy that leverages ideas from coding theory to
design intermediate computations at the worker nodes, in order to efficiently
deal with straggling workers. The proposed strategy, named \emph{polynomial
codes}, achieves the optimum recovery threshold, defined as the minimum number
of workers that the master needs to wait for in order to compute the output.
Furthermore, by leveraging the algebraic structure of polynomial codes, we can
map the reconstruction problem of the final output to a polynomial
interpolation problem, which can be solved efficiently. Polynomial codes
provide order-wise improvement over the state of the art in terms of recovery
threshold, and are also optimal in terms of several other metrics. Furthermore,
we extend this code to distributed convolution and show its order-wise
optimality.
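A toy instance of the interpolation idea, with a single row-wise split of A so the recovery threshold is 2: each worker evaluates the encoded block A0 + x*A1 at its own point, and the master recovers A0*B and A1*B from any two returned products. This is a minimal sketch over the integers, not the general construction:

```python
def mat_add(X, Y): return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]
def mat_scale(c, X): return [[c * a for a in r] for r in X]
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Split A = [[1,2],[3,4]] row-wise into A0, A1; encode worker x as A0 + x*A1.
A0, A1 = [[1, 2]], [[3, 4]]
B = [[1, 0], [0, 1]]

# Suppose the workers at evaluation points x=1 and x=2 respond first.
r1 = mat_mul(mat_add(A0, mat_scale(1, A1)), B)   # p(1) = (A0 + 1*A1) B
r2 = mat_mul(mat_add(A0, mat_scale(2, A1)), B)   # p(2) = (A0 + 2*A1) B

# Interpolate the degree-1 polynomial p(x) = A0*B + x * A1*B:
A1B = mat_add(r2, mat_scale(-1, r1))             # p(2) - p(1) = A1*B
A0B = mat_add(r1, mat_scale(-1, A1B))            # p(1) - A1*B = A0*B
print(A0B + A1B)   # stacked rows give A @ B
```

Any two of the n workers suffice, which is what "recovery threshold" means; the full polynomial-code construction splits both matrices and chooses exponents so all sub-products appear as distinct polynomial coefficients.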
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack. Comment: 32 pages, 11 figures
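The core kernel a toolkit like this optimizes is sparse matrix-vector multiplication over a compressed storage format. Below is a plain-Python CSR reference version for orientation; GHOST itself provides tuned hybrid-parallel C/MPI/GPU implementations, not this code:

```python
def csr_spmv(indptr, indices, data, x):
    """Sparse matrix-vector product y = A x in CSR storage: for each row,
    indptr brackets that row's slice of (column index, value) pairs."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y

# The 3x3 matrix [[2,0,1],[0,3,0],[4,0,5]] in CSR form.
indptr  = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data    = [2.0, 1.0, 3.0, 4.0, 5.0]
print(csr_spmv(indptr, indices, data, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The irregular, indirect access through `indices` is precisely what makes this kernel sensitive to memory topology and hard to load-balance across heterogeneous devices.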
A Survey of Coded Distributed Computing
Distributed computing has become a common approach for large-scale
computation of tasks due to benefits such as high reliability, scalability,
computation speed, and cost-effectiveness. However, distributed computing faces
critical issues related to communication load and straggler effects. In
particular, computing nodes need to exchange intermediate results with each
other in order to calculate the final result, and this significantly increases
communication overheads. Furthermore, a distributed computing network may
include straggling nodes that run intermittently slower. This results in a
longer overall time needed to execute the computation tasks, thereby limiting
the performance of distributed computing. To address these issues, coded
distributed computing (CDC), i.e., a combination of coding theoretic techniques
and distributed computing, has been recently proposed as a promising solution.
Coding theoretic techniques have proved effective in WiFi and cellular systems
to deal with channel noise. Therefore, CDC may significantly reduce
communication load, alleviate the effects of stragglers, and provide fault
tolerance, privacy, and security. In this survey, we first introduce the
fundamentals of CDC, followed by basic CDC schemes. Then, we review and analyze
a number of CDC approaches proposed to reduce the communication costs, mitigate
the straggler effects, and guarantee privacy and security. Furthermore, we
present and discuss applications of CDC in modern computer networks. Finally,
we highlight important challenges and promising research directions related to
CDC.
Distributed Computing with Heterogeneous Communication Constraints: The Worst-Case Computation Load and Proof by Contradiction
We consider a distributed computing framework where the distributed nodes
have different communication capabilities, motivated by the heterogeneous
networks in data centers and mobile edge computing systems. Following the
structure of MapReduce, this framework consists of a Map computation phase, a
Shuffle phase, and a Reduce computation phase. The Shuffle phase allows
distributed nodes to exchange intermediate values, in the presence of
heterogeneous communication bottlenecks for different nodes (heterogeneous
communication load constraints). For this setting, we characterize the minimum
total computation load and the minimum worst-case computation load in some
cases, under the heterogeneous communication load constraints. While the total
computation load depends on the sum of the computation loads of all the nodes,
the worst-case computation load depends on the computation load of a node with
the heaviest job. We show the interesting insight that, in some cases, there is
a tradeoff between the minimum total computation load and the minimum
worst-case computation load, in the sense that both cannot be achieved at the
same time. The achievability schemes are proposed with careful design of the
file assignment and the data shuffling. Beyond the cut-set bound, a novel
converse is proposed using proof by contradiction. For the general case, we
identify two extreme regimes in which both the scheme with coding and the
scheme without coding are optimal, respectively. Comment: This work was presented in part at the 52nd Annual Asilomar
Conference on Signals, Systems, and Computers, October 201
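The benefit of coding in the Shuffle phase shows up already in a classic three-node toy example (in the spirit of coded distributed computing generally, not this paper's heterogeneous scheme; the byte values are made up): one XOR-coded multicast delivers a missing intermediate value to two nodes at once.

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Three nodes, three files: node 1 stores files {1,2}, node 2 stores {2,3},
# node 3 stores {1,3}. v_qn denotes the intermediate value for reduce task q
# mapped from file n (hypothetical payloads below).
v21 = b"\x0a\x0b"   # needed by node 2; computable at nodes 1 and 3 (file 1)
v32 = b"\x1c\x1d"   # needed by node 3; computable at nodes 1 and 2 (file 2)

broadcast = xor(v21, v32)          # node 1 multicasts one coded packet
at_node2 = xor(broadcast, v32)     # node 2 cancels its local v32 -> gets v21
at_node3 = xor(broadcast, v21)     # node 3 cancels its local v21 -> gets v32
print(at_node2 == v21, at_node3 == v32)
```

One transmission serves two receivers, halving the shuffle load for this pair; heterogeneous link constraints change how much of this multicast opportunity each node can exploit.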
Secure Coded Cooperative Computation at the Heterogeneous Edge against Byzantine Attacks
Edge computing is emerging as a new paradigm to allow processing data at the
edge of the network, where data is typically generated and collected, by
exploiting multiple devices at the edge collectively. However, offloading tasks
to other devices leaves the edge computing applications at the complete mercy
of an attacker. One such attack, and the focus of this work, is the Byzantine
attack, where one or more devices can corrupt the offloaded tasks.
Furthermore, exploiting the potential of edge computing is challenging mainly
due to the heterogeneous and time-varying nature of the devices at the edge. In
this paper, we develop a secure coded cooperative computation mechanism (SC3)
that provides both security and computation efficiency guarantees by gracefully
combining homomorphic hash functions and coded cooperative computation.
Homomorphic hash functions are used against Byzantine attacks and coded
cooperative computation is used to improve computation efficiency when edge
resources are heterogeneous and time-varying. Simulation results show that SC3
significantly reduces task completion delay.
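How a homomorphic hash lets the master check offloaded results can be sketched with the toy exponential hash h(x) = g^x mod p, which is additively homomorphic: h(x + y) = h(x) * h(y) mod p. The group parameters here are illustrative and far too small for real security, and this is not claimed to be SC3's exact construction:

```python
P = 1000003  # small illustrative prime modulus (real schemes use huge groups)

def hh(x, g=5, p=P):
    """Toy additively homomorphic hash: h(x) = g^x mod p."""
    return pow(g, x, p)

# A coded task's correct result is a sum of data chunks, so its hash must
# equal the product of the chunk hashes, which the master can precompute.
a, b = 42, 99
claimed = a + b                                   # honest worker's result
print(hh(claimed) == (hh(a) * hh(b)) % P)         # verification passes

tampered = claimed + 1                            # Byzantine corruption
print(hh(tampered) == (hh(a) * hh(b)) % P)        # verification fails
```

The master never needs the raw chunks at verification time, only their hashes, which is what makes the check cheap enough to run on every returned coded result.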