Heterogeneous Coded Computation across Heterogeneous Workers
The coded distributed computing framework enables large-scale machine learning
(ML) models to be trained efficiently in a distributed manner, while mitigating
the straggler effect. In this work, we consider a multi-task assignment problem
in a coded distributed computing system, where multiple masters, each with a
different matrix multiplication task, assign computation tasks to workers with
heterogeneous computing capabilities. Both dedicated and probabilistic worker
assignment models are considered, with the objective of minimizing the average
completion time of all computations. For dedicated worker assignment, greedy
algorithms are proposed and the corresponding optimal load allocation is
derived based on the Lagrange multiplier method. For probabilistic assignment,
the successive convex approximation method is used to solve the non-convex
optimization problem. Simulation results show that the proposed algorithms
reduce the completion time by 80% over the uncoded scheme and by 49% over an
unbalanced coded scheme. Comment: Submitted for publication
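As a rough sketch of the speed-aware load allocation involved, the following assigns coded rows in proportion to worker speed. This is an illustrative proportional heuristic with made-up numbers, not the Lagrange-multiplier-based allocation derived in the paper:

```python
def allocate_loads(speeds, total_rows):
    """Split total_rows coded rows across workers in proportion to their
    speeds (illustrative heuristic, not the paper's optimal allocation).
    Rounding may shift a row or two for awkward speed ratios."""
    s = sum(speeds)
    return [round(total_rows * v / s) for v in speeds]

# Hypothetical relative computing rates of three heterogeneous workers.
speeds = [4.0, 2.0, 1.0]
loads = allocate_loads(speeds, 70)
print(loads)  # the fastest worker receives the largest share
```

A real scheme would additionally choose the coding redundancy so that the master needs only a subset of the coded results to recover each product.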
Near-Optimal Straggler Mitigation for Distributed Gradient Methods
Modern learning algorithms use gradient descent updates to train inferential
models that best explain data. Scaling these approaches to massive data sizes
requires proper distributed gradient descent schemes where distributed worker
nodes compute partial gradients based on their partial and local data sets, and
send the results to a master node where all the computations are aggregated
into a full gradient and the learning model is updated. However, a major
performance bottleneck that arises is that some of the worker nodes may run
slow. These nodes, a.k.a. stragglers, can significantly slow down computation, as
the slowest node may dictate the overall computational time. We propose a
distributed computing scheme, called Batched Coupon's Collector (BCC) to
alleviate the effect of stragglers in gradient methods. We prove that our BCC
scheme is robust to a near optimal number of random stragglers. We also
empirically demonstrate that our proposed BCC scheme reduces the run-time by up
to 85.4% over Amazon EC2 clusters when compared with other straggler mitigation
strategies. We also generalize the proposed BCC scheme to minimize the
completion time when implementing gradient descent-based algorithms over
heterogeneous worker nodes.
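The coupon-collector view behind BCC can be simulated directly. This is a toy model (one uniformly random batch per responding worker, made-up parameters), not the paper's exact scheme:

```python
import random

def workers_until_all_batches(num_batches, rng):
    """Coupon-collector view of BCC: each responding worker returns the
    partial gradient of one uniformly random data batch; count responses
    until every batch has been seen at least once."""
    seen, responses = set(), 0
    while len(seen) < num_batches:
        seen.add(rng.randrange(num_batches))
        responses += 1
    return responses

rng = random.Random(0)
trials = [workers_until_all_batches(10, rng) for _ in range(1000)]
avg = sum(trials) / len(trials)
print(avg)  # close to the coupon-collector mean 10 * H_10, roughly 29.3
```

The simulation shows why only a modest factor of extra responders is needed: the expected number of responses to collect all N batches grows as N ln N, not N^2.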
Combating Computational Heterogeneity in Large-Scale Distributed Computing via Work Exchange
Owing to data-intensive large-scale applications, distributed computation
systems have gained significant recent interest due to their ability to run
such tasks over a large number of commodity nodes in a time-efficient manner.
One of the major bottlenecks that adversely impacts time efficiency is the
computational heterogeneity of distributed nodes, which often limits the task
completion time to that of the slowest worker.
In this paper, we first present a lower bound on the expected computation
time based on the work-conservation principle. We then present our approach of
work exchange to combat the latency problem, in which faster workers can be
reassigned additional leftover computations that were originally assigned to
slower workers. We present two variations of the work exchange approach: a)
when the computational heterogeneity knowledge is known a priori; and b) when
heterogeneity is unknown and is estimated in an online manner to assign tasks
to distributed workers. As a baseline, we also present and analyze the use of
an optimized Maximum Distance Separable (MDS) coded distributed computation
scheme over heterogeneous nodes. Simulation results also compare the proposed
approach of work exchange, the baseline MDS coded scheme and the lower bound
obtained via work-conservation principle. We show that the work exchange scheme
achieves a computation time very close to the lower bound, with limited
coordination and communication overhead, even when knowledge of the
heterogeneity levels is not available.
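A minimal sketch of the rebalancing idea, assuming speeds are known a priori (variation (a)); the step below is a one-shot proportional reassignment, not the paper's exact exchange protocol:

```python
def completion_time(loads, speeds):
    """Makespan: the slowest worker's (load / speed) dominates."""
    return max(l / s for l, s in zip(loads, speeds))

def exchange(loads, speeds):
    """One illustrative work-exchange step: move leftover work from slow
    workers to fast ones by rebalancing the total load in proportion to
    the known speeds (hypothetical numbers, not the paper's algorithm)."""
    total, s = sum(loads), sum(speeds)
    return [total * v / s for v in speeds]

speeds = [3.0, 1.0]
naive = [50.0, 50.0]                  # equal split ignores heterogeneity
balanced = exchange(naive, speeds)
print(completion_time(naive, speeds), completion_time(balanced, speeds))
```

After the exchange both workers finish simultaneously, which is exactly the work-conservation condition behind the lower bound.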
Latency Analysis of Coded Computation Schemes over Wireless Networks
Large-scale distributed computing systems face two major bottlenecks that
limit their scalability: straggler delay caused by the variability of
computation times at different worker nodes and communication bottlenecks
caused by shuffling data across many nodes in the network. Recently, it has
been shown that codes can provide significant gains in overcoming these
bottlenecks. In particular, optimal coding schemes for minimizing latency in
distributed computation of linear functions and mitigating the effect of
stragglers were proposed for a wired network, where the workers can
simultaneously transmit messages to a master node without interference. In this
paper, we focus on the problem of coded computation over a wireless
master-worker setup with straggling workers, where only one worker can transmit
the result of its local computation back to the master at a time. We consider three
asymptotic regimes (determined by how the communication and computation times
are scaled with the number of workers) and precisely characterize the total
run-time of the distributed algorithm and optimum coding strategy in each
regime. In particular, for the regime of practical interest where the
computation and communication times of the distributed computing algorithm are
comparable, we show that the total run-time approaches a simple lower bound
that decouples computation and communication, and demonstrate that coded
schemes are significantly faster than uncoded schemes.
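The sequential-communication constraint can be captured in a toy runtime model: each worker computes a 1/k fraction of an MDS-coded job, the master waits for the k fastest, and the k results return one at a time over the shared link. The model and all numbers below are illustrative assumptions, not the paper's analysis:

```python
def total_runtime(k, comp_unit, comm_unit, delays):
    """Toy wireless model: per-worker finish time is its random delay plus
    a 1/k compute share; the master waits for the k-th fastest, then the k
    results are transmitted back sequentially (one worker at a time)."""
    finish = sorted(d + comp_unit / k for d in delays)
    return finish[k - 1] + k * comm_unit

delays = [0.1 * i for i in range(10)]   # hypothetical per-worker delays
best_k = min(range(1, 11),
             key=lambda k: total_runtime(k, 1.0, 0.05, delays))
print(best_k)  # a mid-sized k balances straggling against serial uplinks
```

Small k wastes computation waiting is short but each share is large; large k makes the serial return channel dominate. The optimum sits in between, which is the tension the asymptotic regimes formalize.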
Slack Squeeze Coded Computing for Adaptive Straggler Mitigation
While performing distributed computations in today's cloud-based platforms,
execution speed variations among compute nodes can significantly reduce the
performance and create bottlenecks like stragglers. Coded computation
techniques leverage coding theory to inject computational redundancy and
mitigate stragglers in distributed computations. In this paper, we propose a
dynamic workload distribution strategy for coded computation called Slack
Squeeze Coded Computation (S2C2). S2C2 squeezes the compute slack
(i.e., overhead) that is built into coded computing frameworks by
efficiently assigning work to all fast and slow nodes according to their
speeds, without needing to re-distribute data. We implement an LSTM-based
speed prediction algorithm to predict the speeds of compute nodes. We evaluate
S2C2 on linear algebraic algorithms, gradient descent, graph ranking, and
graph filtering algorithms. We demonstrate a 19% to 39% reduction in total
computation latency using S2C2 compared to job replication and coded
computation. We further show how S2C2 can be applied beyond matrix-vector
multiplication. Comment: 13 pages, SC 201
Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication
We consider a large-scale matrix multiplication problem where the computation
is carried out using a distributed system with a master node and multiple
worker nodes, where each worker can store parts of the input matrices. We
propose a computation strategy that leverages ideas from coding theory to
design intermediate computations at the worker nodes, in order to efficiently
deal with straggling workers. The proposed strategy, named \emph{polynomial
codes}, achieves the optimum recovery threshold, defined as the minimum number
of workers that the master needs to wait for in order to compute the output.
Furthermore, by leveraging the algebraic structure of polynomial codes, we can
map the reconstruction problem of the final output to a polynomial
interpolation problem, which can be solved efficiently. Polynomial codes
provide order-wise improvement over the state of the art in terms of recovery
threshold, and are also optimal in terms of several other metrics. Furthermore,
we extend this code to distributed convolution and show its order-wise
optimality.
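A toy instance of the interpolation idea, with a single row-wise split of A so the recovery threshold is 2: each worker evaluates the encoded block A0 + x*A1 at its own point, and the master recovers A0*B and A1*B from any two returned products. This is a minimal sketch over the integers, not the general construction:

```python
def mat_add(X, Y): return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]
def mat_scale(c, X): return [[c * a for a in r] for r in X]
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Split A = [[1,2],[3,4]] row-wise into A0, A1; encode worker x as A0 + x*A1.
A0, A1 = [[1, 2]], [[3, 4]]
B = [[1, 0], [0, 1]]

# Suppose the workers at evaluation points x=1 and x=2 respond first.
r1 = mat_mul(mat_add(A0, mat_scale(1, A1)), B)   # p(1) = (A0 + 1*A1) B
r2 = mat_mul(mat_add(A0, mat_scale(2, A1)), B)   # p(2) = (A0 + 2*A1) B

# Interpolate the degree-1 polynomial p(x) = A0*B + x * A1*B:
A1B = mat_add(r2, mat_scale(-1, r1))             # p(2) - p(1) = A1*B
A0B = mat_add(r1, mat_scale(-1, A1B))            # p(1) - A1*B = A0*B
print(A0B + A1B)   # stacked rows give A @ B
```

Any two of the n workers suffice, which is what "recovery threshold" means; the full polynomial-code construction splits both matrices and chooses exponents so all sub-products appear as distinct polynomial coefficients.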
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack. Comment: 32 pages, 11 figures
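The core kernel a toolkit like this optimizes is sparse matrix-vector multiplication over a compressed storage format. Below is a plain-Python CSR reference version for orientation; GHOST itself provides tuned hybrid-parallel C/MPI/GPU implementations, not this code:

```python
def csr_spmv(indptr, indices, data, x):
    """Sparse matrix-vector product y = A x in CSR storage: for each row,
    indptr brackets that row's slice of (column index, value) pairs."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y

# The 3x3 matrix [[2,0,1],[0,3,0],[4,0,5]] in CSR form.
indptr  = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data    = [2.0, 1.0, 3.0, 4.0, 5.0]
print(csr_spmv(indptr, indices, data, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The irregular, indirect access through `indices` is precisely what makes this kernel sensitive to memory topology and hard to load-balance across heterogeneous devices.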
A Survey of Coded Distributed Computing
Distributed computing has become a common approach for large-scale
computation of tasks due to benefits such as high reliability, scalability,
computation speed, and cost-effectiveness. However, distributed computing faces
critical issues related to communication load and straggler effects. In
particular, computing nodes need to exchange intermediate results with each
other in order to calculate the final result, and this significantly increases
communication overheads. Furthermore, a distributed computing network may
include straggling nodes that run intermittently slower. This results in a
longer overall time needed to execute the computation tasks, thereby limiting
the performance of distributed computing. To address these issues, coded
distributed computing (CDC), i.e., a combination of coding theoretic techniques
and distributed computing, has been recently proposed as a promising solution.
Coding theoretic techniques have proved effective in WiFi and cellular systems
to deal with channel noise. Therefore, CDC may significantly reduce
communication load, alleviate the effects of stragglers, and provide fault
tolerance, privacy, and security. In this survey, we first introduce the
fundamentals of CDC, followed by basic CDC schemes. Then, we review and analyze
a number of CDC approaches proposed to reduce the communication costs, mitigate
the straggler effects, and guarantee privacy and security. Furthermore, we
present and discuss applications of CDC in modern computer networks. Finally,
we highlight important challenges and promising research directions related to
CDC.
Distributed Computing with Heterogeneous Communication Constraints: The Worst-Case Computation Load and Proof by Contradiction
We consider a distributed computing framework where the distributed nodes
have different communication capabilities, motivated by the heterogeneous
networks in data centers and mobile edge computing systems. Following the
structure of MapReduce, this framework consists of a Map computation phase, a
Shuffle phase, and a Reduce computation phase. The Shuffle phase allows
distributed nodes to exchange intermediate values, in the presence of
heterogeneous communication bottlenecks for different nodes (heterogeneous
communication load constraints). For this setting, we characterize the minimum
total computation load and the minimum worst-case computation load in some
cases, under the heterogeneous communication load constraints. While the total
computation load depends on the sum of the computation loads of all the nodes,
the worst-case computation load depends on the computation load of a node with
the heaviest job. We show the interesting insight that, in some cases, there is
a tradeoff between the minimum total computation load and the minimum
worst-case computation load, in the sense that both cannot be achieved at the
same time. The achievability schemes are proposed with careful design of the
file assignment and the data shuffling. Beyond the cut-set bound, a novel
converse is proposed using proof by contradiction. For the general case, we
identify two extreme regimes in which both the scheme with coding and the
scheme without coding are optimal, respectively. Comment: This work was presented in part at the 52nd Annual Asilomar
Conference on Signals, Systems, and Computers, October 201
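The benefit of coding in the Shuffle phase shows up already in a classic three-node toy example (in the spirit of coded distributed computing generally, not this paper's heterogeneous scheme; the byte values are made up): one XOR-coded multicast delivers a missing intermediate value to two nodes at once.

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Three nodes, three files: node 1 stores files {1,2}, node 2 stores {2,3},
# node 3 stores {1,3}. v_qn denotes the intermediate value for reduce task q
# mapped from file n (hypothetical payloads below).
v21 = b"\x0a\x0b"   # needed by node 2; computable at nodes 1 and 3 (file 1)
v32 = b"\x1c\x1d"   # needed by node 3; computable at nodes 1 and 2 (file 2)

broadcast = xor(v21, v32)          # node 1 multicasts one coded packet
at_node2 = xor(broadcast, v32)     # node 2 cancels its local v32 -> gets v21
at_node3 = xor(broadcast, v21)     # node 3 cancels its local v21 -> gets v32
print(at_node2 == v21, at_node3 == v32)
```

One transmission serves two receivers, halving the shuffle load for this pair; heterogeneous link constraints change how much of this multicast opportunity each node can exploit.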
Secure Coded Cooperative Computation at the Heterogeneous Edge against Byzantine Attacks
Edge computing is emerging as a new paradigm to allow processing data at the
edge of the network, where data is typically generated and collected, by
exploiting multiple devices at the edge collectively. However, offloading tasks
to other devices leaves the edge computing applications at the complete mercy
of an attacker. One such attack, and the focus of this work, is the Byzantine
attack, where one or more devices can corrupt the offloaded tasks.
Furthermore, exploiting the potential of edge computing is challenging mainly
due to the heterogeneous and time-varying nature of the devices at the edge. In
this paper, we develop a secure coded cooperative computation mechanism (SC3)
that provides both security and computation efficiency guarantees by gracefully
combining homomorphic hash functions and coded cooperative computation.
Homomorphic hash functions are used against Byzantine attacks and coded
cooperative computation is used to improve computation efficiency when edge
resources are heterogeneous and time-varying. Simulation results show that SC3
significantly reduces task completion delay.
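How a homomorphic hash lets the master check offloaded results can be sketched with the toy exponential hash h(x) = g^x mod p, which is additively homomorphic: h(x + y) = h(x) * h(y) mod p. The group parameters here are illustrative and far too small for real security, and this is not claimed to be SC3's exact construction:

```python
P = 1000003  # small illustrative prime modulus (real schemes use huge groups)

def hh(x, g=5, p=P):
    """Toy additively homomorphic hash: h(x) = g^x mod p."""
    return pow(g, x, p)

# A coded task's correct result is a sum of data chunks, so its hash must
# equal the product of the chunk hashes, which the master can precompute.
a, b = 42, 99
claimed = a + b                                   # honest worker's result
print(hh(claimed) == (hh(a) * hh(b)) % P)         # verification passes

tampered = claimed + 1                            # Byzantine corruption
print(hh(tampered) == (hh(a) * hh(b)) % P)        # verification fails
```

The master never needs the raw chunks at verification time, only their hashes, which is what makes the check cheap enough to run on every returned coded result.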