Universally Decodable Matrices for Distributed Matrix-Vector Multiplication
Coded computation is an emerging research area that leverages concepts from
erasure coding to mitigate the effect of stragglers (slow nodes) in distributed
computation clusters, especially for matrix computation problems. In this work,
we present a class of distributed matrix-vector multiplication schemes that are
based on codes in the Rosenbloom-Tsfasman metric and universally decodable
matrices. Our schemes take into account the inherent computation order within a
worker node. In particular, they allow us to effectively leverage partial
computations performed by stragglers (a feature that many prior works lack). An
additional main contribution of our work is a companion matrix-based embedding
of these codes that allows us to obtain sparse and numerically stable schemes
for the problem at hand. Experimental results confirm the effectiveness of our
techniques.
Comment: 6 pages, 1 figure
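The general coded-computation idea the paper builds on can be sketched numerically: the matrix is split into blocks, each worker receives a polynomial combination of the blocks evaluated at its own point, and the master interpolates the product from any k of the n worker results. The sketch below is a generic Reed-Solomon-style scheme, not the paper's universally-decodable-matrix construction (and it does not exploit partial work); all names in it are illustrative.

```python
# Minimal sketch of straggler-tolerant matrix-vector multiplication via a
# polynomial (Reed-Solomon-style) code. Illustrative only; this is not the
# paper's universally-decodable-matrix scheme.
import numpy as np

def encode_blocks(A, n_workers, k):
    """Split A row-wise into k blocks; worker i receives the evaluation
    sum_j A_j * x_i**j of the block polynomial at a distinct point x_i."""
    blocks = np.split(A, k)                      # A's row count must divide by k
    xs = np.arange(1, n_workers + 1, dtype=float)
    return xs, [sum(B * x**j for j, B in enumerate(blocks)) for x in xs]

def decode(xs, results, k):
    """Interpolate the k block products A_j @ b from any k worker results."""
    V = np.vander(xs[:k], k, increasing=True)    # Vandermonde system in x_i**j
    coeffs = np.linalg.solve(V, np.stack(results[:k]))
    return np.concatenate(list(coeffs))

k, n = 3, 5                                      # tolerates n - k = 2 stragglers
A = np.arange(36, dtype=float).reshape(6, 6)
b = np.ones(6)
xs, shares = encode_blocks(A, n, k)
worker_out = [S @ b for S in shares]             # each worker does one product
alive = [0, 2, 4]                                # workers 1 and 3 straggle
y = decode(xs[alive], [worker_out[i] for i in alive], k)
assert np.allclose(y, A @ b)
```

Any k of the n results suffice for decoding; the point of the paper's UDM-based schemes is to go further and also use the partial, ordered computations completed by the stragglers themselves.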
Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy
We consider a scenario involving computations over a massive dataset stored
distributedly across multiple workers, which is at the core of distributed
learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework
to simultaneously provide (1) resiliency against stragglers that may prolong
computations; (2) security against Byzantine (or malicious) workers that
deliberately modify the computation for their benefit; and (3)
(information-theoretic) privacy of the dataset amidst possible collusion of
workers. LCC, which leverages the well-known Lagrange polynomial to create
computation redundancy in a novel coded form across workers, can be applied to
any computation scenario in which the function of interest is an arbitrary
multivariate polynomial of the input dataset, hence covering many computations
of interest in machine learning. LCC significantly generalizes prior works to
go beyond linear computations. It also enables secure and private computing in
distributed settings, improving the computation and communication efficiency of
the state-of-the-art. Furthermore, we prove the optimality of LCC by showing
that it achieves the optimal tradeoff between resiliency, security, and
privacy, i.e., in terms of tolerating the maximum number of stragglers and
adversaries, and providing data privacy against the maximum number of colluding
workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the
conventional uncoded implementation of distributed least-squares linear
regression by up to , and also achieves a
- speedup over the state-of-the-art straggler
mitigation strategies.
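The core encoding mechanism can be sketched numerically: the data blocks are interpolated by a Lagrange polynomial u, each worker evaluates the desired polynomial f on its share u(alpha_i), and the master interpolates the composed polynomial f(u(z)), whose degree is deg(f) times (k-1), from sufficiently many results. The sketch below works over the reals and shows only the resiliency mechanics (LCC itself operates over finite fields and additionally provides security and privacy); all names in it are illustrative.

```python
# Toy real-valued sketch of the Lagrange Coded Computing encoding/decoding
# mechanics. f is a degree-2 polynomial applied blockwise to the data.
import numpy as np

def lagrange_eval(points, values, z):
    """Evaluate the interpolation polynomial through (points[j], values[j]) at z."""
    total = np.zeros_like(values[0], dtype=float)
    for j, (pj, vj) in enumerate(zip(points, values)):
        w = np.prod([(z - pm) / (pj - pm) for m, pm in enumerate(points) if m != j])
        total = total + w * vj
    return total

f = lambda X: X @ X.T                    # degree-2 polynomial of a data block

k, deg_f = 2, 2
betas = np.array([0.0, 1.0])             # interpolation points for the data blocks
alphas = np.array([2.0, 3.0, 4.0, 5.0])  # evaluation points handed to workers
X = [np.array([[1.0, 2.0]]), np.array([[3.0, 4.0]])]  # two 1x2 data blocks

# Encoding: worker i receives u(alpha_i), where u interpolates the blocks.
shares = [lagrange_eval(betas, X, a) for a in alphas]
worker_out = [f(s) for s in shares]      # each worker applies f to its share

# f(u(z)) has degree deg_f * (k-1) = 2, so any 3 results suffice; worker 3 straggles.
alive = [0, 1, 2]
recovered = [lagrange_eval(alphas[alive], [worker_out[i] for i in alive], z)
             for z in betas]
assert all(np.allclose(r, f(Xj)) for r, Xj in zip(recovered, X))
```

Evaluating the interpolated composition at the original points beta_j returns f applied to each data block, which is exactly the resiliency guarantee described above.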
Download and Access Trade-offs in Lagrange Coded Computing
Lagrange Coded Computing (LCC) is a recently
proposed technique for resilient, secure, and private computation
of arbitrary polynomials in distributed environments. By
mapping such computations to composition of polynomials, LCC
allows the master node to complete the computation by accessing
a minimal number of workers and downloading all of their
content, thus providing resiliency to the remaining stragglers.
However, in the common case in which the number of
stragglers is smaller than the worst-case number, much of the
computational power of the system remains unexploited. To
address this issue, in this paper we expand LCC by studying a
fundamental trade-off between download and access, and present
two contributions. In the first contribution, it is shown that
without any modification to the encoding process, the master
can decode the computations by accessing a larger number of
nodes, however downloading less information from each node in
comparison with LCC (i.e., trading access for download). This
scheme relies on decoding a particular polynomial in the ideal
that is generated by the polynomials of interest, a technique we
call Ideal Decoding. This new scheme also improves LCC in the
sense that for systems with adversaries, the overall downloaded
bandwidth is smaller than in LCC. In the second contribution
we study a real-time model of this trade-off, in which the data
from the workers is downloaded sequentially. By clustering nodes
of similar delays and encoding the function with Universally
Decodable Matrices, the master can decode once sufficient data is
downloaded from every cluster, regardless of the internal delays
within that cluster. This allows the master to utilize the partial
work that is done by stragglers, rather than to ignore it, a feature
that most past works in coded computing are lacking.
Storage Codes with Flexible Number of Nodes
This paper presents flexible storage codes, a class of error-correcting codes
that can recover information from a flexible number of storage nodes. As a
result, one can make better use of the available storage nodes in the
presence of unpredictable node failures and reduce the data access latency. Let
us assume a storage system encodes a block of information symbols over a finite
field into a set of storage nodes, each holding a fixed number of symbols. The
code is parameterized by a set of recovery tuples, each specifying a number of
nodes together with a per-node access size, such that the information symbols
can be reconstructed from any collection of nodes meeting one of these tuples,
with each node accessing the corresponding number of symbols. In other words, the code
allows a flexible number of nodes for decoding to accommodate the variance in
the data access time of the nodes. Code constructions are presented for
different storage scenarios, including LRC (locally recoverable) codes, PMDS
(partial MDS) codes, and MSR (minimum storage regenerating) codes. We analyze
the latency of accessing information and perform simulations on Amazon clusters
to show the efficiency of the presented codes.
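The flexibility notion can be illustrated with a toy layered code, in the spirit of (but not identical to) the paper's constructions: each node stores an evaluation of a high-degree polynomial in one layer and of a low-degree polynomial in another, so the same data can be recovered either from many nodes reading one symbol each, or from few nodes reading both symbols. All parameters below are illustrative.

```python
# Toy flexible storage code: data (a, b, c, d) is stored on n = 5 nodes,
# 2 symbols each, and is decodable either from any 4 nodes reading 1 symbol
# each, or from any 2 nodes reading both symbols. Illustrative sketch only.
import numpy as np

data = np.array([3.0, 1.0, 4.0, 1.0])            # (a, b, c, d)
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])         # one evaluation point per node

# Layer 1: g(x) = a + b x + c x^2 + d x^3 (degree 3, needs 4 evaluations).
# Layer 2: f(x) = a + b x                 (degree 1, needs 2 evaluations).
g = np.vander(xs, 4, increasing=True) @ data
f = np.vander(xs, 2, increasing=True) @ data[:2]
nodes = np.stack([g, f], axis=1)                 # node i stores (g(x_i), f(x_i))

# Option A: any 4 nodes, each accessing 1 symbol (the g-layer only).
alive = [0, 2, 3, 4]
V = np.vander(xs[alive], 4, increasing=True)
rec_a = np.linalg.solve(V, nodes[alive, 0])
assert np.allclose(rec_a, data)

# Option B: any 2 nodes, each accessing both symbols.
alive = [1, 4]
V2 = np.vander(xs[alive], 2, increasing=True)
ab = np.linalg.solve(V2, nodes[alive, 1])        # f-layer yields (a, b)
resid = nodes[alive, 0] - V2 @ ab                # subtract known low-degree part
W = np.stack([xs[alive]**2, xs[alive]**3], axis=1)
cd = np.linalg.solve(W, resid)                   # solve for (c, d)
assert np.allclose(np.concatenate([ab, cd]), data)
```

Both decoding options read four symbols in total, but trade the number of contacted nodes against the per-node access, which is precisely the variance-accommodating flexibility described above.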