Universally Decodable Matrices for Distributed Matrix-Vector Multiplication
Coded computation is an emerging research area that leverages concepts from
erasure coding to mitigate the effect of stragglers (slow nodes) in distributed
computation clusters, especially for matrix computation problems. In this work,
we present a class of distributed matrix-vector multiplication schemes that are
based on codes in the Rosenbloom-Tsfasman metric and universally decodable
matrices. Our schemes take into account the inherent computation order within a
worker node. In particular, they allow us to effectively leverage partial
computations performed by stragglers (a feature that many prior works lack). An
additional main contribution of our work is a companion matrix-based embedding
of these codes that allows us to obtain sparse and numerically stable schemes
for the problem at hand. Experimental results confirm the effectiveness of our
techniques.
Comment: 6 pages, 1 figure
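The general coded-computation idea the paper builds on can be sketched numerically: the matrix is split into blocks, each worker receives a polynomial combination of the blocks evaluated at its own point, and the master interpolates the product from any k of the n worker results. The sketch below is a generic Reed-Solomon-style scheme, not the paper's universally-decodable-matrix construction (and it does not exploit partial work); all names in it are illustrative.

```python
# Minimal sketch of straggler-tolerant matrix-vector multiplication via a
# polynomial (Reed-Solomon-style) code. Illustrative only; this is not the
# paper's universally-decodable-matrix scheme.
import numpy as np

def encode_blocks(A, n_workers, k):
    """Split A row-wise into k blocks; worker i receives the evaluation
    sum_j A_j * x_i**j of the block polynomial at a distinct point x_i."""
    blocks = np.split(A, k)                      # A's row count must divide by k
    xs = np.arange(1, n_workers + 1, dtype=float)
    return xs, [sum(B * x**j for j, B in enumerate(blocks)) for x in xs]

def decode(xs, results, k):
    """Interpolate the k block products A_j @ b from any k worker results."""
    V = np.vander(xs[:k], k, increasing=True)    # Vandermonde system in x_i**j
    coeffs = np.linalg.solve(V, np.stack(results[:k]))
    return np.concatenate(list(coeffs))

k, n = 3, 5                                      # tolerates n - k = 2 stragglers
A = np.arange(36, dtype=float).reshape(6, 6)
b = np.ones(6)
xs, shares = encode_blocks(A, n, k)
worker_out = [S @ b for S in shares]             # each worker does one product
alive = [0, 2, 4]                                # workers 1 and 3 straggle
y = decode(xs[alive], [worker_out[i] for i in alive], k)
assert np.allclose(y, A @ b)
```

Any k of the n results suffice for decoding; the point of the paper's UDM-based schemes is to go further and also use the partial, ordered computations completed by the stragglers themselves.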
Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy
We consider a scenario involving computations over a massive dataset stored
distributedly across multiple workers, which is at the core of distributed
learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework
to simultaneously provide (1) resiliency against stragglers that may prolong
computations; (2) security against Byzantine (or malicious) workers that
deliberately modify the computation for their benefit; and (3)
(information-theoretic) privacy of the dataset amidst possible collusion of
workers. LCC, which leverages the well-known Lagrange polynomial to create
computation redundancy in a novel coded form across workers, can be applied to
any computation scenario in which the function of interest is an arbitrary
multivariate polynomial of the input dataset, hence covering many computations
of interest in machine learning. LCC significantly generalizes prior works to
go beyond linear computations. It also enables secure and private computing in
distributed settings, improving the computation and communication efficiency of
the state-of-the-art. Furthermore, we prove the optimality of LCC by showing
that it achieves the optimal tradeoff between resiliency, security, and
privacy, i.e., in terms of tolerating the maximum number of stragglers and
adversaries, and providing data privacy against the maximum number of colluding
workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the
conventional uncoded implementation of distributed least-squares linear
regression by up to , and also achieves a
- speedup over the state-of-the-art straggler
mitigation strategies.
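The core encoding mechanism can be sketched numerically: the data blocks are interpolated by a Lagrange polynomial u, each worker evaluates the desired polynomial f on its share u(alpha_i), and the master interpolates the composed polynomial f(u(z)), whose degree is deg(f) times (k-1), from sufficiently many results. The sketch below works over the reals and shows only the resiliency mechanics (LCC itself operates over finite fields and additionally provides security and privacy); all names in it are illustrative.

```python
# Toy real-valued sketch of the Lagrange Coded Computing encoding/decoding
# mechanics. f is a degree-2 polynomial applied blockwise to the data.
import numpy as np

def lagrange_eval(points, values, z):
    """Evaluate the interpolation polynomial through (points[j], values[j]) at z."""
    total = np.zeros_like(values[0], dtype=float)
    for j, (pj, vj) in enumerate(zip(points, values)):
        w = np.prod([(z - pm) / (pj - pm) for m, pm in enumerate(points) if m != j])
        total = total + w * vj
    return total

f = lambda X: X @ X.T                    # degree-2 polynomial of a data block

k, deg_f = 2, 2
betas = np.array([0.0, 1.0])             # interpolation points for the data blocks
alphas = np.array([2.0, 3.0, 4.0, 5.0])  # evaluation points handed to workers
X = [np.array([[1.0, 2.0]]), np.array([[3.0, 4.0]])]  # two 1x2 data blocks

# Encoding: worker i receives u(alpha_i), where u interpolates the blocks.
shares = [lagrange_eval(betas, X, a) for a in alphas]
worker_out = [f(s) for s in shares]      # each worker applies f to its share

# f(u(z)) has degree deg_f * (k-1) = 2, so any 3 results suffice; worker 3 straggles.
alive = [0, 1, 2]
recovered = [lagrange_eval(alphas[alive], [worker_out[i] for i in alive], z)
             for z in betas]
assert all(np.allclose(r, f(Xj)) for r, Xj in zip(recovered, X))
```

Evaluating the interpolated composition at the original points beta_j returns f applied to each data block, which is exactly the resiliency guarantee described above.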
Download and Access Trade-offs in Lagrange Coded Computing
Lagrange Coded Computing (LCC) is a recently
proposed technique for resilient, secure, and private computation
of arbitrary polynomials in distributed environments. By
mapping such computations to composition of polynomials, LCC
allows the master node to complete the computation by accessing
a minimal number of workers and downloading all of their
content, thus providing resiliency to the remaining stragglers.
However, in the common case in which the number of
stragglers is smaller than the worst-case number, much of the
computational power of the system remains unexploited. To
address this issue, in this paper we expand LCC by studying a
fundamental trade-off between download and access, and present
two contributions. In the first contribution, it is shown that
without any modification to the encoding process, the master
can decode the computations by accessing a larger number of
nodes, however downloading less information from each node in
comparison with LCC (i.e., trading access for download). This
scheme relies on decoding a particular polynomial in the ideal
that is generated by the polynomials of interest, a technique we
call Ideal Decoding. This new scheme also improves LCC in the
sense that for systems with adversaries, the overall downloaded
bandwidth is smaller than in LCC. In the second contribution
we study a real-time model of this trade-off, in which the data
from the workers is downloaded sequentially. By clustering nodes
of similar delays and encoding the function with Universally
Decodable Matrices, the master can decode once sufficient data is
downloaded from every cluster, regardless of the internal delays
within that cluster. This allows the master to utilize the partial
work that is done by stragglers, rather than to ignore it, a feature
that most past works in coded computing are lacking.
Storage Codes with Flexible Number of Nodes
This paper presents flexible storage codes, a class of error-correcting codes
that can recover information from a flexible number of storage nodes. As a
result, one can make better use of the available storage nodes in the
presence of unpredictable node failures and reduce the data access latency. Let
us assume a storage system encodes a block of information symbols over a finite
field into a set of storage nodes, each holding a fixed number of symbols. The
code is parameterized by a set of recovery tuples, each specifying a number of
nodes together with a per-node access size, such that the information symbols
can be reconstructed from any collection of nodes meeting one of these tuples,
with each node accessing the corresponding number of symbols. In other words, the code
allows a flexible number of nodes for decoding to accommodate the variance in
the data access time of the nodes. Code constructions are presented for
different storage scenarios, including LRC (locally recoverable) codes, PMDS
(partial MDS) codes, and MSR (minimum storage regenerating) codes. We analyze
the latency of accessing information and perform simulations on Amazon clusters
to show the efficiency of the presented codes.
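The flexibility notion can be illustrated with a toy layered code, in the spirit of (but not identical to) the paper's constructions: each node stores an evaluation of a high-degree polynomial in one layer and of a low-degree polynomial in another, so the same data can be recovered either from many nodes reading one symbol each, or from few nodes reading both symbols. All parameters below are illustrative.

```python
# Toy flexible storage code: data (a, b, c, d) is stored on n = 5 nodes,
# 2 symbols each, and is decodable either from any 4 nodes reading 1 symbol
# each, or from any 2 nodes reading both symbols. Illustrative sketch only.
import numpy as np

data = np.array([3.0, 1.0, 4.0, 1.0])            # (a, b, c, d)
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])         # one evaluation point per node

# Layer 1: g(x) = a + b x + c x^2 + d x^3 (degree 3, needs 4 evaluations).
# Layer 2: f(x) = a + b x                 (degree 1, needs 2 evaluations).
g = np.vander(xs, 4, increasing=True) @ data
f = np.vander(xs, 2, increasing=True) @ data[:2]
nodes = np.stack([g, f], axis=1)                 # node i stores (g(x_i), f(x_i))

# Option A: any 4 nodes, each accessing 1 symbol (the g-layer only).
alive = [0, 2, 3, 4]
V = np.vander(xs[alive], 4, increasing=True)
rec_a = np.linalg.solve(V, nodes[alive, 0])
assert np.allclose(rec_a, data)

# Option B: any 2 nodes, each accessing both symbols.
alive = [1, 4]
V2 = np.vander(xs[alive], 2, increasing=True)
ab = np.linalg.solve(V2, nodes[alive, 1])        # f-layer yields (a, b)
resid = nodes[alive, 0] - V2 @ ab                # subtract known low-degree part
W = np.stack([xs[alive]**2, xs[alive]**3], axis=1)
cd = np.linalg.solve(W, resid)                   # solve for (c, d)
assert np.allclose(np.concatenate([ab, cd]), data)
```

Both decoding options read four symbols in total, but trade the number of contacted nodes against the per-node access, which is precisely the variance-accommodating flexibility described above.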