
4 research outputs found

Recommended from our members

On structured and distributed learning

Author: Tandon Rashish
Publication date: 28/08/2018
Field of study: Computer Science

With the growth in size and complexity of data, methods exploiting low-dimensional structure, as well as distributed methods, have been playing an increasingly important role in machine learning. These approaches offer a natural choice to alleviate the computational burden, albeit typically at a statistical trade-off. In this thesis, we show that a careful utilization of the structure of a problem, or of the bottlenecks of a distributed system, can also provide a statistical advantage in such settings. We do this from the purview of the following three problems:

1. Learning Graphical models with a few hubs: Graphical models are a popular tool to represent multivariate distributions. The task of learning a graphical model entails estimating the graph of conditional dependencies between variables. Existing approaches to learning graphical models require a number of samples polynomial in the maximum degree of the true graph, which can be large even if there are only a few high-degree nodes. In this part of the thesis, we propose an estimator that detects and then ignores high-degree nodes. Consequently, we show that such an estimator has a lower sample complexity requirement for learning the overall graph when the true graph has a few high-degree nodes or "hubs", e.g., scale-free graphs.

2. Kernel Ridge Regression via partitioning: Kernel methods find wide and varied applicability in machine learning. However, solving the Kernel Ridge Regression (KRR) optimization requires computation that is cubic in the number of samples. In this work, we consider a divide-and-conquer approach to solve the KRR problem. The division step involves splitting the samples based on a partitioning of the input space, and the conquering step is to simply use the local KRR estimate in each partition. We show that this can not only lower the computational requirements of solving the KRR problem, but also lead to improved accuracy over both a single KRR estimate and estimates based on random data partitioning.

3. Stragglers in Distributed Synchronous Gradient Descent: Synchronous methods in machine learning have many desirable properties, but they are only as fast as the slowest machine in a distributed system. The straggler/slow machine problem is a critical bottleneck for such methods. In this part of our work, we propose a novel framework based on Coding Theory for mitigating stragglers in Distributed Synchronous Gradient Descent (and its variants). Our approach views stragglers as errors/erasures. By carefully replicating data blocks and coding across gradients, we show how this can provide tolerance to failures and stragglers without incurring any communication overheads.
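The divide-and-conquer idea in the Kernel Ridge Regression part of the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the RBF kernel, the nearest-center partitioning rule, the helper names (`rbf_kernel`, `nearest_center`, `fit_local_krr`, `predict_local_krr`), and the values of `lam` and `gamma`. Each partition solves its own small linear system, so the cubic cost applies to the partition size rather than to the full sample size.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def nearest_center(X, centers):
    # Hypothetical partitioning rule: index of the closest center per sample.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_local_krr(X, y, centers, lam=1e-3, gamma=50.0):
    # Division step: split the samples by a partition of the input space.
    labels = nearest_center(X, centers)
    models = {}
    for c in range(len(centers)):
        Xc, yc = X[labels == c], y[labels == c]
        if len(Xc) == 0:
            continue  # no training data fell into this cell
        # Conquering step: an ordinary KRR solve restricted to this cell.
        K = rbf_kernel(Xc, Xc, gamma)
        alpha = np.linalg.solve(K + lam * len(Xc) * np.eye(len(Xc)), yc)
        models[c] = (Xc, alpha)
    return models

def predict_local_krr(models, centers, Xq, gamma=50.0):
    # Each query is answered only by the model of the cell it falls into.
    labels = nearest_center(Xq, centers)
    preds = np.zeros(len(Xq))  # cells without a model keep the default 0
    for c, (Xc, alpha) in models.items():
        mask = labels == c
        if mask.any():
            preds[mask] = rbf_kernel(Xq[mask], Xc, gamma) @ alpha
    return preds

# Toy usage on a noise-free 1-D target with four equal-width cells.
X = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
centers = np.array([[0.125], [0.375], [0.625], [0.875]])
models = fit_local_krr(X, y, centers)
preds = predict_local_krr(models, centers, X)
```

With four cells of roughly 50 samples each, the sketch solves four 50-by-50 systems instead of one 200-by-200 system. Whether accuracy improves over a single global estimate depends on the conditions analyzed in the thesis, which this toy example does not reproduce.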

Texas ScholarWorks

Gradient Coding from Cyclic MDS Codes and Expander Graphs

Author: Dimakis Alexandros G.
Raviv Netanel
Tamo Itzhak
Tandon Rashish
Publication date: 01/07/2018

Gradient coding is a technique for straggler mitigation in distributed learning. In this paper we design novel gradient codes using tools from classical coding theory, namely, cyclic MDS codes, which compare favorably with existing solutions, both in the applicable range of parameters and in the complexity of the involved algorithms. Second, we introduce an approximate variant of the gradient coding problem, in which we settle for approximate gradient computation instead of the exact one. This approach enables graceful degradation, i.e., the $\ell_2$ error of the approximate gradient is a decreasing function of the number of stragglers. Our main result is that normalized adjacency matrices of expander graphs yield excellent approximate gradient codes, which enable significantly less computation compared to exact gradient coding, and guarantee faster convergence than trivial solutions under standard assumptions. We experimentally test our approach on Amazon EC2, and show that the generalization error of approximate gradient coding is very close to the full gradient while requiring significantly less computation from the workers.
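The approximate variant described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's construction: a small circulant regular graph stands in for an expander, the decoder simply rescales the sum of the surviving workers' messages by n / (n - s), and the helper names `circulant_regular_graph` and `approximate_gradient` are hypothetical.

```python
import numpy as np

def circulant_regular_graph(n, offsets):
    # Adjacency matrix of a regular graph on a ring: node i is linked to
    # i +/- k for each k in offsets. Assumes distinct offsets with
    # 0 < k < n / 2, which gives degree 2 * len(offsets).
    A = np.zeros((n, n))
    for i in range(n):
        for k in offsets:
            A[i, (i + k) % n] = 1.0
            A[i, (i - k) % n] = 1.0
    return A

def approximate_gradient(B, partial_grads, stragglers):
    # B: n x n encoding matrix; row i lists the weights worker i applies
    # to the partial gradients it computes.
    # partial_grads: n x p array, one partial gradient per data block.
    n = B.shape[0]
    alive = np.ones(n, dtype=bool)
    alive[np.asarray(stragglers, dtype=int)] = False
    messages = B @ partial_grads  # row i is worker i's coded message
    # Assumed decoder: sum the messages that arrived, rescale by n / (n - s).
    return (n / alive.sum()) * messages[alive].sum(axis=0)

# Toy usage: 12 workers, degree-6 graph, random partial gradients.
n, p = 12, 5
A = circulant_regular_graph(n, offsets=(1, 2, 3))
B = A / A[0].sum()  # normalized adjacency matrix
rng = np.random.default_rng(0)
partial_grads = rng.normal(size=(n, p))
full_gradient = partial_grads.sum(axis=0)
no_stragglers = approximate_gradient(B, partial_grads, stragglers=[])
two_stragglers = approximate_gradient(B, partial_grads, stragglers=[3, 7])
```

Because every column of the normalized adjacency matrix of a regular graph sums to one, the result with no stragglers equals the full gradient. With stragglers present the result is only an approximation, and its quality depends on the graph chosen.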

arXiv.org e-Print Archive

Timely-Throughput Optimal Coded Computing over Cloud Networks

Author: Ananthanarayanan Ganesh
Chen Lingjiao
Dutta Sanghamitra
Hou I.
Tandon Rashish
Zaharia Matei
Publication date: 11/04/2019

In modern distributed computing systems, unpredictable and unreliable infrastructures result in high variability of computing resources. Meanwhile, there is significantly increasing demand for timely and event-driven services with deadline constraints. Motivated by measurements over Amazon EC2 clusters, we consider a two-state Markov model for variability of computing speed in cloud networks. In this model, each worker can be either in a good state or a bad state in terms of the computation speed, and the transition between these states is modeled as a Markov chain which is unknown to the scheduler. We then consider a Coded Computing framework, in which the data is possibly encoded and stored at the worker nodes in order to provide robustness against nodes that may be in a bad state. With timely computation requests submitted to the system with computation deadlines, our goal is to design the optimal computation-load allocation scheme and the optimal data encoding scheme that maximize the timely computation throughput (i.e., the average number of computation tasks that are accomplished before their deadline). Our main result is the development of a dynamic computation strategy called the Lagrange Estimate-and-Allocate (LEA) strategy, which achieves the optimal timely computation throughput. It is shown that, compared to the static allocation strategy, LEA increases the timely computation throughput by 1.4X - 17.5X in various scenarios via simulations and by 1.27X - 6.5X in experiments over Amazon EC2 clusters.
Comment: to appear in MobiHoc 201
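The two-state worker model in this abstract can be illustrated with a short simulation. This is a sketch of the Markov model only, not of the LEA strategy: the function names, the transition probabilities, and the counting estimator are assumptions chosen for illustration. It simulates one worker, estimates the transition probabilities from the observed states (as a scheduler that does not know the chain might), and computes the long-run fraction of time spent in the good state.

```python
import random

def simulate_worker(p_good_to_bad, p_bad_to_good, steps, seed=0):
    # One state per time slot, starting in the good state.
    rng = random.Random(seed)
    state = "good"
    history = []
    for _ in range(steps):
        history.append(state)
        if state == "good":
            if rng.random() < p_good_to_bad:
                state = "bad"
        elif rng.random() < p_bad_to_good:
            state = "good"
    return history

def estimate_transitions(history):
    # Count observed transitions and normalize by visits to each state.
    counts = {(a, b): 0 for a in ("good", "bad") for b in ("good", "bad")}
    for prev, nxt in zip(history, history[1:]):
        counts[(prev, nxt)] += 1
    from_good = counts[("good", "good")] + counts[("good", "bad")]
    from_bad = counts[("bad", "good")] + counts[("bad", "bad")]
    p_gb = counts[("good", "bad")] / from_good if from_good else 0.0
    p_bg = counts[("bad", "good")] / from_bad if from_bad else 0.0
    return p_gb, p_bg

def stationary_good_probability(p_good_to_bad, p_bad_to_good):
    # Stationary distribution of a two-state chain.
    total = p_good_to_bad + p_bad_to_good
    if total == 0:
        raise ValueError("chain never changes state; no unique stationary law")
    return p_bad_to_good / total

# Toy usage with assumed transition probabilities 0.1 and 0.3.
history = simulate_worker(0.1, 0.3, steps=20000)
p_gb_hat, p_bg_hat = estimate_transitions(history)
good_fraction = stationary_good_probability(p_gb_hat, p_bg_hat)
```

A load-allocation rule could use such estimates to decide how much work to assign to each worker before a deadline; how LEA does this is specified in the paper, not here.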

arXiv.org e-Print Archive

Crossref

eScholarship - University of California