On structured and distributed learning
With the growth in size and complexity of data, methods exploiting low-dimensional structure, as well as distributed methods, have been playing an ever more important role in machine learning. These approaches offer a natural choice to alleviate the computational burden, albeit typically at a statistical trade-off. In this thesis, we show that a careful utilization of the structure of a problem, or of the bottlenecks of a distributed system, can also provide a statistical advantage in such settings. We do this from the purview of the following three problems:
1. Learning graphical models with a few hubs: Graphical models are a popular tool to represent multivariate distributions. The task of learning a graphical model entails estimating the graph of conditional dependencies between variables. Existing approaches to learning graphical models require a number of samples polynomial in the maximum degree of the true graph, which can be large even if there are only a few high-degree nodes. In this part of the thesis, we propose an estimator that detects and then ignores high-degree nodes. Consequently, we show that such an estimator has a lower sample complexity requirement for learning the overall graph when the true graph has a few high-degree nodes or "hubs", e.g., scale-free graphs.
2. Kernel Ridge Regression via partitioning: Kernel methods find wide and varied applicability in machine learning. However, solving the Kernel Ridge Regression (KRR) optimization requires computation that is cubic in the number of samples. In this work, we consider a divide-and-conquer approach to solving the KRR problem. The division step involves splitting the samples based on a partitioning of the input space, and the conquering step is to simply use the local KRR estimate in each partition. We show that this can not only lower the computational requirements of solving the KRR problem, but also lead to improved accuracy over both a single KRR estimate and estimates based on random data partitioning.
3. Stragglers in Distributed Synchronous Gradient Descent: Synchronous methods in machine learning have many desirable properties, but they are only as fast as the slowest machine in a distributed system. The straggler (slow machine) problem is a critical bottleneck for such methods. In this part of our work, we propose a novel framework based on coding theory for mitigating stragglers in Distributed Synchronous Gradient Descent (and its variants). Our approach views stragglers as errors/erasures. By carefully replicating data blocks and coding across gradients, we show how this can provide tolerance to failures and stragglers without incurring any communication overhead.
Computer Science
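To make the idea of replicating data blocks and coding across gradients concrete, here is a minimal sketch for three workers and one straggler. The encoding matrix `B`, the decoding table `DECODE`, and the scalar "gradients" are illustrative assumptions, not taken from the abstract; they are only one possible instance of such a code. Each worker holds two of the three data blocks and sends a single linear combination of its partial gradients, and any two messages suffice to recover the full sum.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical encoding matrix: row i holds the coefficients that worker i
# applies to the partial gradients (g1, g2, g3). A zero means the worker
# does not store that data block, so each block is replicated twice.
B = [
    [Fraction(1, 2), 1, 0],   # worker 0 sends g1/2 + g2
    [0, 1, -1],               # worker 1 sends g2 - g3
    [Fraction(1, 2), 0, 1],   # worker 2 sends g1/2 + g3
]

# Hypothetical decoding coefficients for every pair of responding workers.
DECODE = {
    (0, 1): (2, -1),
    (0, 2): (1, 1),
    (1, 2): (1, 2),
}


def worker_message(worker, grads):
    """Linear combination of partial gradients sent by one worker."""
    return sum(c * g for c, g in zip(B[worker], grads))


def recover_sum(responders, messages):
    """Combine the messages of two responders into g1 + g2 + g3."""
    key = tuple(sorted(responders))
    return sum(coef * messages[w] for coef, w in zip(DECODE[key], key))


# Toy scalar partial gradients; real gradients would be vectors.
grads = [3, 5, 7]
recovered = {}
for responders in combinations(range(3), 2):
    # Only the two responders report; the third worker is the straggler.
    messages = {w: worker_message(w, grads) for w in responders}
    recovered[responders] = recover_sum(responders, messages)
```

Every worker sends exactly one message of the same size as a gradient, which is the sense in which a scheme like this adds no communication overhead; the cost is the extra computation from storing each block twice.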
Gradient Coding from Cyclic MDS Codes and Expander Graphs
Gradient coding is a technique for straggler mitigation in distributed
learning. In this paper we design novel gradient codes using tools from
classical coding theory, namely, cyclic MDS codes, which compare favorably with
existing solutions, both in the applicable range of parameters and in the
complexity of the involved algorithms. Second, we introduce an approximate
variant of the gradient coding problem, in which we settle for approximate
gradient computation instead of the exact one. This approach enables graceful
degradation, i.e., the error of the approximate gradient is a
decreasing function of the number of stragglers. Our main result is that
normalized adjacency matrices of expander graphs yield excellent approximate
gradient codes, which enable significantly less computation compared to exact
gradient coding, and guarantee faster convergence than trivial solutions under
standard assumptions. We experimentally test our approach on Amazon EC2, and
show that the generalization error of approximate gradient coding is very close
to the full gradient while requiring significantly less computation from the
workers.
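The approximate variant can be sketched as follows, under several assumptions that are not in the abstract: the assignment graph is a simple circulant d-regular graph (chosen for brevity, not because it is a good expander), the partial gradients are scalars, and the master decodes by rescaling the sum of received messages by n divided by the number of responders. Each worker sends the average of its d assigned partial gradients, which corresponds to one row of a normalized adjacency matrix.

```python
def circulant_assignment(n, d):
    """Worker i is assigned data partitions i, i+1, ..., i+d-1 (mod n)."""
    return [[(i + k) % n for k in range(d)] for i in range(n)]


def approx_gradient_sum(partial_grads, assignment, stragglers):
    """Estimate the sum of all partial gradients from non-straggling workers."""
    n = len(partial_grads)
    d = len(assignment[0])
    alive = [i for i in range(n) if i not in stragglers]
    # Each responding worker sends the average of its d assigned gradients.
    messages = [sum(partial_grads[j] for j in assignment[i]) / d for i in alive]
    # Assumed decoder: rescale to compensate for the missing workers.
    return (n / len(alive)) * sum(messages)


partial_grads = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
assignment = circulant_assignment(n=6, d=3)
exact = sum(partial_grads)

no_stragglers = approx_gradient_sum(partial_grads, assignment, stragglers=set())
one_straggler = approx_gradient_sum(partial_grads, assignment, stragglers={0})

error_none = abs(no_stragglers - exact)
error_one = abs(one_straggler - exact)
```

With no stragglers the estimate matches the exact sum, because every partition appears in exactly d workers' assignments; with a straggler the master still gets a usable estimate instead of having to wait.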
Timely-Throughput Optimal Coded Computing over Cloud Networks
In modern distributed computing systems, unpredictable and unreliable
infrastructures result in high variability of computing resources. Meanwhile,
there is significantly increasing demand for timely and event-driven services
with deadline constraints. Motivated by measurements over Amazon EC2 clusters,
we consider a two-state Markov model for variability of computing speed in
cloud networks. In this model, each worker can be either in a good state or a
bad state in terms of the computation speed, and the transition between these
states is modeled as a Markov chain which is unknown to the scheduler. We then
consider a Coded Computing framework, in which the data is possibly encoded and
stored at the worker nodes in order to provide robustness against nodes that
may be in a bad state. With timely computation requests submitted to the system
with computation deadlines, our goal is to design the optimal computation-load
allocation scheme and the optimal data encoding scheme that maximize the timely
computation throughput (i.e., the average number of computation tasks that are
accomplished before their deadline). Our main result is the development of a
dynamic computation strategy called Lagrange Estimate-and-Allocate (LEA)
strategy, which achieves the optimal timely computation throughput. It is shown
that compared to the static allocation strategy, LEA increases the timely
computation throughput by 1.4X - 17.5X in various scenarios via simulations and
by 1.27X - 6.5X in experiments over Amazon EC2 clusters.
Comment: to appear in MobiHoc 201
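The two-state Markov model of worker speed can be illustrated with a short simulation. This is only a sketch of the model and of a naive estimation step, not the LEA strategy itself, whose details are not given in the abstract. The transition probabilities, the function names, and the counting estimator are all assumptions made for illustration.

```python
import random


def simulate_states(p_good_to_bad, p_bad_to_good, steps, seed=0):
    """Simulate one worker's good/bad speed state as a two-state Markov chain."""
    rng = random.Random(seed)
    state = "good"
    states = [state]
    for _ in range(steps):
        if state == "good":
            state = "bad" if rng.random() < p_good_to_bad else "good"
        else:
            state = "good" if rng.random() < p_bad_to_good else "bad"
        states.append(state)
    return states


def estimate_transitions(states):
    """Estimate both transition probabilities by counting observed transitions."""
    counts = {(a, b): 0 for a in ("good", "bad") for b in ("good", "bad")}
    for prev, cur in zip(states, states[1:]):
        counts[(prev, cur)] += 1
    from_good = counts[("good", "good")] + counts[("good", "bad")]
    from_bad = counts[("bad", "good")] + counts[("bad", "bad")]
    # max(..., 1) guards against division by zero on very short traces.
    est_good_to_bad = counts[("good", "bad")] / max(from_good, 1)
    est_bad_to_good = counts[("bad", "good")] / max(from_bad, 1)
    return est_good_to_bad, est_bad_to_good


# Assumed "true" parameters, hidden from the scheduler in the model.
P_GOOD_TO_BAD = 0.2
P_BAD_TO_GOOD = 0.5

states = simulate_states(P_GOOD_TO_BAD, P_BAD_TO_GOOD, steps=20000)
est_gb, est_bg = estimate_transitions(states)

# Long-run fraction of time in the good state, empirically and in closed form.
empirical_good = states.count("good") / len(states)
stationary_good = P_BAD_TO_GOOD / (P_GOOD_TO_BAD + P_BAD_TO_GOOD)
```

A scheduler that does not know the chain could use estimates like `est_gb` and `est_bg` to predict which workers are likely to be fast in the next round and assign computation load accordingly.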