Distributed Learning with Sparse Communications by Identification
In distributed optimization for large-scale learning, a major performance
limitation comes from the communications between the different entities. In the
setting where workers perform computations on local data and a coordinator
machine aggregates their updates to minimize a global loss, we present an
asynchronous optimization algorithm that efficiently reduces the communications
between the coordinator and workers. This reduction comes from a random
sparsification of the local updates. We show that this algorithm converges
linearly in the strongly convex case and also identifies optimal strongly
sparse solutions. We further exploit this identification to propose an
automatic dimension reduction, aptly sparsifying all exchanges between
coordinator and workers.
Comment: v2 is a significant improvement over v1 (titled "Asynchronous
Distributed Learning with Sparse Communications and Identification") with new
algorithms, results, and discussion
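The abstract does not spell out the sparsification operator; a minimal sketch of one standard choice, an unbiased random sparsifier that keeps each coordinate with probability p and rescales the survivors by 1/p (function and parameter names are my own, not the paper's):

```python
import numpy as np

def sparsify(update, p, rng):
    """Keep each coordinate with probability p and rescale by 1/p,
    so the sparsified vector is an unbiased estimate of the update."""
    mask = rng.random(update.shape) < p
    return np.where(mask, update / p, 0.0)

# A worker would send only the surviving coordinates of `s`.
rng = np.random.default_rng(0)
u = np.array([0.5, -1.2, 3.0, 0.1])
s = sparsify(u, 0.5, rng)
```

Only the nonzero entries of `s` need to be communicated, which is where the reduction in coordinator-worker traffic comes from.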
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
The scale of modern datasets necessitates the development of efficient
distributed optimization methods for machine learning. We present a
general-purpose framework for distributed computing environments, CoCoA, that
has an efficient communication scheme and is applicable to a wide variety of
problems in machine learning and signal processing. We extend the framework to
cover general non-strongly-convex regularizers, including L1-regularized
problems like lasso, sparse logistic regression, and elastic net
regularization, and show how earlier work can be derived as a special case. We
provide convergence guarantees for the class of convex regularized loss
minimization objectives, leveraging a novel approach in handling
non-strongly-convex regularizers and non-smooth loss functions. The resulting
framework has markedly improved performance over state-of-the-art methods, as
we illustrate with an extensive set of experiments on real distributed
datasets.
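The L1-regularized problems the framework covers (lasso, sparse logistic regression, elastic net) are typically handled through the proximal operator of the L1 norm; a minimal sketch of that soft-thresholding step, included for illustration rather than taken from the paper itself:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||.||_1: shrink each
    coordinate toward zero and zero out the small ones."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

w = soft_threshold(np.array([2.0, -0.5, 0.3]), 0.5)
```

Coordinates with magnitude at most lam are set exactly to zero, which is what produces sparse solutions for lasso-type objectives.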
Distributed Learning with Compressed Gradient Differences
Training large machine learning models requires a distributed computing
approach, with communication of the model updates being the bottleneck. For
this reason, several methods based on the compression (e.g., sparsification
and/or quantization) of updates were recently proposed, including QSGD
(Alistarh et al., 2017), TernGrad (Wen et al., 2017), SignSGD (Bernstein et
al., 2018), and DQGD (Khirirat et al., 2018). However, none of these methods
learn the gradients, which renders them incapable of converging to the true
optimum in the batch mode, makes them incompatible with non-smooth
regularizers, and slows down their convergence. In this work we propose a new
distributed learning method, DIANA, which resolves these issues via compression
of gradient differences. We perform a theoretical analysis in the strongly
convex and nonconvex settings and show that our rates are superior to existing
rates. Our analysis of block-quantization and of the differences between l2 and
l-infinity quantization closes the gaps in theory and practice. Finally, by
applying our analysis technique to TernGrad, we establish the first convergence
rate for this method.
Comment: 46 pages
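A sketch of the gradient-difference idea as described: each worker keeps a local memory vector h, compresses only the difference g - h, and shifts h toward the compressed difference. The memory step size alpha and the generic compress callback are my assumptions, not details given in the abstract:

```python
import numpy as np

def diana_message(grad, h, compress, alpha):
    """One worker round of gradient-difference compression (sketch).

    grad:     current local gradient g
    h:        local memory of past gradients (coordinator holds a copy)
    compress: any compression operator, e.g. sparsification/quantization
    alpha:    memory step size (assumed hyperparameter)
    """
    delta = compress(grad - h)     # only this compressed vector is sent
    grad_estimate = h + delta      # what the coordinator reconstructs
    h_new = h + alpha * delta      # memory drifts toward the true gradient
    return delta, grad_estimate, h_new
```

Because h tracks the gradient over time, the differences being compressed shrink, which is what lets the scheme converge to the true optimum instead of stalling at a compression-induced error floor.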
An accelerated communication-efficient primal-dual optimization framework for structured machine learning
Distributed optimization algorithms are essential for training machine learning models on very large-scale datasets. However, they often suffer from communication bottlenecks. Confronting this issue, a communication-efficient primal-dual coordinate ascent framework (CoCoA) and its improved variant CoCoA+ have been proposed, achieving a convergence rate of O(1/t) for solving empirical risk minimization problems with Lipschitz continuous losses. In this paper, an accelerated variant of CoCoA+ is proposed and shown to possess a convergence rate of O(1/t^2) in terms of reducing suboptimality. The analysis of this rate is also notable in that the convergence-rate bounds involve constants that, except in extreme cases, are significantly reduced compared to those previously provided for CoCoA+. The results of numerical experiments are provided to show that acceleration can lead to significant performance gains.