
    Distributed Learning with Sparse Communications by Identification

    In distributed optimization for large-scale learning, a major performance limitation comes from the communications between the different entities. For the setup where computations are performed by workers on their local data while a coordinator machine aggregates their updates to minimize a global loss, we present an asynchronous optimization algorithm that efficiently reduces the communications between the coordinator and the workers. This reduction comes from a random sparsification of the local updates. We show that this algorithm converges linearly in the strongly convex case and also identifies optimal strongly sparse solutions. We further exploit this identification to propose an automatic dimension reduction, aptly sparsifying all exchanges between coordinator and workers.
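    As a rough illustration of the random-sparsification idea described above, the sketch below keeps each coordinate of a worker's update with probability p and rescales the survivors so the compressed update is unbiased; the probability p and the 1/p rescaling are standard choices assumed here, not necessarily the paper's exact operator.

        import numpy as np

        def sparsify_update(update, p=0.1, rng=None):
            # Keep each coordinate with probability p and rescale by 1/p,
            # so the sparsified update equals the original in expectation.
            rng = np.random.default_rng() if rng is None else rng
            mask = rng.random(update.shape) < p
            return np.where(mask, update / p, 0.0)

        # A worker would then transmit only the (index, value) pairs of the
        # surviving coordinates, cutting communication roughly by a factor p.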

    CoCoA: A General Framework for Communication-Efficient Distributed Optimization

    The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets.
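    A minimal sketch of the outer communication pattern such frameworks follow, with an abstract local_solver standing in for CoCoA's actual local subproblem; averaging the local directions is one of the aggregation rules this family of methods allows, assumed here for simplicity.

        def cocoa_round(w, partitions, local_solver):
            # Each worker improves the global model w using only its own
            # data block; in practice these calls run in parallel.
            deltas = [local_solver(w, data) for data in partitions]
            # The coordinator aggregates one vector per worker per round,
            # which is what keeps communication costs low.
            return w + sum(deltas) / len(partitions)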

    Distributed Learning with Compressed Gradient Differences

    Training large machine learning models requires a distributed computing approach, with communication of the model updates being the bottleneck. For this reason, several methods based on the compression (e.g., sparsification and/or quantization) of updates were recently proposed, including QSGD (Alistarh et al., 2017), TernGrad (Wen et al., 2017), SignSGD (Bernstein et al., 2018), and DQGD (Khirirat et al., 2018). However, none of these methods are able to learn the gradients, which renders them incapable of converging to the true optimum in the batch mode and incompatible with non-smooth regularizers, and slows down their convergence. In this work we propose a new distributed learning method, DIANA, which resolves these issues via compression of gradient differences. We perform a theoretical analysis in the strongly convex and nonconvex settings and show that our rates are superior to existing rates. Our analysis of block-quantization and of the differences between \ell_2 and \ell_\infty quantization closes the gaps in theory and practice. Finally, by applying our analysis technique to TernGrad, we establish the first convergence rate for this method.
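    A sketch of the gradient-difference mechanism: each worker keeps a reference vector h shared with the coordinator, transmits only a compressed version of g - h, and both sides update h identically so it gradually learns the gradient. The top-k compressor and step size alpha below are illustrative stand-ins for the quantizers analyzed in the paper.

        import numpy as np

        def compress_topk(v, k):
            # Keep the k largest-magnitude coordinates, zero out the rest.
            out = np.zeros_like(v)
            idx = np.argsort(np.abs(v))[-k:]
            out[idx] = v[idx]
            return out

        def diana_worker_step(g, h, k=10, alpha=0.5):
            # Compress the difference g - h rather than g itself; as h
            # approaches the true gradient, the transmitted message shrinks
            # toward zero, enabling convergence to the true optimum.
            delta = compress_topk(g - h, k)
            h_new = h + alpha * delta  # the coordinator mirrors this update
            return delta, h_new        # only delta is communicated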

    An accelerated communication-efficient primal-dual optimization framework for structured machine learning

    Distributed optimization algorithms are essential for training machine learning models on very large-scale datasets. However, they often suffer from communication bottlenecks. Confronting this issue, a communication-efficient primal-dual coordinate ascent framework (CoCoA) and its improved variant CoCoA+ have been proposed, achieving a convergence rate of \mathcal{O}(1/t) for solving empirical risk minimization problems with Lipschitz continuous losses. In this paper, an accelerated variant of CoCoA+ is proposed and shown to possess a convergence rate of \mathcal{O}(1/t^2) in terms of reducing suboptimality. The analysis of this rate is also notable in that the convergence rate bounds involve constants that, except in extreme cases, are significantly reduced compared to those previously provided for CoCoA+. The results of numerical experiments are provided to show that acceleration can lead to significant performance gains.
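    A generic sketch of the acceleration pattern, wrapping Nesterov-style extrapolation around an abstract inner_round (one pass of a CoCoA+-like solver); the momentum schedule below is the textbook choice and is assumed here, not taken from the paper.

        def accelerated_loop(w0, inner_round, T):
            # Extrapolate between consecutive iterates before each inner
            # round; this extrapolation is the mechanism that lifts
            # O(1/t) schemes toward O(1/t^2).
            w_prev, y, t_mom = w0, w0, 1.0
            for _ in range(T):
                w = inner_round(y)  # e.g., one CoCoA+-style round
                t_next = (1.0 + (1.0 + 4.0 * t_mom ** 2) ** 0.5) / 2.0
                y = w + ((t_mom - 1.0) / t_next) * (w - w_prev)
                w_prev, t_mom = w, t_next
            return w_prev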