240 research outputs found
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
In this work, we consider the distributed optimization of non-smooth convex
functions using a network of computing units. We investigate this problem under
two regularity assumptions: (1) the Lipschitz continuity of the global
objective function, and (2) the Lipschitz continuity of local individual
functions. Under the local regularity assumption, we provide the first optimal
first-order decentralized algorithm called multi-step primal-dual (MSPD) and
its corresponding optimal convergence rate. A notable aspect of this result is
that, for non-smooth functions, while the dominant term of the error is in
$O(1/\sqrt{t})$, the structure of the communication network only impacts a
second-order term in $O(1/t)$, where $t$ is time. In other words, the error due
to limits in communication resources decreases at a fast rate even in the case
of non-strongly-convex objective functions. Under the global regularity
assumption, we provide a simple yet efficient algorithm called distributed
randomized smoothing (DRS) based on a local smoothing of the objective
function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the
optimal convergence rate, where $d$ is the underlying dimension. Comment: 17 pages
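To make the smoothing idea concrete, here is a minimal single-machine sketch of randomized smoothing on a toy non-smooth objective (the $\ell_1$ norm). The objective, step size, smoothing parameter and sample count are illustrative assumptions; the distributed MSPD/DRS machinery of the paper is not reproduced.

```python
import numpy as np

# Randomized smoothing sketch: replace the non-smooth f by the surrogate
# f_gamma(x) = E[f(x + gamma * Z)], Z ~ N(0, I), and estimate its gradient
# by averaging subgradients at Gaussian-perturbed points.
def f(x):
    return np.abs(x).sum()          # toy non-smooth convex objective ||x||_1

def subgrad(x):
    return np.sign(x)               # a subgradient of ||x||_1

def smoothed_grad(x, gamma, n_samples, rng):
    zs = rng.standard_normal((n_samples, x.size))
    return np.mean([subgrad(x + gamma * z) for z in zs], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
gamma, step = 0.1, 0.05
for t in range(200):
    x -= step / np.sqrt(t + 1) * smoothed_grad(x, gamma, 20, rng)
print(f(x))                         # decreases toward 0, the minimum of ||x||_1
```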
GMRES-Accelerated ADMM for Quadratic Objectives
We consider the sequence acceleration problem for the alternating direction
method-of-multipliers (ADMM) applied to a class of equality-constrained
problems with strongly convex quadratic objectives, which frequently arise as
the Newton subproblem of interior-point methods. Within this context, the ADMM
update equations are linear, the iterates are confined within a Krylov
subspace, and the Generalized Minimal RESidual (GMRES) algorithm is optimal in its
ability to accelerate convergence. The basic ADMM method solves a
$\kappa$-conditioned problem in $O(\sqrt{\kappa})$ iterations. We give
theoretical justification and numerical evidence that the GMRES-accelerated
variant consistently solves the same problem in $O(\kappa^{1/4})$ iterations
for an order-of-magnitude reduction in iterations, despite a worst-case bound
of $O(\sqrt{\kappa})$ iterations. The method is shown to be competitive against
standard preconditioned Krylov subspace methods for saddle-point problems. The
method is embedded within SeDuMi, a popular open-source solver for conic
optimization written in MATLAB, and used to solve many large-scale semidefinite
programs with error that decreases like $O(1/k^2)$, instead of $O(1/k)$,
where $k$ is the iteration index. Comment: 31 pages, 7 figures. Accepted for publication in SIAM Journal on
Optimization (SIOPT)
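As a rough illustration of why GMRES can accelerate such iterations, the sketch below replaces a generic affine fixed-point map $z \mapsto Mz + b$ (a random contraction standing in for the linear ADMM update, not the paper's actual operator) with a GMRES solve of the equivalent system $(I - M)z = b$ via SciPy.

```python
import numpy as np
from scipy.sparse.linalg import gmres, LinearOperator

# For quadratic objectives the ADMM update is affine, z_{k+1} = M z_k + b,
# so its fixed point solves (I - M) z = b.  GMRES applied to that system
# returns the best iterate within the same Krylov subspace that plain
# fixed-point iteration explores.
rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
M = 0.9 * A / np.linalg.norm(A, 2)          # toy contraction, spectral norm 0.9
b = rng.standard_normal(n)

# Plain fixed-point ("ADMM-like") iteration.
z = np.zeros(n)
for _ in range(100):
    z = M @ z + b

# GMRES on the equivalent linear system (I - M) z = b.
op = LinearOperator((n, n), matvec=lambda v: v - M @ v, dtype=float)
z_gmres, info = gmres(op, b)

print(np.linalg.norm(z - z_gmres))          # both approach the same fixed point
```

Since GMRES minimizes the residual over that Krylov subspace, it can only improve on the fixed-point iterates per matrix-vector product, which is the sense in which it is the optimal accelerator here.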
Dual-Free Stochastic Decentralized Optimization with Variance Reduction
We consider the problem of training machine learning models on distributed
data in a decentralized way. For finite-sum problems, fast single-machine
algorithms for large datasets rely on stochastic updates combined with variance
reduction. Yet, existing decentralized stochastic algorithms either do not
obtain the full speedup allowed by stochastic updates, or require oracles that
are more expensive than regular gradients. In this work, we introduce a
Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only
requires computing stochastic gradients of the local functions, and is
computationally as fast as a standard stochastic variance-reduced algorithm
run on a $1/n$ fraction of the dataset, where $n$ is the number of nodes. To
derive DVR, we use Bregman coordinate descent on a well-chosen dual problem,
and obtain a dual-free algorithm using a specific Bregman divergence. We give
an accelerated version of DVR based on the Catalyst framework, and illustrate
its effectiveness with simulations on real data.
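For context, the sketch below shows the single-machine SVRG-style variance-reduced update whose cost DVR is compared against; the quadratic data, step size and epoch length are illustrative assumptions, and this is not the DVR algorithm itself.

```python
import numpy as np

# SVRG-style variance reduction for a finite sum
# f(x) = (1/N) * sum_i 0.5 * (a_i . x - b_i)^2  (toy least-squares problem).
rng = np.random.default_rng(0)
N, d = 1000, 20
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

def full_grad(x):
    return A.T @ (A @ x - b) / N

def stoch_grad(x, i):
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
step = 1.0 / (10 * np.max(np.sum(A ** 2, axis=1)))   # ~ 1 / (10 * max_i ||a_i||^2)
for epoch in range(20):
    x_ref, g_ref = x.copy(), full_grad(x)            # snapshot for variance reduction
    for _ in range(N):
        i = rng.integers(N)
        # Unbiased gradient estimate whose variance vanishes as x, x_ref -> optimum.
        g = stoch_grad(x, i) - stoch_grad(x_ref, i) + g_ref
        x -= step * g
print(np.linalg.norm(full_grad(x)))                  # gradient norm shrinks over epochs
```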