15 research outputs found
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
In this work, we consider the distributed optimization of non-smooth convex
functions using a network of computing units. We investigate this problem under
two regularity assumptions: (1) the Lipschitz continuity of the global
objective function, and (2) the Lipschitz continuity of local individual
functions. Under the local regularity assumption, we provide the first optimal
first-order decentralized algorithm called multi-step primal-dual (MSPD) and
its corresponding optimal convergence rate. A notable aspect of this result is
that, for non-smooth functions, while the dominant term of the error is in
$O(1/\sqrt{t})$, the structure of the communication network only impacts a
second-order term in $O(1/t)$, where $t$ is time. In other words, the error due
to limits in communication resources decreases at a fast rate even in the case
of non-strongly-convex objective functions. Under the global regularity
assumption, we provide a simple yet efficient algorithm called distributed
randomized smoothing (DRS) based on a local smoothing of the objective
function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the
optimal convergence rate, where $d$ is the underlying dimension.
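The randomized-smoothing idea underlying DRS can be illustrated in a few lines: replace a non-smooth objective by its Gaussian smoothing and run plain gradient descent on the smoothed surrogate, estimating its gradient by sampling. This is a generic sketch under our own toy setup (function names, constants, and the $\ell_1$ test objective are ours, not the paper's DRS algorithm):

```python
import numpy as np

def smoothed_grad(f, x, gamma, rng, n_samples=1000):
    # Monte-Carlo gradient of the Gaussian smoothing
    #   f_gamma(x) = E[f(x + gamma * Z)],  Z ~ N(0, I),
    # via the identity grad f_gamma(x) = E[(f(x + gamma Z) - f(x)) Z] / gamma.
    Z = rng.standard_normal((n_samples, x.shape[0]))
    vals = np.array([f(x + gamma * z) for z in Z]) - f(x)
    return (vals[:, None] * Z).mean(axis=0) / gamma

f = lambda x: np.abs(x).sum()   # non-smooth convex test objective
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0])
for _ in range(300):            # gradient descent on the smoothed surrogate
    x -= 0.02 * smoothed_grad(f, x, gamma=0.1, rng=rng)
print(np.abs(x).max())          # close to the minimizer 0, up to smoothing error
```

A smaller smoothing parameter `gamma` gives a surrogate closer to the original function but a noisier gradient estimate; trading these off is exactly the tuning a randomized-smoothing method must do.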
On the Complexity of Finite-Sum Smooth Optimization under the Polyak-{\L}ojasiewicz Condition
This paper considers the optimization problem of the form
$\min_{x \in \mathbb{R}^d} f(x) \triangleq \frac{1}{n}\sum_{i=1}^n f_i(x)$,
where $f$ satisfies the Polyak--{\L}ojasiewicz (PL) condition with
parameter $\mu$ and $\{f_i\}_{i=1}^n$ is $L$-mean-squared smooth. We
show that any gradient method requires at least
$\Omega(n + \kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO)
calls to find an $\epsilon$-suboptimal solution, where $\kappa \triangleq L/\mu$
is the condition number of the problem. This result nearly matches upper bounds
of IFO complexity for best-known first-order methods. We also study the problem
of minimizing the PL function in the distributed setting in which the
individual functions $f_1, \dots, f_n$ are located on a connected network of
$n$ agents. We provide lower bounds of
$\Omega((\kappa/\sqrt{\gamma})\log(1/\epsilon))$,
$\Omega((\kappa + \tau\kappa/\sqrt{\gamma})\log(1/\epsilon))$ and
$\Omega((n + \kappa\sqrt{n})\log(1/\epsilon))$ for communication rounds,
time cost and local first-order oracle calls respectively, where
$\gamma$ is the spectral gap of the mixing matrix associated with the
network and $\tau$ is the time cost per communication round. Furthermore,
we propose a decentralized first-order method that nearly matches the above
lower bounds in expectation.
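The PL condition states that $\frac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^\star)$ for all $x$: the gradient norm controls suboptimality without requiring convexity. A quick numerical check on a least-squares objective, for which PL holds with $\mu = \lambda_{\min}(A^\top A)$ (our own toy example, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
grad = lambda x: A.T @ (A @ x - b)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizer
f_star = f(x_star)
mu = np.linalg.eigvalsh(A.T @ A).min()           # PL constant for least squares

# verify (1/2) ||grad f(x)||^2 >= mu * (f(x) - f_star) at random points
for _ in range(100):
    x = rng.standard_normal(5)
    assert 0.5 * np.linalg.norm(grad(x)) ** 2 >= mu * (f(x) - f_star) - 1e-9
print("PL inequality holds at all sampled points")
```

The inequality follows here because $f(x) - f^\star = \frac{1}{2}(x - x^\star)^\top A^\top A\,(x - x^\star)$ and $\nabla f(x) = A^\top A\,(x - x^\star)$, so $(A^\top A)^2 \succeq \lambda_{\min}(A^\top A)\,A^\top A$ gives the bound.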
Accelerated Gossip in Networks of Given Dimension using Jacobi Polynomial Iterations
Consider a network of agents connected by communication links, where each
agent holds a real value. The gossip problem consists in estimating the average
of the values diffused in the network in a distributed manner. We develop a
method solving the gossip problem that depends only on the spectral dimension
of the network, that is, in the communication network set-up, the dimension of
the space in which the agents live. This contrasts with previous work that
required the spectral gap of the network as a parameter, or suffered from slow
mixing. Our method shows an important improvement over existing algorithms in
the non-asymptotic regime, i.e., when the values are far from being fully mixed
in the network. Our approach stems from a polynomial-based point of view on
gossip algorithms, as well as an approximation of the spectral measure of the
graphs with a Jacobi measure. We show the power of the approach with
simulations on various graphs, and with performance guarantees on graphs of
known spectral dimension, such as grids and random percolation bonds. An
extension of this work to distributed Laplacian solvers is discussed. As a side
result, we also use the polynomial-based point of view to show the convergence
of the message passing algorithm for gossip of Moallemi \& Van Roy on regular
graphs. The explicit computation of the rate of convergence shows that
message passing has a slow rate of convergence on graphs with small spectral
gap.
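The classical synchronous gossip baseline that this line of work accelerates is a fixed-point iteration $x \leftarrow W x$ with a doubly stochastic matrix $W$; its mixing speed is governed by the spectral gap. A minimal sketch on a cycle graph (our own toy setup, not the paper's Jacobi-polynomial method):

```python
import numpy as np

n = 20
rng = np.random.default_rng(2)
x = rng.standard_normal(n)             # one real value per agent
avg = x.mean()                         # target of the gossip problem

# doubly stochastic gossip matrix for the n-cycle:
# each agent averages with its two neighbours
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

for _ in range(500):
    x = W @ x                          # one synchronous gossip round
print(np.abs(x - avg).max())           # every agent is close to the average
```

On the cycle the second-largest eigenvalue of $W$ is $\frac{1}{2} + \frac{1}{2}\cos(2\pi/n)$, very close to $1$ for large $n$; this slow mixing is what polynomial-based acceleration attacks.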
Optimal Accelerated Variance Reduced EXTRA and DIGing for Strongly Convex and Smooth Decentralized Optimization
We study stochastic decentralized optimization for the problem of training
machine learning models with large-scale distributed data. We extend the famous
EXTRA and DIGing methods with accelerated variance reduction (VR), and propose
two methods, which require $O((\sqrt{n\kappa_s} + n)\log(1/\epsilon))$
stochastic gradient evaluations
and $O(\sqrt{\kappa_b\kappa_c}\log(1/\epsilon))$ communication rounds to
reach precision $\epsilon$, where $\kappa_s$ and $\kappa_b$ are the stochastic
condition number and batch condition number for strongly convex and smooth
problems, $\kappa_c$ is the condition number of the communication network, and
$n$ is the sample size on each distributed node. Our stochastic gradient
computation complexity is the same as the single-machine accelerated variance
reduction methods, such as Katyusha, and our communication complexity is the
same as the accelerated full batch decentralized methods, such as MSDA, and
they are both optimal. We also propose the non-accelerated VR based EXTRA and
DIGing, and provide explicit complexities, for example, the
$O((\kappa_s + n)\log(1/\epsilon))$ stochastic gradient computation
complexity and the $O((\kappa_b + \kappa_c)\log(1/\epsilon))$ communication
complexity for the VR based EXTRA. The two complexities are also the same as
the ones of single-machine VR methods, such as SAG, SAGA, and SVRG, and the
non-accelerated full batch decentralized methods, such as EXTRA, respectively.
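The base EXTRA iteration of Shi et al. that these methods build on alternates gossip mixing with a gradient-correction step: $x^{k+2} = (I+W)x^{k+1} - \tilde W x^k - \alpha(\nabla F(x^{k+1}) - \nabla F(x^k))$ with $\tilde W = (I+W)/2$. A minimal full-gradient sketch on a toy decentralized least-squares problem (our own setup, not the paper's accelerated VR variant):

```python
import numpy as np

rng = np.random.default_rng(3)
n_agents, d = 5, 3
A = rng.standard_normal((n_agents, 10, d))   # agent i holds (A[i], b[i])
b = rng.standard_normal((n_agents, 10))

def grads(X):   # row i = gradient of agent i's local objective at X[i]
    return np.stack([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(n_agents)])

# symmetric doubly stochastic gossip matrix for the 5-cycle
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = W[i, (i + 1) % n_agents] = 0.25
W_tilde = (np.eye(n_agents) + W) / 2

alpha = 0.01
X_prev = np.zeros((n_agents, d))
X = W @ X_prev - alpha * grads(X_prev)       # EXTRA's first step
G_prev = grads(X_prev)
for _ in range(2000):
    G = grads(X)
    X, X_prev = ((np.eye(n_agents) + W) @ X - W_tilde @ X_prev
                 - alpha * (G - G_prev)), X
    G_prev = G

# compare with the centralized minimizer of the summed objective
x_star, *_ = np.linalg.lstsq(A.reshape(-1, d), b.reshape(-1), rcond=None)
print(np.abs(X - x_star).max())   # all agents agree on the global minimizer
```

The gradient-correction term is what lets EXTRA reach the exact consensus minimizer with a constant step size, unlike plain decentralized gradient descent.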
Optimal algorithms for smooth and strongly convex distributed optimization in networks
In this paper, we determine the optimal convergence rates for strongly convex and smooth distributed optimization in two settings: centralized and decentralized communications over a network. For centralized (i.e. master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\epsilon > 0$ in time $O(\sqrt{\kappa_g}(1 + \Delta\tau)\log(1/\epsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp. $1$) is the time needed to communicate values between two neighbors (resp. perform local computations). For decentralized algorithms based on gossip, we provide the first optimal algorithm, called the multi-step dual accelerated (MSDA) method, that achieves a precision $\epsilon > 0$ in time $O(\sqrt{\kappa_l}(1 + \tau/\sqrt{\gamma})\log(1/\epsilon))$, where $\kappa_l$ is the condition number of the local functions and $\gamma$ is the (normalized) eigengap of the gossip matrix used for communication between nodes. We then verify the efficiency of MSDA against state-of-the-art methods for two problems: least-squares regression and classification by logistic regression.
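The single-machine building block that the centralized scheme distributes, Nesterov's accelerated gradient with constant momentum for a $\mu$-strongly convex, $L$-smooth objective, can be sketched as follows on a toy quadratic (our own example, not the distributed algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 10))
H = A.T @ A / 50 + 0.1 * np.eye(10)    # strongly convex quadratic Hessian
b = rng.standard_normal(10)
grad = lambda x: H @ x - b             # gradient of f(x) = x'Hx/2 - b'x

L = np.linalg.eigvalsh(H).max()        # smoothness constant
mu = np.linalg.eigvalsh(H).min()       # strong-convexity constant
kappa = L / mu                         # condition number

# Nesterov's method: gradient step at the extrapolated point y,
# then constant momentum with weight (sqrt(kappa)-1)/(sqrt(kappa)+1)
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
x = y = np.zeros(10)
for _ in range(100):
    x_next = y - grad(y) / L
    y = x_next + beta * (x_next - x)
    x = x_next

x_star = np.linalg.solve(H, b)
print(np.linalg.norm(x - x_star))      # near the minimizer
```

The $O(\sqrt{\kappa}\log(1/\epsilon))$ iteration count of this scheme is the source of the $\sqrt{\kappa_g}$ and $\sqrt{\kappa_l}$ factors in the distributed rates above.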