61,008 research outputs found
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
In this work, we consider the distributed optimization of non-smooth convex
functions using a network of computing units. We investigate this problem under
two regularity assumptions: (1) the Lipschitz continuity of the global
objective function, and (2) the Lipschitz continuity of local individual
functions. Under the local regularity assumption, we provide the first optimal
first-order decentralized algorithm called multi-step primal-dual (MSPD) and
its corresponding optimal convergence rate. A notable aspect of this result is
that, for non-smooth functions, while the dominant term of the error is in
O(1/√t), the structure of the communication network only impacts a
second-order term in O(1/t), where t is the time. In other words, the error due
to limits in communication resources decreases at a fast rate even in the case
of non-strongly-convex objective functions. Under the global regularity
assumption, we provide a simple yet efficient algorithm called distributed
randomized smoothing (DRS) based on a local smoothing of the objective
function, and show that DRS is within a d^{1/4} multiplicative factor of the
optimal convergence rate, where d is the underlying dimension.
Comment: 17 pages
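The smoothing idea behind DRS can be illustrated with a small zeroth-order sketch: a non-smooth f is replaced by its Gaussian smoothing f_γ(x) = E_z[f(x + γz)], whose gradient can be estimated from function values alone. This is a minimal illustration of the general randomized-smoothing technique, not the paper's algorithm; all names and parameter values below are illustrative.

```python
import numpy as np

def smoothed_grad(f, x, gamma, n_samples, seed=0):
    """Monte Carlo estimate of grad f_gamma(x) via the identity
    grad f_gamma(x) = E_z[(f(x + gamma*z) - f(x)) * z] / gamma,
    with z ~ N(0, I). Subtracting f(x) reduces variance without
    changing the mean, since E[z] = 0."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, x.shape[0]))
    vals = np.array([f(x + gamma * zi) for zi in z]) - f(x)
    return (vals[:, None] * z).mean(axis=0) / gamma

# f(x) = ||x||_1 is non-smooth; away from its kinks the smoothed
# gradient is close to the (sub)gradient sign(x).
f = lambda x: np.abs(x).sum()
x = np.array([1.0, -2.0, 3.0])
g = smoothed_grad(f, x, gamma=0.05, n_samples=20000)
print(np.round(g, 2))
```

Smaller γ makes f_γ a closer approximation of f but increases the variance of the estimator, so n_samples must grow accordingly.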
Accelerated AB/Push-Pull Methods for Distributed Optimization over Time-Varying Directed Networks
This paper investigates a novel approach for solving the distributed
optimization problem in which multiple agents collaborate to find the global
decision that minimizes the sum of their individual cost functions. First, the
AB/Push-Pull gradient-based algorithm is considered, which employs row- and
column-stochastic weights simultaneously to track the optimal decision and the
gradient of the global cost function, ensuring consensus on the optimal
decision. Building on this algorithm, we then develop a general algorithm that
incorporates acceleration techniques, such as heavy-ball momentum and Nesterov
momentum, as well as their combination with non-identical momentum parameters.
Previous literature has established the effectiveness of acceleration methods
for various gradient-based distributed algorithms and demonstrated linear
convergence for static directed communication networks. In contrast, we focus
on time-varying directed communication networks and establish linear
convergence of the methods to the optimal solution, when the agents' cost
functions are smooth and strongly convex. Additionally, we provide explicit
bounds for the step-size value and momentum parameters, based on the properties
of the cost functions, the mixing matrices, and the graph connectivity
structures. Our numerical results illustrate the benefits of the proposed
acceleration techniques on the AB/Push-Pull algorithm.
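The role of the two weight matrices can be sketched on a toy static directed graph (the paper itself treats time-varying graphs). A row-stochastic matrix R pulls decisions toward consensus, a column-stochastic matrix C pushes gradient trackers, and a heavy-ball term adds momentum. The update form below is the standard Push-Pull recursion; the problem, graph, and parameter values are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Push-Pull with heavy-ball momentum:
#   x_{k+1} = R x_k - alpha * y_k + beta * (x_k - x_{k-1})   (pull decisions)
#   y_{k+1} = C y_k + grad(x_{k+1}) - grad(x_k)              (push trackers)
n = 4
b = np.array([1.0, 2.0, 3.0, 4.0])
grad = lambda x: x - b                 # f_i(x) = 0.5*(x - b_i)^2, scalar x

# Directed ring with self-loops plus one extra edge 2 -> 0.
A = np.eye(n) + np.roll(np.eye(n), 1, axis=0)
A[0, 2] = 1.0
R = A / A.sum(axis=1, keepdims=True)   # rows sum to 1 (row-stochastic)
C = A / A.sum(axis=0, keepdims=True)   # columns sum to 1 (column-stochastic)

alpha, beta = 0.05, 0.1                # step size, momentum (assumed values)
x = np.zeros(n); x_prev = x.copy()
y = grad(x)                            # trackers initialized at local gradients
for _ in range(2000):
    x_new = R @ x - alpha * y + beta * (x - x_prev)
    y = C @ y + grad(x_new) - grad(x)
    x_prev, x = x, x_new

print(np.round(x, 3))                  # all agents near mean(b) = 2.5
```

Because 1ᵀC = 1ᵀ, the trackers satisfy 1ᵀy_k = 1ᵀgrad(x_k) at every step, so at a consensus fixed point the summed gradient vanishes and all agents land on the minimizer of the sum, mean(b).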
Multi-consensus Decentralized Accelerated Gradient Descent
This paper considers the decentralized optimization problem, which has
applications in large scale machine learning, sensor networks, and control
theory. We propose a novel algorithm that can achieve near optimal
communication complexity, matching the known lower bound up to a logarithmic
factor of the condition number of the problem. Our theoretical results give
affirmative answers to the open problem on whether there exists an algorithm
that can achieve a communication complexity (nearly) matching the lower bound
depending on the global condition number instead of the local one. Moreover,
the proposed algorithm achieves the optimal computation complexity matching the
lower bound up to universal constants. Furthermore, to achieve a linear
convergence rate, our algorithm does not require the individual functions
to be (strongly) convex. Our method relies on a novel combination of known
techniques including Nesterov's accelerated gradient descent, multi-consensus
and gradient-tracking. The analysis is new, and may be applied to other related
problems. Empirical studies demonstrate the effectiveness of our method for
machine learning applications.
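The three named ingredients can be combined in a toy sketch: Nesterov-style extrapolation, gradient tracking, and multi-consensus, i.e. running K gossip rounds per gradient step so the effective mixing matrix is Wᴷ. This is a generic illustration of those techniques under assumed parameter values, not the paper's algorithm.

```python
import numpy as np

n, K = 5, 3                              # agents, gossip rounds per step
b = np.linspace(-2.0, 2.0, n)
grad = lambda x: x - b                   # f_i(x) = 0.5*(x - b_i)^2

# Symmetric doubly stochastic gossip matrix for an undirected ring.
W = 0.5 * np.eye(n) + 0.25 * (np.roll(np.eye(n), 1, 0) + np.roll(np.eye(n), -1, 0))
mix = np.linalg.matrix_power(W, K)       # multi-consensus: mix with W^K

alpha, theta = 0.3, 0.3                  # step size, momentum (assumed values)
x = np.zeros(n); x_prev = x.copy()
s = grad(x)                              # gradient trackers
for _ in range(500):
    v = x + theta * (x - x_prev)         # Nesterov extrapolation
    x_new = mix @ (v - alpha * s)        # mixed accelerated gradient step
    s = mix @ s + grad(x_new) - grad(x)  # gradient tracking with mixing
    x_prev, x = x, x_new

print(np.round(x, 3))                    # consensus at mean(b) = 0.0
```

Raising K shrinks the second-largest eigenvalue of the mixing matrix geometrically, which is how extra cheap communication rounds can substitute for a better-connected network in the convergence bounds.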