61,008 research outputs found
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
In this work, we consider the distributed optimization of non-smooth convex
functions using a network of computing units. We investigate this problem under
two regularity assumptions: (1) the Lipschitz continuity of the global
objective function, and (2) the Lipschitz continuity of local individual
functions. Under the local regularity assumption, we provide the first optimal
first-order decentralized algorithm called multi-step primal-dual (MSPD) and
its corresponding optimal convergence rate. A notable aspect of this result is
that, for non-smooth functions, while the dominant term of the error is in
O(1/√t), the structure of the communication network only impacts a
second-order term in O(1/t), where t is the time. In other words, the error due
to limits in communication resources decreases at a fast rate even in the case
of non-strongly-convex objective functions. Under the global regularity
assumption, we provide a simple yet efficient algorithm called distributed
randomized smoothing (DRS) based on a local smoothing of the objective
function, and show that DRS is within a d^{1/4} multiplicative factor of the
optimal convergence rate, where d is the underlying dimension.
Comment: 17 pages
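The smoothing idea behind DRS can be illustrated with a small zeroth-order sketch: a non-smooth f is replaced by its Gaussian smoothing f_γ(x) = E_z[f(x + γz)], whose gradient can be estimated from function values alone. This is a minimal illustration of the general randomized-smoothing technique, not the paper's algorithm; all names and parameter values below are illustrative.

```python
import numpy as np

def smoothed_grad(f, x, gamma, n_samples, seed=0):
    """Monte Carlo estimate of grad f_gamma(x) via the identity
    grad f_gamma(x) = E_z[(f(x + gamma*z) - f(x)) * z] / gamma,
    with z ~ N(0, I). Subtracting f(x) reduces variance without
    changing the mean, since E[z] = 0."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, x.shape[0]))
    vals = np.array([f(x + gamma * zi) for zi in z]) - f(x)
    return (vals[:, None] * z).mean(axis=0) / gamma

# f(x) = ||x||_1 is non-smooth; away from its kinks the smoothed
# gradient is close to the (sub)gradient sign(x).
f = lambda x: np.abs(x).sum()
x = np.array([1.0, -2.0, 3.0])
g = smoothed_grad(f, x, gamma=0.05, n_samples=20000)
print(np.round(g, 2))
```

Smaller γ makes f_γ a closer approximation of f but increases the variance of the estimator, so n_samples must grow accordingly.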
Accelerated AB/Push-Pull Methods for Distributed Optimization over Time-Varying Directed Networks
This paper investigates a novel approach for solving the distributed
optimization problem in which multiple agents collaborate to find the global
decision that minimizes the sum of their individual cost functions. First, the
AB/Push-Pull gradient-based algorithm is considered, which employs row- and
column-stochastic weights simultaneously to track the optimal decision and the
gradient of the global cost function, ensuring consensus on the optimal
decision. Building on this algorithm, we then develop a general algorithm that
incorporates acceleration techniques, such as heavy-ball momentum and Nesterov
momentum, as well as their combination with non-identical momentum parameters.
Previous literature has established the effectiveness of acceleration methods
for various gradient-based distributed algorithms and demonstrated linear
convergence for static directed communication networks. In contrast, we focus
on time-varying directed communication networks and establish linear
convergence of the methods to the optimal solution, when the agents' cost
functions are smooth and strongly convex. Additionally, we provide explicit
bounds for the step-size value and momentum parameters, based on the properties
of the cost functions, the mixing matrices, and the graph connectivity
structures. Our numerical results illustrate the benefits of the proposed
acceleration techniques on the AB/Push-Pull algorithm.
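The role of the two weight matrices can be sketched on a toy static directed graph (the paper itself treats time-varying graphs). A row-stochastic matrix R pulls decisions toward consensus, a column-stochastic matrix C pushes gradient trackers, and a heavy-ball term adds momentum. The update form below is the standard Push-Pull recursion; the problem, graph, and parameter values are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Push-Pull with heavy-ball momentum:
#   x_{k+1} = R x_k - alpha * y_k + beta * (x_k - x_{k-1})   (pull decisions)
#   y_{k+1} = C y_k + grad(x_{k+1}) - grad(x_k)              (push trackers)
n = 4
b = np.array([1.0, 2.0, 3.0, 4.0])
grad = lambda x: x - b                 # f_i(x) = 0.5*(x - b_i)^2, scalar x

# Directed ring with self-loops plus one extra edge 2 -> 0.
A = np.eye(n) + np.roll(np.eye(n), 1, axis=0)
A[0, 2] = 1.0
R = A / A.sum(axis=1, keepdims=True)   # rows sum to 1 (row-stochastic)
C = A / A.sum(axis=0, keepdims=True)   # columns sum to 1 (column-stochastic)

alpha, beta = 0.05, 0.1                # step size, momentum (assumed values)
x = np.zeros(n); x_prev = x.copy()
y = grad(x)                            # trackers initialized at local gradients
for _ in range(2000):
    x_new = R @ x - alpha * y + beta * (x - x_prev)
    y = C @ y + grad(x_new) - grad(x)
    x_prev, x = x, x_new

print(np.round(x, 3))                  # all agents near mean(b) = 2.5
```

Because 1ᵀC = 1ᵀ, the trackers satisfy 1ᵀy_k = 1ᵀgrad(x_k) at every step, so at a consensus fixed point the summed gradient vanishes and all agents land on the minimizer of the sum, mean(b).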
Multi-consensus Decentralized Accelerated Gradient Descent
This paper considers the decentralized optimization problem, which has
applications in large scale machine learning, sensor networks, and control
theory. We propose a novel algorithm that can achieve near optimal
communication complexity, matching the known lower bound up to a logarithmic
factor of the condition number of the problem. Our theoretical results give
affirmative answers to the open problem on whether there exists an algorithm
that can achieve a communication complexity (nearly) matching the lower bound
depending on the global condition number instead of the local one. Moreover,
the proposed algorithm achieves the optimal computation complexity matching the
lower bound up to universal constants. Furthermore, to achieve a linear
convergence rate, our algorithm does not require the individual functions
to be (strongly) convex. Our method relies on a novel combination of known
techniques including Nesterov's accelerated gradient descent, multi-consensus
and gradient-tracking. The analysis is new, and may be applied to other related
problems. Empirical studies demonstrate the effectiveness of our method for
machine learning applications.
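The three named ingredients can be combined in a toy sketch: Nesterov-style extrapolation, gradient tracking, and multi-consensus, i.e. running K gossip rounds per gradient step so the effective mixing matrix is Wᴷ. This is a generic illustration of those techniques under assumed parameter values, not the paper's algorithm.

```python
import numpy as np

n, K = 5, 3                              # agents, gossip rounds per step
b = np.linspace(-2.0, 2.0, n)
grad = lambda x: x - b                   # f_i(x) = 0.5*(x - b_i)^2

# Symmetric doubly stochastic gossip matrix for an undirected ring.
W = 0.5 * np.eye(n) + 0.25 * (np.roll(np.eye(n), 1, 0) + np.roll(np.eye(n), -1, 0))
mix = np.linalg.matrix_power(W, K)       # multi-consensus: mix with W^K

alpha, theta = 0.3, 0.3                  # step size, momentum (assumed values)
x = np.zeros(n); x_prev = x.copy()
s = grad(x)                              # gradient trackers
for _ in range(500):
    v = x + theta * (x - x_prev)         # Nesterov extrapolation
    x_new = mix @ (v - alpha * s)        # mixed accelerated gradient step
    s = mix @ s + grad(x_new) - grad(x)  # gradient tracking with mixing
    x_prev, x = x, x_new

print(np.round(x, 3))                    # consensus at mean(b) = 0.0
```

Raising K shrinks the second-largest eigenvalue of the mixing matrix geometrically, which is how extra cheap communication rounds can substitute for a better-connected network in the convergence bounds.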