18 research outputs found
Local Exact-Diffusion for Decentralized Optimization and Learning
Distributed optimization methods with local updates have recently attracted a
lot of attention due to their potential to reduce the communication cost of
distributed methods. In these algorithms, a collection of nodes performs
several local updates based on their local data, and then they communicate with
each other to exchange estimate information. While there have been many studies
on distributed local methods with centralized network connections, there has
been less work on decentralized networks.
In this work, we propose and investigate a locally updated decentralized
method called Local Exact-Diffusion (LED). We establish the convergence of LED
in both convex and nonconvex settings for the stochastic online setting. Our
convergence rate improves over the rate of existing decentralized methods. When
we specialize the network to the centralized case, we recover the
state-of-the-art bound for centralized methods. We also link LED to several
other independently studied distributed methods, including Scaffnew, FedGate,
and VRL-SGD. Additionally, we numerically investigate the benefits of local
updates for decentralized networks and demonstrate the effectiveness of the
proposed method.
Linear Convergence of Primal-Dual Gradient Methods and their Performance in Distributed Optimization
In this work, we revisit a classical incremental implementation of the
primal-descent dual-ascent gradient method used for the solution of equality
constrained optimization problems. We provide a short proof that establishes
the linear (exponential) convergence of the algorithm for smooth
strongly-convex cost functions and study its relation to the non-incremental
implementation. We also study the effect of the augmented Lagrangian penalty
term on the performance of distributed optimization algorithms for the
minimization of aggregate cost functions over multi-agent networks.
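As a rough illustration of the incremental primal-descent dual-ascent recursion mentioned in this abstract, the sketch below applies it to a toy problem: minimize 0.5*||x - c||^2 subject to a single equality constraint a.x = b. The problem data, the step-size mu, and the function name primal_dual_incremental are assumptions made for this example and are not taken from the paper.

```python
def primal_dual_incremental(c, a, b, mu=0.2, iters=500):
    """Toy sketch: minimize 0.5*||x - c||^2 subject to a.x = b.

    The cost, constraint, and step-size are illustrative assumptions.
    """
    x = [0.0] * len(c)
    lam = 0.0  # dual variable (Lagrange multiplier)
    for _ in range(iters):
        # Primal descent on the Lagrangian f(x) + lam * (a.x - b).
        x = [xi - mu * ((xi - ci) + lam * ai) for xi, ci, ai in zip(x, c, a)]
        # Dual ascent using the freshly updated primal iterate (incremental form).
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        lam = lam + mu * residual
    return x, lam


x_hat, lam_hat = primal_dual_incremental(c=[3.0, 1.0], a=[1.0, 1.0], b=2.0)
print(x_hat, lam_hat)
```

For this toy problem the closed-form solution is x = (2, 0) with multiplier 1, so the printed iterates can be compared against it. A non-incremental variant would use the previous primal iterate in the dual update instead.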
Distributed Coupled Multi-Agent Stochastic Optimization
This work develops effective distributed strategies for the solution of
constrained multi-agent stochastic optimization problems with coupled
parameters across the agents. In this formulation, each agent is influenced by
only a subset of the entries of a global parameter vector or model, and is
subject to convex constraints that are only known locally. Problems of this
type arise in several applications, most notably in disease propagation models,
minimum-cost flow problems, distributed control formulations, and distributed
power system monitoring. This work focuses on stochastic settings, where a
stochastic risk function is associated with each agent and the objective is to
seek the minimizer of the aggregate sum of all risks subject to a set of
constraints. Agents are not aware of the statistical distribution of the data
and, therefore, can only rely on stochastic approximations in their learning
strategies. We derive an effective distributed learning strategy that is able
to track drifts in the underlying parameter model. A detailed performance and
stability analysis is carried out showing that the resulting coupled diffusion
strategy converges at a linear rate to a neighborhood of the true
penalized optimizer.
On the Performance and Linear Convergence of Decentralized Primal-Dual Methods
This dissertation studies the performance and linear convergence properties of primal-dual methods for the solution of decentralized multi-agent optimization problems. Decentralized multi-agent optimization is a powerful paradigm that finds applications in diverse fields in learning and engineering design. In these setups, a network of agents is connected through some topology and agents are allowed to share information only locally. Their overall goal is to seek the minimizer of a global optimization problem through localized interactions. In decentralized consensus problems, the agents are coupled through a common consensus variable that they need to agree upon. In decentralized resource allocation problems, by contrast, the agents are coupled through global affine constraints. Various decentralized consensus optimization algorithms already exist in the literature. Some methods are derived from a primal-dual perspective, while other methods are derived as gradient tracking mechanisms meant to track the average of local gradients. Among the gradient tracking methods are the adapt-then-combine implementations motivated by diffusion strategies, which have been observed to perform better than other implementations. In this dissertation, we develop a novel adapt-then-combine primal-dual algorithmic framework that captures most state-of-the-art gradient-based methods as special cases, including all the variations of the gradient-tracking methods. We also develop a concise and novel analysis technique that establishes the linear convergence of this general framework under strongly-convex objectives. Due to our unified framework, the analysis reveals important characteristics for these methods such as their convergence rates and step-size stability ranges. Moreover, the analysis reveals how the augmented Lagrangian penalty term, which is utilized in most of these methods, affects the performance of decentralized algorithms.
Another important question that we answer is whether decentralized proximal gradient methods can achieve global linear convergence for non-smooth composite optimization. For centralized algorithms, linear convergence has been established in the presence of a non-smooth composite term. In this dissertation, we close the gap between centralized and decentralized proximal gradient algorithms and show that decentralized proximal algorithms can also achieve linear convergence in the presence of a non-smooth term. Furthermore, we show that when each agent possesses a different local non-smooth term, then global linear convergence cannot be established in the worst case. Most works that study decentralized optimization problems assume that all agents are involved in computing all variables. However, in many applications the coupling across agents is sparse in the sense that only a few agents are involved in computing certain variables. We show how to design decentralized algorithms in sparsely coupled consensus and resource allocation problems. More importantly, we establish analytically the importance of exploiting the sparsity structure in coupled large-scale networks.
A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization
Decentralized optimization is a powerful paradigm that finds applications in
engineering and learning design. This work studies decentralized composite
optimization problems with non-smooth regularization terms. Most existing
gradient-based proximal decentralized methods are known to converge to the
optimal solution with sublinear rates, and it remains unclear whether this
family of methods can achieve global linear convergence. To tackle this
problem, this work assumes the non-smooth regularization term is common across
all networked agents, which is the case for many machine learning problems.
Under this condition, we design a proximal gradient decentralized algorithm
whose fixed point coincides with the desired minimizer. We then provide a
concise proof that establishes its linear convergence. In the absence of the
non-smooth term, our analysis technique covers the well-known EXTRA algorithm
and provides useful bounds on the convergence rate and step-size. Comment: NeurIPS 201
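To make the role of the non-smooth regularization term concrete, here is a minimal single-agent (centralized) proximal gradient sketch. It is not the decentralized algorithm proposed in the paper. It assumes a separable quadratic cost 0.5*||w - t||^2 with an l1 regularizer rho*||w||_1, whose proximal operator is elementwise soft-thresholding. The names soft_threshold and proximal_gradient and all numerical values are illustrative assumptions.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def proximal_gradient(t, rho, mu=0.5, iters=100):
    """Toy sketch: minimize 0.5*||w - t||^2 + rho*||w||_1 (assumed cost)."""
    w = [0.0] * len(t)
    for _ in range(iters):
        # Gradient step on the smooth part, then prox step on the l1 term.
        w = [soft_threshold(wi - mu * (wi - ti), mu * rho) for wi, ti in zip(w, t)]
    return w


print(proximal_gradient(t=[3.0, 0.5, -2.0], rho=1.0))
```

For this separable cost the minimizer is known in closed form as soft_threshold(t_i, rho) for each entry, which gives a simple check on the iteration.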
A Proximal Diffusion Strategy for Multi-Agent Optimization with Sparse Affine Constraints
This work develops a proximal primal-dual decentralized strategy for
multi-agent optimization problems that involve multiple coupled affine
constraints, where each constraint may involve only a subset of the agents. The
constraints are generally sparse, meaning that only a small subset of the
agents are involved in them. This scenario arises in many applications
including decentralized control formulations, resource allocation problems, and
smart grids. Traditional decentralized solutions tend to ignore the structure
of the constraints and lead to degraded performance. We instead develop a
decentralized solution that exploits the sparsity structure. Under constant
step-size learning, the asymptotic convergence of the proposed algorithm is
established in the presence of non-smooth terms, and it occurs at a linear rate
in the smooth case. We also examine how the performance of the algorithm is
influenced by the sparsity of the constraints. Simulations illustrate the
superior performance of the proposed strategy. Comment: accepted for publication in IEEE TA
Diffusion Stochastic Optimization for Min-Max Problems
The optimistic gradient method is useful in addressing minimax optimization
problems. Motivated by the observation that the conventional stochastic version
suffers from the need for a large batch size to achieve an $\varepsilon$-stationary
solution, we introduce and analyze a new formulation termed Diffusion
Stochastic Same-Sample Optimistic Gradient (DSS-OG). We prove its convergence
and resolve the large batch issue by establishing a tighter upper bound, under
the more general setting of nonconvex Polyak-Lojasiewicz (PL) risk functions.
We also extend the applicability of the proposed method to the distributed
scenario, where agents communicate with their neighbors via a left-stochastic
protocol. To implement DSS-OG, we can query the stochastic gradient oracles in
parallel with some extra memory overhead, resulting in a complexity comparable
to its conventional counterpart. To demonstrate the efficacy of the proposed
algorithm, we conduct tests by training generative adversarial networks.
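For intuition about why the optimistic gradient method helps in minimax problems, the following sketch compares plain gradient descent-ascent with a deterministic, single-agent optimistic gradient update on the bilinear toy problem f(x, y) = x*y. This is not DSS-OG: there is no stochastic gradient, no same-sample construction, and no network. The function name run, the step-size, and the starting point are assumptions for illustration.

```python
import math


def run(optimistic, mu=0.2, iters=1000):
    """Iterate on f(x, y) = x * y (min over x, max over y).

    Returns the distance of the final iterate to the saddle point (0, 0).
    """
    x, y = 1.0, 1.0
    gx_prev, gy_prev = y, x  # gradients at the starting point
    for _ in range(iters):
        gx, gy = y, x  # df/dx = y, df/dy = x
        if optimistic:
            # Optimistic direction: extrapolate with the previous gradient.
            dx, dy = 2.0 * gx - gx_prev, 2.0 * gy - gy_prev
        else:
            # Plain gradient descent-ascent direction.
            dx, dy = gx, gy
        x, y = x - mu * dx, y + mu * dy
        gx_prev, gy_prev = gx, gy
    return math.hypot(x, y)


print("descent-ascent distance:", run(optimistic=False))
print("optimistic distance:", run(optimistic=True))
```

On this bilinear problem the plain descent-ascent iterates spiral away from the saddle point, while the optimistic iterates contract toward it.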
On the Influence of Bias-Correction on Distributed Stochastic Optimization
Various bias-correction methods such as EXTRA, gradient tracking methods, and
exact diffusion have been proposed recently to solve distributed
deterministic optimization problems. These methods employ constant step-sizes
and converge linearly to the exact solution under proper conditions.
However, their performance under stochastic and adaptive settings is less
explored. It is still unknown whether, when, and why these
bias-correction methods can outperform their traditional counterparts (such as
consensus and diffusion) with noisy gradients and constant step-sizes.
This work studies the performance of exact diffusion under the stochastic and
adaptive setting, and provides conditions under which exact diffusion achieves
better steady-state mean-square deviation (MSD) performance than traditional
algorithms without bias-correction. In particular, it is proven that this
superiority is more evident over sparsely-connected network topologies such as
lines, cycles, or grids. Conditions are also provided under which the exact
diffusion method matches or even underperforms traditional
methods. Simulations are provided to validate the theoretical findings. Comment: 17 pages, 9 figures, submitted for publication
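The bias that these correction methods remove can be seen in a small deterministic experiment. The sketch below runs adapt-then-combine diffusion and the exact diffusion recursion (adapt, correct, combine, with the combination matrix averaged with the identity) on a ring of six agents with scalar quadratic costs 0.5*(w - t_i)^2. The ring weights, targets, step-size, and helper names are assumptions chosen for illustration. Gradient noise is omitted, so this does not reproduce the stochastic setting analyzed in the paper.

```python
def ring_weights(n):
    """Symmetric doubly stochastic matrix for a ring: weight 1/3 on self and each neighbor."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            A[i][j % n] = 1.0 / 3.0
    return A


def combine(A, v):
    """Each agent averages its neighbors' values using row weights of A."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]


def diffusion(A, targets, mu, iters):
    w = [0.0] * len(targets)
    for _ in range(iters):
        psi = [wi - mu * (wi - ti) for wi, ti in zip(w, targets)]  # adapt
        w = combine(A, psi)  # combine
    return w


def exact_diffusion(A, targets, mu, iters):
    n = len(targets)
    # Combination matrix averaged with the identity: (A + I) / 2.
    A_bar = [[0.5 * (A[i][j] + (1.0 if i == j else 0.0)) for j in range(n)]
             for i in range(n)]
    w = [0.0] * n
    psi_prev = list(w)  # initialize psi to the starting iterate
    for _ in range(iters):
        psi = [wi - mu * (wi - ti) for wi, ti in zip(w, targets)]  # adapt
        phi = [p + wi - pp for p, wi, pp in zip(psi, w, psi_prev)]  # correct
        w = combine(A_bar, phi)  # combine
        psi_prev = psi
    return w


targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w_star = sum(targets) / len(targets)  # minimizer of the aggregate cost
A = ring_weights(len(targets))
for name, method in (("diffusion", diffusion), ("exact diffusion", exact_diffusion)):
    w = method(A, targets, mu=0.1, iters=500)
    print(name, "max error:", max(abs(wi - w_star) for wi in w))
```

With a constant step-size, the diffusion iterates settle at a point that differs from the aggregate minimizer, while the exact diffusion iterates approach it.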