ExtraPush for convex smooth decentralized optimization over directed networks
In this note, we extend the algorithms Extra and subgradient-push to a new
algorithm ExtraPush for consensus optimization with convex differentiable
objective functions over a directed network. When the stationary distribution
of the network can be computed in advance, we propose a simplified algorithm
called Normalized ExtraPush. Just like Extra, both ExtraPush and Normalized
ExtraPush can iterate with a fixed step size. But unlike Extra, they can take a
column-stochastic mixing matrix, which is not necessarily doubly stochastic.
Therefore, they remove the undirected-network restriction of Extra.
Subgradient-push, although it also works for directed networks, is slower on
the same type of problem because it must use a sequence of diminishing step
sizes.
We present preliminary analysis for ExtraPush under a bounded sequence
assumption. For Normalized ExtraPush, we show that it naturally produces a
bounded, linearly convergent sequence provided that the objective function is
strongly convex.
In our numerical experiments, ExtraPush and Normalized ExtraPush performed
similarly well and were significantly faster than subgradient-push, even with
hand-optimized step sizes for the latter.
Comment: 16 pages, 3 figures
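The mechanism that lets these methods drop double stochasticity is the
push-sum ratio trick: with a merely column-stochastic mixing matrix, the ratio
of two linearly mixed sequences still recovers the network average. Below is a
minimal sketch of plain push-sum consensus, not the full ExtraPush recursion;
the 3-node directed network and matrix A are illustrative assumptions:

```python
import numpy as np

# Column-stochastic mixing matrix for a directed 3-cycle with self-loops:
# each node splits its outgoing mass evenly between itself and its
# out-neighbor, so every column sums to 1 (rows need not).
A = np.array([[0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])

z = np.array([1.0, 5.0, 9.0])  # local values; the network average is 5.0
w = np.ones(3)                 # push-sum weights, initialized to 1

for _ in range(100):
    z = A @ z                  # push values along directed edges
    w = A @ w                  # push weights along the same edges

print(z / w)                   # every entry converges to the average, 5.0
```

ExtraPush grafts an EXTRA-style gradient correction onto this ratio iteration,
which is what allows a fixed step size in place of subgradient-push's
diminishing one.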
Linear Convergence of First- and Zeroth-Order Primal-Dual Algorithms for Distributed Nonconvex Optimization
This paper considers the distributed nonconvex optimization problem of
minimizing a global cost function formed by a sum of local cost functions by
using local information exchange. We first propose a distributed first-order
primal-dual algorithm. We show that it converges sublinearly to the stationary
point if each local cost function is smooth and linearly to the global optimum
under an additional condition that the global cost function satisfies the
Polyak-Łojasiewicz condition. This condition is weaker than strong
convexity, the standard condition for proving linear convergence of
distributed optimization algorithms, and under it the set of global minimizers
need not be a singleton or even finite. Motivated by situations where gradients
are unavailable, we then propose a distributed zeroth-order algorithm, derived
from the proposed distributed first-order algorithm by using a deterministic
gradient estimator, and show that it has the same convergence properties as the
proposed first-order algorithm under the same conditions. The theoretical
results are illustrated by numerical simulations.
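For intuition on the zeroth-order step, a deterministic gradient estimate can
be formed from coordinate-wise finite differences at the cost of 2d function
evaluations per iteration. The sketch below is a generic central-difference
estimator built on that idea, not necessarily the paper's exact construction:

```python
import numpy as np

def zo_gradient(f, x, delta=1e-6):
    """Deterministic zeroth-order estimate of the gradient of f at x,
    built from coordinate-wise central differences (2*dim evaluations)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta
        g[i] = (f(x + e) - f(x - e)) / (2.0 * delta)
    return g

# Example: f(x) = ||x||^2 has gradient 2x.
x = np.array([1.0, -2.0, 3.0])
print(zo_gradient(lambda v: v @ v, x))  # approximately [2., -4., 6.]
```

Substituting such an estimator for the true gradient is what turns the
first-order primal-dual iteration into its zeroth-order counterpart.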
A Robust Gradient Tracking Method for Distributed Optimization over Directed Networks
In this paper, we consider the problem of distributed consensus optimization
over multi-agent networks with directed network topology. Assuming each agent
has a local cost function that is smooth and strongly convex, the global
objective is to minimize the average of all the local cost functions. To solve
the problem, we introduce a robust gradient tracking method (R-Push-Pull)
adapted from the recently proposed Push-Pull/AB algorithm. R-Push-Pull inherits
the advantages of Push-Pull and enjoys linear convergence to the optimal
solution with exact communication. Under noisy information exchange,
R-Push-Pull is more robust than the existing gradient tracking based
algorithms; the solutions obtained by each agent reach a neighborhood of the
optimum in expectation exponentially fast under a constant stepsize policy. We
provide a numerical example that demonstrates the effectiveness of R-Push-Pull.
A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates
This paper proposes a novel proximal-gradient algorithm for a decentralized
optimization problem with a composite objective containing smooth and
non-smooth terms. Specifically, the smooth and non-smooth terms are handled
by gradient and proximal updates, respectively. The proposed algorithm is
closely related to a previous algorithm, PG-EXTRA (Shi et al., 2015), but
has a few advantages. First, agents use uncoordinated step-sizes, and the
stable upper bounds on the step-sizes are independent of the network topology.
The step-sizes depend on the local objective functions and can be as large as
those of gradient descent. Second, for the special case without non-smooth
terms, linear convergence is achieved under the strong convexity assumption.
The dependence of the convergence rate on the objective functions and on the
network is separated, and the rate of the new algorithm matches one of the two
typical rates, that of general gradient descent or that of consensus
averaging. We provide numerical experiments to demonstrate the efficacy of the
introduced algorithm and validate our theoretical findings.
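To make the gradient/proximal split concrete, here is a generic one-step
sketch from a single agent's viewpoint, in the spirit of PG-EXTRA-type methods
(the mixing matrix W, the per-agent step size alpha_i, and the l1 regularizer
are illustrative assumptions; the paper's algorithm adds correction terms to
obtain exact convergence):

```python
import numpy as np

def prox_l1(v, t):
    """Proximal map of t * ||.||_1, i.e., soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def local_step(x_all, i, W, grad_fi, alpha_i, lam):
    """One decentralized proximal-gradient step at agent i: mix the
    neighbors' iterates, take a gradient step on the smooth local
    cost, then a proximal step on the non-smooth l1 term."""
    mixed = sum(W[i, j] * x_all[j] for j in range(len(x_all)))
    return prox_l1(mixed - alpha_i * grad_fi(x_all[i]), alpha_i * lam)
```

The point of the step-size result is that each alpha_i may be chosen from the
local function alone, as in centralized gradient descent, with no dependence
on the network topology.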
Exponential Convergence for Distributed Smooth Optimization Under the Restricted Secant Inequality Condition
This paper considers the distributed smooth optimization problem in which the
objective is to minimize a global cost function formed by a sum of local smooth
cost functions, by using local information exchange. The standard assumption
for proving exponential/linear convergence of first-order methods is the strong
convexity of the cost functions, which does not hold for many practical
applications. In this paper, we first show that the continuous-time
distributed primal-dual gradient algorithm converges exponentially to a global
minimizer under the assumption that the global cost function satisfies the
restricted secant inequality condition. This condition is weaker than strong
convexity since it requires neither convexity nor a unique global minimizer.
We then show that the discrete-time distributed primal-dual algorithm
constructed by Euler's approximation method converges linearly to a global
minimizer under the same condition. The theoretical results are illustrated by
numerical simulations.
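For reference, one common statement of the restricted secant inequality, with
notation assumed here: $\mathcal{X}^*$ is the set of global minimizers and
$P_{\mathcal{X}^*}$ the projection onto it:

```latex
% Restricted secant inequality (RSI) with parameter mu > 0:
% f need not be convex, and X* need not be a singleton.
\nabla f(x)^\top \bigl( x - P_{\mathcal{X}^*}(x) \bigr)
  \;\ge\; \mu \, \bigl\| x - P_{\mathcal{X}^*}(x) \bigr\|^2
  \qquad \text{for all } x .
```

Strong convexity implies this inequality, but the converse fails, which is why
the class of admissible cost functions is strictly larger.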
On the Linear Convergence of Distributed Optimization over Directed Graphs
This paper develops a fast distributed algorithm, termed DEXTRA, to solve the
optimization problem in which $n$ agents reach agreement and collaboratively
minimize the sum of their local objective functions over the network, where
the communication between the agents is described by a directed graph.
Existing algorithms solve the problem restricted to directed graphs with
convergence rates of $O(\ln k/\sqrt{k})$ for general convex objective
functions and $O(\ln k/k)$ when the objective functions are strongly convex,
where $k$ is the number of iterations. We show that, with an appropriate
step-size, DEXTRA converges at a linear rate $O(\tau^k)$ for some
$0<\tau<1$, given that the objective functions are restricted strongly convex.
The implementation of DEXTRA requires each agent to know its local out-degree.
Simulation examples further illustrate our findings.
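The out-degree requirement stems from how a column-stochastic mixing matrix is
built on a directed graph: each agent splits its outgoing mass evenly among
itself and its out-neighbors. A small sketch of that construction (the 3-node
cycle is an illustrative assumption):

```python
import numpy as np

def column_stochastic_weights(out_neighbors, n):
    """Build a column-stochastic mixing matrix: agent j assigns weight
    1 / (1 + out-degree(j)) to itself and to each out-neighbor, so
    column j sums to 1. Only j's own out-degree is needed."""
    A = np.zeros((n, n))
    for j in range(n):
        share = 1.0 / (1 + len(out_neighbors[j]))
        A[j, j] = share
        for i in out_neighbors[j]:
            A[i, j] = share
    return A

# Directed cycle 0 -> 1 -> 2 -> 0.
print(column_stochastic_weights({0: [1], 1: [2], 2: [0]}, 3))
```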
Distributed Optimization over Directed Graphs with Row Stochasticity and Constraint Regularity
This paper deals with an optimization problem over a network of agents, where
the cost function is the sum of the individual objectives of the agents and the
constraint set is the intersection of local constraints. Most existing methods
employing subgradient and consensus steps for solving this problem require the
weight matrix associated with the network to be column stochastic or even
doubly stochastic, conditions that can be hard to arrange in directed networks.
Moreover, known convergence analyses for distributed subgradient methods vary
depending on whether the problem is unconstrained or constrained, and whether
the local constraint sets are identical or nonidentical and compact. The main
goals of this paper are: (i) removing the common column stochasticity
requirement; (ii) relaxing the compactness assumption; and (iii) providing a
unified convergence analysis. Specifically, assuming the communication graph to
be fixed and strongly connected and the weight matrix to (only) be row
stochastic, a distributed projected subgradient algorithm and its variation are
presented to solve the problem for cost functions that are convex and Lipschitz
continuous. Based on a regularity assumption on the local constraint sets, a
unified convergence analysis is given that can be applied to both unconstrained
and constrained problems and without assuming compactness of the constraint
sets or an interior point in their intersection. Further, we establish an
upper bound on the absolute objective error evaluated at each agent's
available local estimate under a nonincreasing step size sequence. This bound
allows us to analyze the convergence rate of both algorithms.
Comment: 14 pages, 3 figures
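For orientation, the baseline template the paper builds on has each agent mix
its neighbors' iterates with one row of the weight matrix, take a subgradient
step, and project onto its own constraint set. A generic sketch follows (the
box constraint stands in for a local set X_i, an illustrative assumption; the
paper's variation further compensates for the imbalance a merely
row-stochastic matrix introduces):

```python
import numpy as np

def projected_subgradient_step(x_all, i, W, subgrad_fi, alpha_k, lo, hi):
    """One distributed projected subgradient step at agent i:
    consensus with the i-th row of the row-stochastic W, a subgradient
    step with step size alpha_k, and projection onto a local box."""
    mixed = sum(W[i, j] * x_all[j] for j in range(len(x_all)))  # row i sums to 1
    return np.clip(mixed - alpha_k * subgrad_fi(x_all[i]), lo, hi)
```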
A Unification and Generalization of Exact Distributed First Order Methods
Recently, there has been significant progress in the development of
distributed first order methods. (At least) two different types of methods,
designed from very different perspectives, have been proposed that achieve both
exact and linear convergence when a constant step size is used -- a favorable
feature that was not achievable by most prior methods. In this paper, we
unify, generalize, and improve the convergence speed of these exact
distributed first-order methods. We first carry out a novel unifying analysis
that sheds light on how the different existing methods compare. The analysis
reveals that a major difference between the methods is how a past dual
gradient of an associated augmented Lagrangian dual function is weighted. We
then capitalize on the
insights from the analysis to derive a novel method -- with a tuned past
gradient weighting -- that improves upon the existing methods. We establish a
global R-linear convergence rate for the proposed generalized method under
strongly convex costs with Lipschitz continuous gradients.
Comment: revised Dec 17, 201
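One member of the family being unified is gradient tracking (a DIGing-style
iteration), in which an auxiliary variable tracks the average gradient; by the
paper's analysis, the exact methods then differ chiefly in how the past dual
gradient is weighted. A minimal sketch of that member (the quadratic local
costs, the doubly stochastic W, and the step size are illustrative
assumptions):

```python
import numpy as np

n, d = 3, 2
W = 0.4 * np.eye(n) + 0.2 * np.ones((n, n))  # doubly stochastic mixing matrix
targets = np.random.randn(n, d)              # f_i(x) = 0.5 * ||x - targets[i]||^2

def grads(x):
    return x - targets                       # stacked local gradients

x = np.zeros((n, d))
y = grads(x)                  # gradient tracker, initialized at local gradients
alpha = 0.1                   # small illustrative constant step size

for _ in range(500):
    x_new = W @ x - alpha * y                # consensus step plus tracked gradient
    y = W @ y + grads(x_new) - grads(x)      # track the average gradient
    x = x_new

print(x)                      # every row approaches the minimizer of the sum,
print(targets.mean(axis=0))   # i.e., the average of the targets
```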
Push-Pull Gradient Methods for Distributed Optimization in Networks
In this paper, we focus on solving a distributed convex optimization problem
in a network, where each agent has its own convex cost function and the goal is
to minimize the sum of the agents' cost functions while obeying the network
connectivity structure. In order to minimize the sum of the cost functions, we
consider new distributed gradient-based methods where each node maintains two
estimates, namely, an estimate of the optimal decision variable and an estimate
of the gradient for the average of the agents' objective functions. From the
viewpoint of an agent, the information about the gradients is pushed to the
neighbors, while the information about the decision variable is pulled from the
neighbors, hence the name "push-pull gradient methods". The methods
utilize two different graphs for the information exchange among agents, and as
such, unify the algorithms with different types of distributed architecture,
including decentralized (peer-to-peer), centralized (master-slave), and
semi-centralized (leader-follower) architecture. We show that the proposed
algorithms and their many variants converge linearly for strongly convex and
smooth objective functions over a network (possibly with unidirectional data
links) in both synchronous and asynchronous random-gossip settings. In
particular, under the random-gossip setting, "push-pull" is the first class of
algorithms for distributed optimization over directed graphs. Moreover, we
numerically evaluate our proposed algorithms in both scenarios, and show that
they outperform other existing linearly convergent schemes, especially for
ill-conditioned problems and networks that are not well balanced.
Comment: Parts of the results appear in Proceedings of the 57th IEEE
Conference on Decision and Control (see arXiv:1803.07588)
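Concretely, the two graphs enter through a row-stochastic matrix R that pulls
decision variables and a column-stochastic matrix C that pushes gradient
trackers. A minimal synchronous sketch of this update, with an illustrative
3-node directed ring and quadratic local costs as assumptions:

```python
import numpy as np

# Directed ring 0 -> 1 -> 2 -> 0 with self-loops.
R = np.array([[0.7, 0.0, 0.3],   # row-stochastic: pull x from in-neighbors
              [0.4, 0.6, 0.0],
              [0.0, 0.2, 0.8]])
C = np.array([[0.5, 0.0, 0.6],   # column-stochastic: push y to out-neighbors
              [0.5, 0.3, 0.0],
              [0.0, 0.7, 0.4]])

targets = np.array([[1.0], [5.0], [9.0]])  # f_i(x) = 0.5 * (x - targets[i])^2

def grads(x):
    return x - targets

x = np.zeros((3, 1))
y = grads(x)                   # gradient tracker
alpha = 0.05                   # small illustrative constant step size

for _ in range(2000):
    x_new = R @ (x - alpha * y)          # pull decision variables
    y = C @ y + grads(x_new) - grads(x)  # push gradient information
    x = x_new

print(x)  # every row approaches 5.0, the minimizer of the summed cost
```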
Distributed Dual Gradient Tracking for Resource Allocation in Unbalanced Networks
This paper proposes a distributed dual gradient tracking algorithm (DDGT) to
solve resource allocation problems over an unbalanced network, where each node
in the network holds a private cost function and computes the optimal resource
by interacting only with its neighboring nodes. Our key idea is the novel use
of the distributed push-pull gradient algorithm (PPG) to solve the dual problem
of the resource allocation problem. To study the convergence of the DDGT, we
first establish a sublinear convergence rate of PPG for non-convex objective
functions, advancing existing results on PPG, which require strong convexity
of the objective functions. Then we show that the DDGT converges
linearly for strongly convex and Lipschitz smooth cost functions, and
sublinearly without the Lipschitz smoothness. Finally, experimental results
suggest that DDGT outperforms existing algorithms.
Comment: Accepted by IEEE Transactions on Signal Processing. This version fixes
some typos in the accepted version.
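The dual construction can be written out explicitly: pricing the coupling
constraint with a multiplier $\lambda$ splits the dual into a sum of local
concave functions of $\lambda$, exactly the consensus-optimization form that
PPG handles (notation assumed: $d_i$ is agent $i$'s local resource demand):

```latex
% Resource allocation problem and its Lagrangian dual decomposition:
\min_{x_1,\dots,x_n} \; \sum_{i=1}^n f_i(x_i)
  \quad \text{s.t.} \quad \sum_{i=1}^n x_i = \sum_{i=1}^n d_i ,
\qquad
\max_{\lambda} \; \sum_{i=1}^n q_i(\lambda), \quad
q_i(\lambda) = \inf_{x_i} \bigl\{ f_i(x_i) + \lambda^\top ( x_i - d_i ) \bigr\} .
```

Running PPG on the dual variable $\lambda$, with each agent using only its
local $q_i$, is the key idea behind DDGT described above.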