153 research outputs found
An Asynchronous Distributed Proximal Gradient Method for Composite Convex Optimization
We propose a distributed first-order augmented Lagrangian (DFAL) algorithm to
minimize the sum of composite convex functions, where each term in the sum is a
private cost function belonging to a node, and only nodes connected by an edge
can directly communicate with each other. This optimization model abstracts a
number of applications in distributed sensing and machine learning. We show
that any limit point of DFAL iterates is optimal; and for any $\epsilon > 0$,
an $\epsilon$-optimal and $\epsilon$-feasible solution can be computed within
$\mathcal{O}(\log(1/\epsilon))$ DFAL iterations, which require
$\mathcal{O}(\psi_{\max}^{1.5} d_{\min}^{-1} \epsilon^{-1})$ proximal
gradient computations and communications per node in total, where
$\psi_{\max}$ denotes the largest eigenvalue of the graph Laplacian, and
$d_{\min}$ is the minimum degree of the graph. We also propose an asynchronous
version of DFAL by
incorporating randomized block coordinate descent methods; and demonstrate the
efficiency of DFAL on large-scale sparse-group LASSO problems.
Comment: The manuscript will appear in the Proceedings of the 32nd
International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP
volume 37. Copyright 2015 by the author(s).
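As a companion to the abstract above, the proximal gradient primitive that the iteration counts refer to can be sketched on a single machine. This is a minimal illustration on an $\ell_1$-regularized least-squares toy problem of my own choosing, not the paper's distributed DFAL scheme:

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, step, iters=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by proximal gradient steps."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)            # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
b = A @ np.array([1.0, -2.0] + [0.0] * 8) + 0.01 * rng.standard_normal(40)
step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||_2^2
x_hat = proximal_gradient(A, b, lam=0.1, step=step)
```

In DFAL each node would run such prox-gradient steps on its private cost while exchanging iterates with its neighbors; the sketch only shows the centralized building block.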
On the Q-linear convergence of Distributed Generalized ADMM under non-strongly convex function components
Solving optimization problems in multi-agent networks where each agent only
has partial knowledge of the problem has become increasingly important. In
this paper we consider the problem of minimizing the sum of convex functions,
where each function is known by only one agent. We show that Generalized
Distributed ADMM converges Q-linearly to the solution of this optimization
problem if the overall objective function is strongly convex, while the
functions known by the individual agents are allowed to be merely convex.
Establishing Q-linear convergence allows for tracking statements that cannot
be made if only R-linear convergence is guaranteed. Further, we establish the
equivalence between Generalized Distributed ADMM and P-EXTRA for a subset of
mixing matrices. This equivalence yields insight into the convergence of
P-EXTRA when overshooting to accelerate convergence.
Comment: Submitted to IEEE Transactions on Signal and Information Processing
over Networks.
Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters
In this paper, we study the communication and (sub)gradient computation costs
in distributed optimization and give a sharp complexity analysis for the
proposed distributed accelerated gradient methods. We present two algorithms
based on the framework of the accelerated penalty method with increasing
penalty parameters. Our first algorithm is for smooth distributed optimization
and it obtains the near-optimal
$\mathcal{O}\big(\sqrt{L/(\epsilon(1-\sigma_2(W)))}\log(1/\epsilon)\big)$
communication complexity and the optimal $\mathcal{O}\big(\sqrt{L/\epsilon}\big)$
gradient computation complexity for $L$-smooth convex problems, where
$\sigma_2(W)$ denotes the second largest singular value of the weight matrix
$W$ associated to the network and $\epsilon$ is the target accuracy. When the
problem is $\mu$-strongly convex and $L$-smooth, our algorithm has the
near-optimal $\mathcal{O}\big(\sqrt{L/(\mu(1-\sigma_2(W)))}\log^2(1/\epsilon)\big)$
complexity for communications and the optimal
$\mathcal{O}\big(\sqrt{L/\mu}\log(1/\epsilon)\big)$ complexity for gradient
computations. Our communication complexities are only worse by a factor of
$\log(1/\epsilon)$ than the lower bounds for smooth distributed optimization.
As far as we know, our method is the first to achieve both communication and
gradient computation lower bounds up to an extra logarithmic factor for smooth
distributed optimization. Our second algorithm is designed for non-smooth
distributed optimization and it achieves both the optimal
$\mathcal{O}\big(1/(\epsilon\sqrt{1-\sigma_2(W)})\big)$ communication
complexity and the optimal $\mathcal{O}(1/\epsilon^2)$ subgradient computation
complexity, which match the communication and subgradient computation
complexity lower bounds for non-smooth distributed optimization.
Comment: The previous name of this paper was "A Sharp Convergence Rate
Analysis for Distributed Accelerated Gradient Methods". The contents are
consistent.
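The accelerated penalty framework above builds on Nesterov's accelerated gradient method. A minimal single-machine sketch of that underlying accelerated iteration follows, on a toy quadratic objective of my own choosing; it is not the distributed algorithm itself:

```python
import numpy as np

def nesterov_agd(grad, L, x0, iters=2000):
    """Nesterov's accelerated gradient descent: gradient steps of size 1/L
    combined with momentum extrapolation between consecutive iterates."""
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x_new = y - grad(y) / L                          # gradient step at the extrapolated point
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t)) # momentum schedule
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)    # extrapolation
        x, t = x_new, t_new
    return x

# toy smooth problem: f(x) = 0.5 * x^T Q x, with smoothness constant L = ||Q||_2
Q = np.diag([1.0, 10.0])
x_hat = nesterov_agd(lambda v: Q @ v, L=10.0, x0=np.array([5.0, 5.0]))
```

The penalty method in the paper runs such accelerated steps on a penalized consensus objective while the penalty parameter grows across epochs.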
Tight Linear Convergence Rate of ADMM for Decentralized Optimization
The present paper considers leveraging network topology information to
improve the convergence rate of ADMM for decentralized optimization, where
networked nodes work collaboratively to minimize the objective. Such problems
can be solved efficiently using ADMM via decomposing the objective into easier
subproblems. Properly exploiting network topology can significantly improve the
algorithm performance. Hybrid ADMM explores the direction of exploiting node
information by taking into account node centrality but fails to utilize edge
information. This paper fills the gap by incorporating both node and edge
information and provides a novel convergence rate bound for decentralized ADMM
that explicitly depends on the network topology. Such a bound is attainable
for a certain class of problems, and is thus tight. The explicit dependence
further suggests possible directions for the optimal design of edge weights to
achieve the best performance. Numerical experiments show that simple heuristic
methods can achieve better performance and also exhibit robustness to topology
changes.
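For reference, the vanilla global-consensus ADMM template that decentralized variants like the one above refine can be sketched as follows. The scalar quadratic local costs are chosen purely for illustration; the paper's edge-weighted decentralized updates differ:

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=200):
    """Global-consensus ADMM for min_x sum_i 0.5*(x - a_i)^2.
    Each 'node' i holds a_i privately; only x_i + u_i is shared."""
    n = len(a)
    x = np.zeros(n)
    u = np.zeros(n)
    z = 0.0
    for _ in range(iters):
        # local step: argmin_xi 0.5*(xi - a_i)^2 + (rho/2)*(xi - z + u_i)^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        z = np.mean(x + u)          # averaging: the consensus step
        u = u + x - z               # dual ascent on the consensus gap
    return z

a = np.array([1.0, 2.0, 6.0])
z_star = consensus_admm(a)          # minimizer is the average of the a_i
```

Decentralized ADMM replaces the global average by neighbor-to-neighbor exchanges, which is exactly where the edge weights studied in the paper enter.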
Multi-agent constrained optimization of a strongly convex function over time-varying directed networks
We consider cooperative multi-agent consensus optimization problems over both
static and time-varying communication networks, where only local communications
are allowed. The objective is to minimize the sum of agent-specific possibly
non-smooth composite convex functions over agent-specific private conic
constraint sets; hence, the optimal consensus decision should lie in the
intersection of these private sets. Assuming the sum function is strongly
convex, we provide convergence rates in suboptimality, infeasibility and
consensus violation; and examine the effect of the underlying network topology
on the convergence rates of the proposed decentralized algorithms.
Distributed Stochastic Multi-Task Learning with Graph Regularization
We propose methods for distributed graph-based multi-task learning that are
based on weighted averaging of messages from other machines. Uniform averaging
or diminishing stepsize in these methods would yield consensus (single task)
learning. We show how simply skewing the averaging weights or controlling the
stepsize allows learning different, but related, tasks on the different
machines.
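The skewed-averaging idea can be illustrated in a few lines. The quadratic per-machine losses and the mixing matrix below are my own toy choices, not the paper's exact method:

```python
import numpy as np

def multitask_averaging(targets, self_weight, lr=0.1, iters=300):
    """Each machine i runs gradient steps on 0.5*(w - targets[i])^2, then
    averages its parameter with the others. self_weight = 1/n gives uniform
    averaging (consensus / single-task learning); a larger self_weight keeps
    the tasks distinct but still coupled."""
    n = len(targets)
    W = np.full((n, n), (1.0 - self_weight) / (n - 1))
    np.fill_diagonal(W, self_weight)
    w = np.zeros(n)
    for _ in range(iters):
        w = w - lr * (w - targets)   # local gradient steps
        w = W @ w                    # weighted message averaging
    return w

targets = np.array([0.0, 1.0])
consensus = multitask_averaging(targets, self_weight=0.5)   # uniform: both machines agree
related = multitask_averaging(targets, self_weight=0.95)    # skewed: near own task, pulled together
```

With uniform weights both parameters collapse to the average of the targets; skewing the diagonal keeps each machine near its own task while regularizing it toward the others, which is the graph-regularization effect described above.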
The primal-dual hybrid gradient method reduces to a primal method for linearly constrained optimization problems
In this work, we show that for linearly constrained optimization problems the
primal-dual hybrid gradient algorithm, analyzed by Chambolle and Pock [3], can
be written as an entirely primal algorithm. This allows us to prove
convergence of the iterates even in degenerate cases where the linear system
is inconsistent or strong duality does not hold. We also obtain new
convergence rates that appear to improve on existing ones in the literature.
For decentralized distributed optimization, we show that the new scheme is
much more efficient than the original one.
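A minimal sketch of the primal-dual hybrid gradient iteration of Chambolle and Pock, specialized to a linear constraint $Ax = b$, is given below. The toy objective $\tfrac{1}{2}\|x\|^2$ is my own choice for illustration; this is the original PDHG scheme, not the reduced primal variant the paper derives:

```python
import numpy as np

def pdhg_linear_constraint(A, b, tau, sigma, iters=5000):
    """PDHG for min_x 0.5*||x||^2 s.t. Ax = b. The dual update is a plain
    ascent step because the conjugate of the indicator of {b} gives
    prox_{sigma g*}(v) = v - sigma*b. Requires tau*sigma*||A||^2 <= 1."""
    m, n = A.shape
    x = np.zeros(n)
    y = np.zeros(m)
    for _ in range(iters):
        x_new = (x - tau * A.T @ y) / (1.0 + tau)    # prox of tau*0.5*||.||^2
        y = y + sigma * (A @ (2 * x_new - x) - b)    # extrapolated dual step
        x = x_new
    return x

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
L = np.linalg.norm(A, 2)
x_hat = pdhg_linear_constraint(A, b, tau=0.9 / L, sigma=0.9 / L)
```

For this problem the minimizer is the minimum-norm point on the constraint, $x^* = (0.5, 0.5)$; the paper's observation is that for such linearly constrained problems the dual sequence can be eliminated, yielding a purely primal recursion.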
Towards More Efficient Stochastic Decentralized Learning: Faster Convergence and Sparse Communication
Recently, the decentralized optimization problem has been attracting growing
attention. Most existing methods are deterministic, with high per-iteration
cost and a convergence rate that depends quadratically on the problem's
condition number. Moreover, dense communication is necessary to ensure
convergence even if the dataset is sparse. In this paper, we generalize the
decentralized optimization problem to a monotone operator root-finding
problem, and propose a stochastic algorithm named DSBA that (i) converges
geometrically with a rate depending linearly on the problem's condition
number, and (ii) can be implemented using sparse communication only.
Additionally, DSBA handles learning problems like AUC-maximization which
cannot be tackled efficiently in the decentralized setting. Experiments on
convex minimization and AUC-maximization validate the
efficiency of our method.
Comment: Accepted to ICML 2018.
ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates
Finding a fixed point of a nonexpansive operator $T$, i.e., solving
$x^* = Tx^*$, abstracts
many problems in numerical linear algebra, optimization, and other areas of
scientific computing. To solve fixed-point problems, we propose ARock, an
algorithmic framework in which multiple agents (machines, processors, or cores)
update $x$ in an asynchronous parallel fashion. Asynchrony is crucial to
parallel computing since it reduces synchronization wait, relaxes communication
bottleneck, and thus speeds up computing significantly. At each step of ARock,
an agent updates a randomly selected coordinate $x_i$ based on possibly
out-of-date information on $x$. The agents share $x$ through either global
memory or communication. If writing $x_i$ is atomic, the agents can read and
write $x$ without memory locks.
Theoretically, we show that if the nonexpansive operator $T$ has a fixed
point, then with probability one, ARock generates a sequence that converges to
a fixed point of $T$. Our conditions on $T$ and the step sizes are weaker than
those in comparable work. Linear convergence is also obtained.
We propose special cases of ARock for linear systems, convex optimization,
machine learning, as well as distributed and decentralized consensus problems.
Numerical experiments of solving sparse logistic regression problems are
presented.
Comment: updated the linear convergence proof.
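The coordinatewise, lock-free update pattern described above can be imitated in a few lines. The contraction operator below is my own toy choice, and Python threads only mimic, not faithfully reproduce, shared-memory parallelism:

```python
import threading
import numpy as np

def arock_sketch(a, eta=0.5, n_threads=4, steps_per_thread=5000):
    """ARock-style asynchronous updates for the nonexpansive operator
    T(x) = x - eta*(x - a), whose unique fixed point is x* = a. Each thread
    repeatedly picks a random coordinate i and applies
    x_i <- x_i - eta*(x_i - a_i) without locks; a stale read only reapplies
    a contraction toward a_i, so races cannot cause divergence here."""
    x = np.zeros_like(a)

    def worker(seed):
        rng = np.random.default_rng(seed)
        for _ in range(steps_per_thread):
            i = rng.integers(len(a))
            x[i] -= eta * (x[i] - a[i])   # one asynchronous coordinate step

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

a = np.array([1.0, -2.0, 3.0, 0.5])
x_star = arock_sketch(a)                  # converges to the fixed point a
```

The tolerance of this toy to stale reads is the point of the framework: ARock's analysis quantifies how much staleness a general nonexpansive operator can absorb while still converging.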
Robust and Scalable Power System State Estimation via Composite Optimization
In today's cyber-enabled smart grids, high penetration of uncertain
renewables, purposeful manipulation of meter readings, and the need for
wide-area situational awareness, call for fast, accurate, and robust power
system state estimation. The least-absolute-value (LAV) estimator is known for
its robustness relative to the weighted least-squares (WLS) one. However, due
to nonconvexity and nonsmoothness, existing LAV solvers based on linear
programming are typically slow, hence inadequate for real-time system
monitoring. This paper develops two novel algorithms for efficient LAV
estimation, which draw from recent advances in composite optimization. The
first is a deterministic linear proximal scheme that handles a sequence of
convex quadratic problems, each efficiently solvable either via off-the-shelf
algorithms or through the alternating direction method of multipliers.
Leveraging the sparse connectivity inherent to power networks, the second
scheme is stochastic, and updates only \emph{a few} entries of the complex
voltage state vector per iteration. In particular, when only voltage magnitude
and (re)active power flow measurements are used, this number reduces to one or
two, \emph{regardless of} the number of buses in the network. This
computational complexity evidently scales well to large-size power systems.
Furthermore, by carefully \emph{mini-batching} the voltage and power flow
measurements, accelerated implementation of the stochastic iterations becomes
possible. The developed algorithms are numerically evaluated using a variety of
benchmark power networks. Simulated tests corroborate that improved robustness
can be attained at comparable or markedly reduced computation times for medium-
or large-size networks relative to the "workhorse" WLS-based Gauss-Newton
iterations.
Comment: 10 pages, 3 figures.
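The ADMM route for the quadratic subproblems mentioned above can be illustrated on a plain linear measurement model (my simplification; the paper's model involves nonlinear power flow equations):

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lav_admm(H, z, rho=1.0, iters=1000):
    """ADMM for the least-absolute-value fit min_v ||H v - z||_1, splitting
    the residual e = H v - z so the e-update is a soft threshold and the
    v-update is an ordinary least-squares solve."""
    m, n = H.shape
    e = np.zeros(m)
    u = np.zeros(m)
    HtH = H.T @ H
    v = np.zeros(n)
    for _ in range(iters):
        v = np.linalg.solve(HtH, H.T @ (z + e - u))  # quadratic subproblem
        r = H @ v - z
        e = soft(r + u, 1.0 / rho)                   # prox of ||.||_1 / rho
        u = u + r - e                                # scaled dual update
    return v

rng = np.random.default_rng(1)
H = rng.standard_normal((30, 3))
v_true = np.array([1.0, -1.0, 2.0])
z = H @ v_true
z[0] += 50.0                                         # one grossly corrupted reading
v_hat = lav_admm(H, z)
```

The single corrupted measurement barely moves the LAV estimate, while an ordinary least-squares fit of the same data would be pulled far off; this robustness to bad meter readings is what motivates LAV over WLS in the abstract.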