153 research outputs found
An Asynchronous Distributed Proximal Gradient Method for Composite Convex Optimization
We propose a distributed first-order augmented Lagrangian (DFAL) algorithm to
minimize the sum of composite convex functions, where each term in the sum is a
private cost function belonging to a node, and only nodes connected by an edge
can directly communicate with each other. This optimization model abstracts a
number of applications in distributed sensing and machine learning. We show
that any limit point of DFAL iterates is optimal; and for any $\epsilon > 0$,
an $\epsilon$-optimal and $\epsilon$-feasible solution can be computed within
$\mathcal{O}(\log(1/\epsilon))$ DFAL iterations, which require
$\mathcal{O}(\psi_{\max}^{1.5} d_{\min}^{-1} \epsilon^{-1})$ proximal
gradient computations and communications per node in total, where
$\psi_{\max}$ denotes the largest eigenvalue of the graph Laplacian, and
$d_{\min}$ is the minimum degree of the graph. We also propose an asynchronous
version of DFAL by
incorporating randomized block coordinate descent methods; and demonstrate the
efficiency of DFAL on large-scale sparse-group LASSO problems.
Comment: The manuscript will appear in the Proceedings of the 32nd
International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP
volume 37. Copyright 2015 by the author(s).
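As a companion to the abstract above, the proximal gradient primitive that the iteration counts refer to can be sketched on a single machine. This is a minimal illustration on an $\ell_1$-regularized least-squares toy problem of my own choosing, not the paper's distributed DFAL scheme:

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, step, iters=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by proximal gradient steps."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)            # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
b = A @ np.array([1.0, -2.0] + [0.0] * 8) + 0.01 * rng.standard_normal(40)
step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||_2^2
x_hat = proximal_gradient(A, b, lam=0.1, step=step)
```

In DFAL each node would run such prox-gradient steps on its private cost while exchanging iterates with its neighbors; the sketch only shows the centralized building block.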
On the Q-linear convergence of Distributed Generalized ADMM under non-strongly convex function components
Solving optimization problems in multi-agent networks where each agent only
has partial knowledge of the problem has become increasingly important. In
this paper we consider the problem of minimizing the sum of convex functions,
where each function is known by only one agent. We show that Generalized
Distributed ADMM converges Q-linearly to the solution of this optimization
problem if the overall objective function is strongly convex, while the
functions known by the individual agents are allowed to be merely convex.
Establishing Q-linear convergence allows for tracking statements that cannot
be made if only R-linear convergence is guaranteed. Further, we establish the
equivalence between Generalized Distributed ADMM and P-EXTRA for a subset of
mixing matrices. This equivalence yields insight into the convergence of
P-EXTRA when overshooting to accelerate convergence.
Comment: Submitted to IEEE Transactions on Signal and Information Processing
over Networks.
Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters
In this paper, we study the communication and (sub)gradient computation costs
in distributed optimization and give a sharp complexity analysis for the
proposed distributed accelerated gradient methods. We present two algorithms
based on the framework of the accelerated penalty method with increasing
penalty parameters. Our first algorithm is for smooth distributed optimization
and it obtains the near-optimal
$\mathcal{O}\big(\sqrt{L/(\epsilon(1-\sigma_2(W)))}\log(1/\epsilon)\big)$
communication complexity and the optimal $\mathcal{O}\big(\sqrt{L/\epsilon}\big)$
gradient computation complexity for $L$-smooth convex problems, where
$\sigma_2(W)$ denotes the second largest singular value of the weight matrix
$W$ associated to the network and $\epsilon$ is the target accuracy. When the
problem is $\mu$-strongly convex and $L$-smooth, our algorithm has the
near-optimal $\mathcal{O}\big(\sqrt{L/(\mu(1-\sigma_2(W)))}\log^2(1/\epsilon)\big)$
complexity for communications and the optimal
$\mathcal{O}\big(\sqrt{L/\mu}\log(1/\epsilon)\big)$ complexity for gradient
computations. Our communication complexities are only worse by a factor of
$\log(1/\epsilon)$ than the lower bounds for smooth distributed optimization.
As far as we know, our method is the first to achieve both communication and
gradient computation lower bounds up to an extra logarithmic factor for smooth
distributed optimization. Our second algorithm is designed for non-smooth
distributed optimization and it achieves both the optimal
$\mathcal{O}\big(1/(\epsilon\sqrt{1-\sigma_2(W)})\big)$ communication
complexity and the optimal $\mathcal{O}(1/\epsilon^2)$ subgradient computation
complexity, which match the communication and subgradient computation
complexity lower bounds for non-smooth distributed optimization.
Comment: The previous name of this paper was "A Sharp Convergence Rate
Analysis for Distributed Accelerated Gradient Methods". The contents are
consistent.
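The accelerated penalty framework above builds on Nesterov's accelerated gradient method. A minimal single-machine sketch of that underlying accelerated iteration follows, on a toy quadratic objective of my own choosing; it is not the distributed algorithm itself:

```python
import numpy as np

def nesterov_agd(grad, L, x0, iters=2000):
    """Nesterov's accelerated gradient descent: gradient steps of size 1/L
    combined with momentum extrapolation between consecutive iterates."""
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x_new = y - grad(y) / L                          # gradient step at the extrapolated point
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t)) # momentum schedule
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)    # extrapolation
        x, t = x_new, t_new
    return x

# toy smooth problem: f(x) = 0.5 * x^T Q x, with smoothness constant L = ||Q||_2
Q = np.diag([1.0, 10.0])
x_hat = nesterov_agd(lambda v: Q @ v, L=10.0, x0=np.array([5.0, 5.0]))
```

The penalty method in the paper runs such accelerated steps on a penalized consensus objective while the penalty parameter grows across epochs.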
Tight Linear Convergence Rate of ADMM for Decentralized Optimization
The present paper considers leveraging network topology information to
improve the convergence rate of ADMM for decentralized optimization, where
networked nodes work collaboratively to minimize the objective. Such problems
can be solved efficiently using ADMM via decomposing the objective into easier
subproblems. Properly exploiting network topology can significantly improve the
algorithm performance. Hybrid ADMM explores the direction of exploiting node
information by taking into account node centrality but fails to utilize edge
information. This paper fills the gap by incorporating both node and edge
information and provides a novel convergence rate bound for decentralized ADMM
that explicitly depends on the network topology. Such a bound is attainable
for a certain class of problems, and is thus tight. The explicit dependence
further suggests possible directions for the optimal design of edge weights to
achieve the best performance. Numerical experiments show that simple heuristic
methods can achieve better performance and also exhibit robustness to topology
changes.
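For reference, the vanilla global-consensus ADMM template that decentralized variants like the one above refine can be sketched as follows. The scalar quadratic local costs are chosen purely for illustration; the paper's edge-weighted decentralized updates differ:

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=200):
    """Global-consensus ADMM for min_x sum_i 0.5*(x - a_i)^2.
    Each 'node' i holds a_i privately; only x_i + u_i is shared."""
    n = len(a)
    x = np.zeros(n)
    u = np.zeros(n)
    z = 0.0
    for _ in range(iters):
        # local step: argmin_xi 0.5*(xi - a_i)^2 + (rho/2)*(xi - z + u_i)^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        z = np.mean(x + u)          # averaging: the consensus step
        u = u + x - z               # dual ascent on the consensus gap
    return z

a = np.array([1.0, 2.0, 6.0])
z_star = consensus_admm(a)          # minimizer is the average of the a_i
```

Decentralized ADMM replaces the global average by neighbor-to-neighbor exchanges, which is exactly where the edge weights studied in the paper enter.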
Multi-agent constrained optimization of a strongly convex function over time-varying directed networks
We consider cooperative multi-agent consensus optimization problems over both
static and time-varying communication networks, where only local communications
are allowed. The objective is to minimize the sum of agent-specific possibly
non-smooth composite convex functions over agent-specific private conic
constraint sets; hence, the optimal consensus decision should lie in the
intersection of these private sets. Assuming the sum function is strongly
convex, we provide convergence rates in suboptimality, infeasibility and
consensus violation; and examine the effect of the underlying network topology
on the convergence rates of the proposed decentralized algorithms.
Distributed Stochastic Multi-Task Learning with Graph Regularization
We propose methods for distributed graph-based multi-task learning that are
based on weighted averaging of messages from other machines. Uniform averaging
or diminishing stepsize in these methods would yield consensus (single task)
learning. We show how simply skewing the averaging weights or controlling the
stepsize allows learning different, but related, tasks on the different
machines.
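The skewed-averaging idea can be illustrated in a few lines. The quadratic per-machine losses and the mixing matrix below are my own toy choices, not the paper's exact method:

```python
import numpy as np

def multitask_averaging(targets, self_weight, lr=0.1, iters=300):
    """Each machine i runs gradient steps on 0.5*(w - targets[i])^2, then
    averages its parameter with the others. self_weight = 1/n gives uniform
    averaging (consensus / single-task learning); a larger self_weight keeps
    the tasks distinct but still coupled."""
    n = len(targets)
    W = np.full((n, n), (1.0 - self_weight) / (n - 1))
    np.fill_diagonal(W, self_weight)
    w = np.zeros(n)
    for _ in range(iters):
        w = w - lr * (w - targets)   # local gradient steps
        w = W @ w                    # weighted message averaging
    return w

targets = np.array([0.0, 1.0])
consensus = multitask_averaging(targets, self_weight=0.5)   # uniform: both machines agree
related = multitask_averaging(targets, self_weight=0.95)    # skewed: near own task, pulled together
```

With uniform weights both parameters collapse to the average of the targets; skewing the diagonal keeps each machine near its own task while regularizing it toward the others, which is the graph-regularization effect described above.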
The primal-dual hybrid gradient method reduces to a primal method for linearly constrained optimization problems
In this work, we show that for linearly constrained optimization problems the
primal-dual hybrid gradient algorithm, analyzed by Chambolle and Pock [3], can
be written as an entirely primal algorithm. This allows us to prove
convergence of the iterates even in degenerate cases where the linear system
is inconsistent or strong duality does not hold. We also obtain new
convergence rates that appear to improve on existing ones in the literature.
For decentralized distributed optimization, we show that the new scheme is
much more efficient than the original one.
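A minimal sketch of the primal-dual hybrid gradient iteration of Chambolle and Pock, specialized to a linear constraint $Ax = b$, is given below. The toy objective $\tfrac{1}{2}\|x\|^2$ is my own choice for illustration; this is the original PDHG scheme, not the reduced primal variant the paper derives:

```python
import numpy as np

def pdhg_linear_constraint(A, b, tau, sigma, iters=5000):
    """PDHG for min_x 0.5*||x||^2 s.t. Ax = b. The dual update is a plain
    ascent step because the conjugate of the indicator of {b} gives
    prox_{sigma g*}(v) = v - sigma*b. Requires tau*sigma*||A||^2 <= 1."""
    m, n = A.shape
    x = np.zeros(n)
    y = np.zeros(m)
    for _ in range(iters):
        x_new = (x - tau * A.T @ y) / (1.0 + tau)    # prox of tau*0.5*||.||^2
        y = y + sigma * (A @ (2 * x_new - x) - b)    # extrapolated dual step
        x = x_new
    return x

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
L = np.linalg.norm(A, 2)
x_hat = pdhg_linear_constraint(A, b, tau=0.9 / L, sigma=0.9 / L)
```

For this problem the minimizer is the minimum-norm point on the constraint, $x^* = (0.5, 0.5)$; the paper's observation is that for such linearly constrained problems the dual sequence can be eliminated, yielding a purely primal recursion.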
Towards More Efficient Stochastic Decentralized Learning: Faster Convergence and Sparse Communication
Recently, the decentralized optimization problem has been attracting growing
attention. Most existing methods are deterministic, with high per-iteration
cost and a convergence rate that depends quadratically on the problem's
condition number. Moreover, dense communication is necessary to ensure
convergence even if the dataset is sparse. In this paper, we generalize the
decentralized optimization problem to a monotone operator root-finding
problem, and propose a stochastic algorithm named DSBA that (i) converges
geometrically with a rate depending linearly on the problem's condition
number, and (ii) can be implemented using sparse communication only.
Additionally, DSBA handles learning problems like AUC-maximization which
cannot be tackled efficiently in the decentralized setting. Experiments on
convex minimization and AUC-maximization validate the
efficiency of our method.
Comment: Accepted to ICML 2018.
ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates
Finding a fixed point of a nonexpansive operator $T$, i.e., solving
$x^* = Tx^*$, abstracts
many problems in numerical linear algebra, optimization, and other areas of
scientific computing. To solve fixed-point problems, we propose ARock, an
algorithmic framework in which multiple agents (machines, processors, or cores)
update $x$ in an asynchronous parallel fashion. Asynchrony is crucial to
parallel computing since it reduces synchronization wait, relaxes communication
bottleneck, and thus speeds up computing significantly. At each step of ARock,
an agent updates a randomly selected coordinate $x_i$ based on possibly
out-of-date information on $x$. The agents share $x$ through either global
memory or communication. If writing $x_i$ is atomic, the agents can read and
write $x$ without memory locks.
Theoretically, we show that if the nonexpansive operator $T$ has a fixed
point, then with probability one, ARock generates a sequence that converges to
a fixed point of $T$. Our conditions on $T$ and the step sizes are weaker than
those in comparable work. Linear convergence is also obtained.
We propose special cases of ARock for linear systems, convex optimization,
machine learning, as well as distributed and decentralized consensus problems.
Numerical experiments of solving sparse logistic regression problems are
presented.
Comment: updated the linear convergence proof.
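The coordinatewise, lock-free update pattern described above can be imitated in a few lines. The contraction operator below is my own toy choice, and Python threads only mimic, not faithfully reproduce, shared-memory parallelism:

```python
import threading
import numpy as np

def arock_sketch(a, eta=0.5, n_threads=4, steps_per_thread=5000):
    """ARock-style asynchronous updates for the nonexpansive operator
    T(x) = x - eta*(x - a), whose unique fixed point is x* = a. Each thread
    repeatedly picks a random coordinate i and applies
    x_i <- x_i - eta*(x_i - a_i) without locks; a stale read only reapplies
    a contraction toward a_i, so races cannot cause divergence here."""
    x = np.zeros_like(a)

    def worker(seed):
        rng = np.random.default_rng(seed)
        for _ in range(steps_per_thread):
            i = rng.integers(len(a))
            x[i] -= eta * (x[i] - a[i])   # one asynchronous coordinate step

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

a = np.array([1.0, -2.0, 3.0, 0.5])
x_star = arock_sketch(a)                  # converges to the fixed point a
```

The tolerance of this toy to stale reads is the point of the framework: ARock's analysis quantifies how much staleness a general nonexpansive operator can absorb while still converging.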
Robust and Scalable Power System State Estimation via Composite Optimization
In today's cyber-enabled smart grids, high penetration of uncertain
renewables, purposeful manipulation of meter readings, and the need for
wide-area situational awareness, call for fast, accurate, and robust power
system state estimation. The least-absolute-value (LAV) estimator is known for
its robustness relative to the weighted least-squares (WLS) one. However, due
to nonconvexity and nonsmoothness, existing LAV solvers based on linear
programming are typically slow, hence inadequate for real-time system
monitoring. This paper develops two novel algorithms for efficient LAV
estimation, which draw from recent advances in composite optimization. The
first is a deterministic linear proximal scheme that handles a sequence of
convex quadratic problems, each efficiently solvable either via off-the-shelf
algorithms or through the alternating direction method of multipliers.
Leveraging the sparse connectivity inherent to power networks, the second
scheme is stochastic, and updates only \emph{a few} entries of the complex
voltage state vector per iteration. In particular, when only voltage magnitude
and (re)active power flow measurements are used, this number reduces to one or
two, \emph{regardless of} the number of buses in the network. This
computational complexity evidently scales well to large-size power systems.
Furthermore, by carefully \emph{mini-batching} the voltage and power flow
measurements, accelerated implementation of the stochastic iterations becomes
possible. The developed algorithms are numerically evaluated using a variety of
benchmark power networks. Simulated tests corroborate that improved robustness
can be attained at comparable or markedly reduced computation times for medium-
or large-size networks relative to the "workhorse" WLS-based Gauss-Newton
iterations.
Comment: 10 pages, 3 figures.
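The ADMM route for the quadratic subproblems mentioned above can be illustrated on a plain linear measurement model (my simplification; the paper's model involves nonlinear power flow equations):

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lav_admm(H, z, rho=1.0, iters=1000):
    """ADMM for the least-absolute-value fit min_v ||H v - z||_1, splitting
    the residual e = H v - z so the e-update is a soft threshold and the
    v-update is an ordinary least-squares solve."""
    m, n = H.shape
    e = np.zeros(m)
    u = np.zeros(m)
    HtH = H.T @ H
    v = np.zeros(n)
    for _ in range(iters):
        v = np.linalg.solve(HtH, H.T @ (z + e - u))  # quadratic subproblem
        r = H @ v - z
        e = soft(r + u, 1.0 / rho)                   # prox of ||.||_1 / rho
        u = u + r - e                                # scaled dual update
    return v

rng = np.random.default_rng(1)
H = rng.standard_normal((30, 3))
v_true = np.array([1.0, -1.0, 2.0])
z = H @ v_true
z[0] += 50.0                                         # one grossly corrupted reading
v_hat = lav_admm(H, z)
```

The single corrupted measurement barely moves the LAV estimate, while an ordinary least-squares fit of the same data would be pulled far off; this robustness to bad meter readings is what motivates LAV over WLS in the abstract.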