
    An Asynchronous Distributed Proximal Gradient Method for Composite Convex Optimization

    We propose a distributed first-order augmented Lagrangian (DFAL) algorithm to minimize the sum of composite convex functions, where each term in the sum is a private cost function belonging to a node, and only nodes connected by an edge can directly communicate with each other. This optimization model abstracts a number of applications in distributed sensing and machine learning. We show that any limit point of DFAL iterates is optimal; and for any $\epsilon>0$, an $\epsilon$-optimal and $\epsilon$-feasible solution can be computed within $\mathcal{O}(\log(\epsilon^{-1}))$ DFAL iterations, which require $\mathcal{O}(\frac{\psi_{\max}^{1.5}}{d_{\min}}\,\epsilon^{-1})$ proximal gradient computations and communications per node in total, where $\psi_{\max}$ denotes the largest eigenvalue of the graph Laplacian, and $d_{\min}$ is the minimum degree of the graph. We also propose an asynchronous version of DFAL by incorporating randomized block coordinate descent methods, and demonstrate the efficiency of DFAL on large-scale sparse-group LASSO problems. Comment: The manuscript will appear in the Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).
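
    As a rough illustration of the building block this abstract refers to, the sketch below shows a single proximal gradient step on a composite objective with an l1 term (a stand-in for the sparse-group LASSO regularizer). The function names and the simple least-squares loss are assumptions for illustration only, not DFAL itself.

        import numpy as np

        def soft_threshold(v, tau):
            # Proximal operator of tau * ||.||_1 (soft-thresholding).
            return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

        def local_prox_grad_step(x, grad_smooth, step, lam):
            # One proximal gradient step on a composite objective
            # smooth(x) + lam * ||x||_1, as each node might perform locally.
            return soft_threshold(x - step * grad_smooth(x), step * lam)

        # Toy usage: least-squares loss with an l1 penalty on random data.
        rng = np.random.default_rng(0)
        A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
        grad = lambda x: A.T @ (A @ x - b)
        x = np.zeros(5)
        for _ in range(100):
            x = local_prox_grad_step(x, grad, step=1.0 / np.linalg.norm(A, 2) ** 2, lam=0.1)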

    On the Q-linear convergence of Distributed Generalized ADMM under non-strongly convex function components

    Solving optimization problems in multi-agent networks where each agent only has partial knowledge of the problem has become increasingly important. In this paper we consider the problem of minimizing the sum of $n$ convex functions. We assume that each function is only known by one agent. We show that Generalized Distributed ADMM converges Q-linearly to the solution of this optimization problem if the overall objective function is strongly convex, while the functions known by each agent are allowed to be only convex. Establishing Q-linear convergence allows for tracking statements that cannot be made if only R-linear convergence is guaranteed. Further, we establish the equivalence between Generalized Distributed ADMM and P-EXTRA for a subset of mixing matrices. This equivalence yields insights into the convergence of P-EXTRA when overshooting to accelerate convergence. Comment: Submitted to IEEE Transactions on Signal and Information Processing over Networks.
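
    For readers unfamiliar with the update structure underlying distributed ADMM methods, here is a minimal consensus-ADMM sketch for quadratic local costs. It is not the Generalized Distributed ADMM analyzed in the paper, only the baseline splitting such methods build on; the function names and the quadratic local model are assumptions.

        import numpy as np

        def consensus_admm(As, bs, rho=1.0, iters=200):
            # min_x sum_i 0.5 * ||A_i x - b_i||^2 via consensus ADMM:
            # each agent i keeps a local copy x_i and a scaled dual u_i,
            # and all copies are tied to a shared consensus variable z.
            n = As[0].shape[1]
            xs = [np.zeros(n) for _ in As]   # local primal variables
            us = [np.zeros(n) for _ in As]   # scaled dual variables
            z = np.zeros(n)                  # consensus variable
            for _ in range(iters):
                for i, (A, b) in enumerate(zip(As, bs)):
                    # x_i-update: argmin 0.5||A x - b||^2 + (rho/2)||x - z + u_i||^2
                    xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(n),
                                            A.T @ b + rho * (z - us[i]))
                z = np.mean([x + u for x, u in zip(xs, us)], axis=0)   # z-update
                us = [u + x - z for x, u in zip(xs, us)]               # dual update
            return z

        rng = np.random.default_rng(1)
        As = [rng.standard_normal((15, 4)) for _ in range(3)]
        bs = [rng.standard_normal(15) for _ in range(3)]
        z_star = consensus_admm(As, bs)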

    Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters

    In this paper, we study the communication and (sub)gradient computation costs in distributed optimization and give a sharp complexity analysis for the proposed distributed accelerated gradient methods. We present two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters. Our first algorithm is for smooth distributed optimization and it obtains the near-optimal $O\left(\sqrt{\frac{L}{\epsilon(1-\sigma_2(W))}}\log\frac{1}{\epsilon}\right)$ communication complexity and the optimal $O\left(\sqrt{\frac{L}{\epsilon}}\right)$ gradient computation complexity for $L$-smooth convex problems, where $\sigma_2(W)$ denotes the second largest singular value of the weight matrix $W$ associated with the network and $\epsilon$ is the target accuracy. When the problem is $\mu$-strongly convex and $L$-smooth, our algorithm has the near-optimal $O\left(\sqrt{\frac{L}{\mu(1-\sigma_2(W))}}\log^2\frac{1}{\epsilon}\right)$ complexity for communications and the optimal $O\left(\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon}\right)$ complexity for gradient computations. Our communication complexities are only worse by a factor of $\log\frac{1}{\epsilon}$ than the lower bounds for smooth distributed optimization. As far as we know, our method is the first to achieve both the communication and gradient computation lower bounds up to an extra logarithm factor for smooth distributed optimization. Our second algorithm is designed for non-smooth distributed optimization and it achieves both the optimal $O\left(\frac{1}{\epsilon\sqrt{1-\sigma_2(W)}}\right)$ communication complexity and the optimal $O\left(\frac{1}{\epsilon^2}\right)$ subgradient computation complexity, which match the communication and subgradient computation complexity lower bounds for non-smooth distributed optimization. Comment: The previous name of this paper was "A Sharp Convergence Rate Analysis for Distributed Accelerated Gradient Methods". The contents are consistent.
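
    To make the "accelerated penalty method with increasing penalty parameters" framework concrete, here is a hypothetical sketch: Nesterov-accelerated gradient steps on a penalized consensus objective whose penalty weight grows over outer rounds. The schedule, step sizes, and all names are assumptions; the paper's actual algorithm and its complexity guarantees are not reproduced.

        import numpy as np

        def accelerated_penalty(grads, W, L, n_nodes, dim, outer=8, inner=100):
            # Illustrative only: accelerated gradient on the penalized consensus
            # objective F_k(x) = sum_i f_i(x_i) + (beta_k / 2) * tr(x^T (I - W) x),
            # with the penalty weight beta_k increasing over outer rounds.
            I = np.eye(n_nodes)
            x = np.zeros((n_nodes, dim))
            for k in range(outer):
                beta = 2.0 ** k                                   # assumed increasing schedule
                step = 1.0 / (L + beta * np.linalg.norm(I - W, 2))
                y, x_prev = x.copy(), x.copy()
                for t in range(inner):
                    g = np.array([grads[i](y[i]) for i in range(n_nodes)]) + beta * (I - W) @ y
                    x = y - step * g                              # gradient step at extrapolated point
                    y = x + (t / (t + 3.0)) * (x - x_prev)        # Nesterov momentum
                    x_prev = x
            return x.mean(axis=0)

        # Two-node toy: f_i(v) = 0.5 * ||v - t_i||^2, so L = 1 and the
        # consensus minimizer is the average of the targets.
        W = np.array([[0.5, 0.5], [0.5, 0.5]])
        targets = [np.array([1.0, 0.0]), np.array([0.0, 3.0])]
        grads = [(lambda v, t=t: v - t) for t in targets]
        x_hat = accelerated_penalty(grads, W, L=1.0, n_nodes=2, dim=2)
        print(x_hat)   # close to [0.5, 1.5] once the penalty is large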

    Tight Linear Convergence Rate of ADMM for Decentralized Optimization

    The present paper considers leveraging network topology information to improve the convergence rate of ADMM for decentralized optimization, where networked nodes work collaboratively to minimize the objective. Such problems can be solved efficiently using ADMM by decomposing the objective into easier subproblems, and properly exploiting the network topology can significantly improve algorithm performance. Hybrid ADMM explores the direction of exploiting node information by taking node centrality into account, but fails to utilize edge information. This paper fills the gap by incorporating both node and edge information and provides a novel convergence rate bound for decentralized ADMM that explicitly depends on the network topology. The bound is attainable for a certain class of problems and is therefore tight. The explicit dependence further suggests possible directions for optimally designing the edge weights to achieve the best performance. Numerical experiments show that simple heuristic methods can achieve better performance and exhibit robustness to topology changes.
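
    The toy snippet below only illustrates the kind of spectral quantity through which edge weights typically enter decentralized convergence rates; it is not the paper's ADMM bound. It compares the second-largest eigenvalue magnitude of two edge-weight choices on the same small graph, with all names and numbers assumed for demonstration.

        import numpy as np

        def mixing_matrix(weights, n=4):
            # weights[(i, j)] = edge weight; returns a symmetric, doubly
            # stochastic W with the remaining mass placed on the diagonal.
            W = np.zeros((n, n))
            for (i, j), w in weights.items():
                W[i, j] = W[j, i] = w
            np.fill_diagonal(W, 1.0 - W.sum(axis=1))
            return W

        edges = [(0, 1), (1, 2), (2, 3)]                 # 4-node path graph
        uniform = {e: 1.0 / 3.0 for e in edges}
        skewed = {(0, 1): 0.45, (1, 2): 0.10, (2, 3): 0.45}
        for weights in (uniform, skewed):
            W = mixing_matrix(weights)
            sigma2 = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]
            print(sigma2)   # smaller second-largest magnitude = faster mixing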

    Multi-agent constrained optimization of a strongly convex function over time-varying directed networks

    We consider cooperative multi-agent consensus optimization problems over both static and time-varying communication networks, where only local communications are allowed. The objective is to minimize the sum of agent-specific, possibly non-smooth, composite convex functions over agent-specific private conic constraint sets; hence, the optimal consensus decision should lie in the intersection of these private sets. Assuming the sum function is strongly convex, we provide convergence rates in suboptimality, infeasibility and consensus violation, and examine the effect of the underlying network topology on the convergence rates of the proposed decentralized algorithms.

    Distributed Stochastic Multi-Task Learning with Graph Regularization

    We propose methods for distributed graph-based multi-task learning that are based on weighted averaging of messages from other machines. Uniform averaging or a diminishing stepsize in these methods would yield consensus (single-task) learning. We show how simply skewing the averaging weights or controlling the stepsize allows learning different, but related, tasks on the different machines.
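
    A toy sketch of the weighted-averaging idea (not the paper's exact method): each machine takes a local gradient step on its own task, then mixes parameters with the others through a row-stochastic weight matrix. Uniform rows recover consensus learning, while skewing each row toward its own machine keeps the tasks related but distinct. All names and the quadratic toy tasks below are assumptions.

        import numpy as np

        def multitask_round(W, grads, P, step):
            # W: (K, d) stacked parameters, one row per machine/task
            # grads: K local gradient functions, P: (K, K) row-stochastic weights
            W_local = np.array([w - step * g(w) for w, g in zip(W, grads)])
            return P @ W_local    # weighted averaging of messages from other machines

        K, d = 4, 3
        rng = np.random.default_rng(1)
        targets = rng.standard_normal((K, d))
        grads = [(lambda w, t=t: w - t) for t in targets]   # grad of 0.5||w - t||^2
        P = 0.7 * np.eye(K) + (0.3 / K) * np.ones((K, K))   # skewed toward own task
        W = np.zeros((K, d))
        for _ in range(200):
            W = multitask_round(W, grads, P, step=0.5)
        # Each row of W ends up near its own target, pulled toward the others.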

    The primal-dual hybrid gradient method reduces to a primal method for linearly constrained optimization problems

    In this work, we show that for linearly constrained optimization problems the primal-dual hybrid gradient algorithm, analyzed by Chambolle and Pock [3], can be written as an entirely primal algorithm. This allows us to prove convergence of the iterates even in degenerate cases when the linear system is inconsistent or when strong duality does not hold. We also obtain new convergence rates which seem to improve upon existing ones in the literature. For decentralized distributed optimization, we show that the new scheme is much more efficient than the original one.
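
    For reference, here is a minimal sketch of the baseline primal-dual hybrid gradient iteration for a linearly constrained problem; the paper's primal-only reformulation and its improved rates are not reproduced, and the simple quadratic objective is an assumption chosen so the proximal step has closed form.

        import numpy as np

        def pdhg_linear_constraint(A, b, c, iters=500):
            # PDHG (Chambolle-Pock) on the saddle problem
            # min_x max_y f(x) + <y, A x - b>, with f(x) = 0.5 * ||x - c||^2.
            m, n = A.shape
            tau = sigma = 0.9 / np.linalg.norm(A, 2)      # tau * sigma * ||A||^2 < 1
            x, x_bar, y = np.zeros(n), np.zeros(n), np.zeros(m)
            for _ in range(iters):
                y = y + sigma * (A @ x_bar - b)                    # dual ascent step
                x_new = (x - tau * A.T @ y + tau * c) / (1 + tau)  # prox of tau * f
                x_bar = 2 * x_new - x                              # extrapolation
                x = x_new
            return x

        A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
        b = np.array([1.0, 2.0])
        x_star = pdhg_linear_constraint(A, b, c=np.zeros(3))   # min-norm feasible point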

    Towards More Efficient Stochastic Decentralized Learning: Faster Convergence and Sparse Communication

    Recently, the decentralized optimization problem has been attracting growing attention. Most existing methods are deterministic with high per-iteration cost and have a convergence rate that depends quadratically on the problem condition number. Moreover, dense communication is necessary to ensure convergence even if the dataset is sparse. In this paper, we generalize the decentralized optimization problem to a monotone operator root finding problem, and propose a stochastic algorithm named DSBA that (i) converges geometrically with a rate that depends linearly on the problem condition number, and (ii) can be implemented using sparse communication only. Additionally, DSBA handles learning problems like AUC maximization which cannot be tackled efficiently in the decentralized setting. Experiments on convex minimization and AUC maximization validate the efficiency of our method. Comment: Accepted to ICML 2018.

    ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates

    Finding a fixed point of a nonexpansive operator, i.e., $x^*=Tx^*$, abstracts many problems in numerical linear algebra, optimization, and other areas of scientific computing. To solve fixed-point problems, we propose ARock, an algorithmic framework in which multiple agents (machines, processors, or cores) update $x$ in an asynchronous parallel fashion. Asynchrony is crucial to parallel computing since it reduces synchronization waits, relaxes communication bottlenecks, and thus speeds up computing significantly. At each step of ARock, an agent updates a randomly selected coordinate $x_i$ based on possibly out-of-date information on $x$. The agents share $x$ through either global memory or communication. If writing $x_i$ is atomic, the agents can read and write $x$ without memory locks. Theoretically, we show that if the nonexpansive operator $T$ has a fixed point, then with probability one, ARock generates a sequence that converges to a fixed point of $T$. Our conditions on $T$ and the step sizes are weaker than in comparable work. Linear convergence is also obtained. We propose special cases of ARock for linear systems, convex optimization, machine learning, as well as distributed and decentralized consensus problems. Numerical experiments on solving sparse logistic regression problems are presented. Comment: updated the linear convergence proof.
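
    A toy, lock-free illustration of the coordinate update described above, with T chosen as an averaged operator whose fixed point solves a small linear system; the threading setup, step size, and operator choice are assumptions for demonstration and do not reflect the framework's precise conditions.

        import numpy as np
        import threading

        rng = np.random.default_rng(2)
        M = rng.standard_normal((10, 10))
        A = M @ M.T + 10 * np.eye(10)            # well-conditioned PSD matrix
        b = rng.standard_normal(10)
        alpha = 1.0 / np.linalg.norm(A, 2)
        x = np.zeros(10)                          # shared iterate

        def worker(n_updates, seed, eta=0.9):
            # Repeatedly pick a random coordinate i and apply
            # x_i <- x_i - eta * (x - T x)_i with T(x) = x - alpha * (A x - b),
            # reading the shared x without locks (possibly slightly stale).
            local_rng = np.random.default_rng(seed)
            for _ in range(n_updates):
                i = local_rng.integers(10)
                residual_i = alpha * (A[i] @ x - b[i])   # (x - T x)_i
                x[i] -= eta * residual_i                 # lock-free coordinate write

        threads = [threading.Thread(target=worker, args=(5000, k)) for k in range(4)]
        for t in threads: t.start()
        for t in threads: t.join()
        print(np.linalg.norm(A @ x - b))          # residual should be small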

    Robust and Scalable Power System State Estimation via Composite Optimization

    In today's cyber-enabled smart grids, high penetration of uncertain renewables, purposeful manipulation of meter readings, and the need for wide-area situational awareness call for fast, accurate, and robust power system state estimation. The least-absolute-value (LAV) estimator is known for its robustness relative to the weighted least-squares (WLS) one. However, due to nonconvexity and nonsmoothness, existing LAV solvers based on linear programming are typically slow, hence inadequate for real-time system monitoring. This paper develops two novel algorithms for efficient LAV estimation, which draw from recent advances in composite optimization. The first is a deterministic linear proximal scheme that handles a sequence of convex quadratic problems, each efficiently solvable either via off-the-shelf algorithms or through the alternating direction method of multipliers. Leveraging the sparse connectivity inherent to power networks, the second scheme is stochastic, and updates only a few entries of the complex voltage state vector per iteration. In particular, when only voltage magnitude and (re)active power flow measurements are used, this number reduces to one or two, regardless of the number of buses in the network. This computational complexity evidently scales well to large-size power systems. Furthermore, by carefully mini-batching the voltage and power flow measurements, an accelerated implementation of the stochastic iterations becomes possible. The developed algorithms are numerically evaluated using a variety of benchmark power networks. Simulated tests corroborate that improved robustness can be attained at comparable or markedly reduced computation times for medium- or large-size networks relative to the "workhorse" WLS-based Gauss-Newton iterations. Comment: 10 pages, 3 figures.
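
    To see why the l1-based LAV criterion is robust to corrupted readings, the sketch below solves a toy linear LAV problem with a standard ADMM splitting (one of the off-the-shelf routes the abstract mentions for the convex subproblems). The linear measurement model and all names are assumptions; the paper's prox-linear and stochastic schemes for the AC model are not reproduced.

        import numpy as np

        def lav_admm(H, z, rho=1.0, iters=300):
            # min_x ||z - H x||_1 via ADMM on the split r = z - H x.
            m, n = H.shape
            x, r, u = np.zeros(n), np.zeros(m), np.zeros(m)
            HtH_inv = np.linalg.inv(H.T @ H)
            for _ in range(iters):
                x = HtH_inv @ (H.T @ (z - r + u))                        # least-squares x-update
                v = z - H @ x + u
                r = np.sign(v) * np.maximum(np.abs(v) - 1.0 / rho, 0.0)  # soft-threshold r-update
                u = u + z - H @ x - r                                    # dual update
            return x

        rng = np.random.default_rng(3)
        H = rng.standard_normal((40, 5))
        x_true = rng.standard_normal(5)
        z = H @ x_true + 0.01 * rng.standard_normal(40)
        z[::10] += 5.0                                   # a few corrupted readings
        x_lav = lav_admm(H, z)                           # stays close to x_true
        x_wls = np.linalg.lstsq(H, z, rcond=None)[0]     # least-squares baseline, biased by outliers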