
    POLO: a POLicy-based Optimization library

    We present POLO, a C++ library for large-scale parallel optimization research that emphasizes ease of use, flexibility, and efficiency in algorithm design. It uses multiple inheritance and template programming to decompose algorithms into essential policies and to facilitate code reuse. With its clear separation between algorithm and execution policies, it provides researchers with a simple and powerful platform for prototyping ideas, evaluating them on different parallel computing architectures and hardware platforms, and generating compact and efficient production code. A C API is included for customization and data loading in high-level languages. POLO enables users to move seamlessly from serial to multi-threaded shared-memory and multi-node distributed-memory executors. We demonstrate how POLO allows users to implement state-of-the-art asynchronous parallel optimization algorithms in just a few lines of code, and we report experimental results from shared- and distributed-memory computing architectures. We provide both POLO and POLO.jl, a wrapper around POLO written in the Julia language, at https://github.com/pologrp under the permissive MIT license.
    Comment: 25 pages, 7 figures
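
    A minimal Python sketch of the policy-composition idea described above (POLO itself is C++ and its actual policy names differ; everything here is a hypothetical illustration): an algorithm is assembled by inheriting from interchangeable "step" and "boosting" policies, so swapping one policy changes the method without touching the iteration loop.

        # Conceptual sketch of policy-based algorithm composition, in the spirit
        # of POLO's design. POLO is C++; all names here are hypothetical.
        class GradientStep:                       # "step" policy: plain gradient step
            def step(self, x, g, lr=0.1):
                return x - lr * g

        class MomentumBoost:                      # "boosting" policy: heavy-ball memory
            def __init__(self, beta=0.9):
                self.beta, self.v = beta, 0.0
            def boost(self, g):
                self.v = self.beta * self.v + g
                return self.v

        class MomentumGradient(MomentumBoost, GradientStep):
            """Algorithm = composition of policies via multiple inheritance."""
            def iterate(self, x, grad):
                return self.step(x, self.boost(grad(x)))

        f_grad = lambda x: 2.0 * (x - 3.0)        # gradient of (x - 3)^2
        algo, x = MomentumGradient(), 0.0
        for _ in range(200):
            x = algo.iterate(x, f_grad)
        print(x)                                   # approaches the minimizer 3.0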

    A2BCD: An Asynchronous Accelerated Block Coordinate Descent Algorithm With Optimal Complexity

    In this paper, we propose the Asynchronous Accelerated Nonuniform Randomized Block Coordinate Descent algorithm (A2BCD), the first asynchronous Nesterov-accelerated algorithm that achieves optimal complexity. This parallel algorithm solves the unconstrained convex minimization problem using $p$ computing nodes, which compute updates to shared solution vectors in an asynchronous fashion with no central coordination. Nodes in asynchronous algorithms do not wait for updates from other nodes before starting a new iteration, but simply compute updates using the most recent solution information available. This allows them to complete iterations much faster than nodes in traditional synchronous algorithms, especially at scale, by eliminating the costly synchronization penalty. We first prove that A2BCD converges linearly to a solution with a fast accelerated rate that matches the recently proposed NU_ACDM, so long as the maximum delay is not too large. Somewhat surprisingly, A2BCD pays no complexity penalty for using outdated information. We then prove lower complexity bounds for randomized coordinate descent methods, which show that A2BCD (and hence NU_ACDM) has optimal complexity to within a constant factor. We confirm with numerical experiments that A2BCD outperforms NU_ACDM, the current fastest coordinate descent algorithm, even at small scale. We also derive and analyze a second-order ordinary differential equation, the continuous-time limit of our algorithm, and prove that it converges linearly to a solution with a similar accelerated rate.
    Comment: 33 pages, 6 figures
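
    The asynchrony model in the abstract can be imitated with a toy serial simulation (a hypothetical stand-in, not the paper's algorithm: A2BCD's Nesterov acceleration and nonuniform sampling are omitted): each update applies a coordinate gradient step evaluated at a stale copy of the iterate, with staleness bounded by a maximum delay.

        # Toy simulation of asynchronous coordinate descent on a least-squares
        # problem: each update uses a stale iterate (delay <= tau). Illustrates
        # only the asynchrony model; acceleration and nonuniform sampling omitted.
        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((60, 20))
        b = rng.standard_normal(60)
        grad = lambda x: A.T @ (A @ x - b)     # gradient of 0.5*||Ax - b||^2

        tau = 5                                # maximum allowed delay
        lr = 1.0 / np.linalg.norm(A, 2) ** 2   # conservative step size
        x = np.zeros(20)
        history = [x.copy()]                   # past iterates, to model staleness
        for _ in range(3000):
            stale = history[-1 - rng.integers(0, min(tau, len(history)))]
            i = rng.integers(0, 20)            # random coordinate (block size 1)
            x[i] -= lr * grad(stale)[i]        # step computed from stale data
            history.append(x.copy())
        print(np.linalg.norm(A @ x - b))       # residual near the least-squares optimum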

    Distributed Nesterov gradient methods over arbitrary graphs

    In this letter, we introduce a distributed Nesterov method, termed $\mathcal{ABN}$, that does not require doubly stochastic weight matrices. Instead, the implementation is based on a simultaneous application of both row- and column-stochastic weights, which makes the method applicable to arbitrary (strongly connected) graphs. Since constructing column-stochastic weights requires additional information (the number of outgoing neighbors at each agent) that is not available in certain communication protocols, we derive a variation, termed FROZEN, that requires only row-stochastic weights, at the expense of additional iterations for eigenvector learning. We numerically study these algorithms for various objective functions and network parameters and show that the proposed distributed Nesterov methods achieve acceleration compared with the current state-of-the-art methods for distributed optimization.
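
    A sketch of the row-/column-stochastic mechanics the abstract describes, in the base (non-accelerated) form known in the literature as the AB/push-pull recursion; the Nesterov momentum that makes this $\mathcal{ABN}$ is omitted, and the digraph and objective are toy choices:

        # Base AB-style recursion: a row-stochastic matrix A mixes iterates, a
        # column-stochastic matrix B mixes gradient trackers. Momentum omitted.
        import numpy as np

        targets = np.array([1.0, 2.0, 3.0, 4.0])   # f_i(x) = 0.5*(x - target_i)^2
        grad = lambda x: x - targets                # stacked local gradients

        # M[i, j] = 1 if agent i receives from agent j (strongly connected digraph).
        M = np.array([[1, 1, 0, 0],
                      [0, 1, 1, 0],
                      [0, 0, 1, 1],
                      [1, 1, 0, 1]], dtype=float)
        A = M / M.sum(axis=1, keepdims=True)        # row-stochastic
        B = M / M.sum(axis=0, keepdims=True)        # column-stochastic

        alpha = 0.05
        x = np.zeros(4)
        y = grad(x)                                 # tracker starts at local gradients
        for _ in range(2000):
            x_new = A @ x - alpha * y               # mix iterates, descend along tracker
            y = B @ y + grad(x_new) - grad(x)       # track the average gradient
            x = x_new
        print(x)                                    # all agents near mean(targets) = 2.5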

    Accelerated Distributed Nesterov Gradient Descent

    This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. We develop an Accelerated Distributed Nesterov Gradient Descent (Acc-DNGD) method. When the objective function is convex and $L$-smooth, we show that it achieves a $O(\frac{1}{t^{1.4-\epsilon}})$ convergence rate for all $\epsilon\in(0,1.4)$. We also show that the convergence rate can be improved to $O(\frac{1}{t^2})$ if the objective function is a composition of a linear map and a strongly convex and smooth function. When the objective function is $\mu$-strongly convex and $L$-smooth, we show that it achieves a linear convergence rate of $O([1 - C(\frac{\mu}{L})^{5/7}]^t)$, where $\frac{L}{\mu}$ is the condition number of the objective and $C>0$ is a constant that does not depend on $\frac{L}{\mu}$.
    Comment: 55 pages, 8 figures
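
    To unpack the strongly convex rate (a routine consequence of the stated bound, not a claim from the paper): reaching accuracy $\varepsilon$ with the rate $O([1 - C(\frac{\mu}{L})^{5/7}]^t)$ requires

        \[
        t \;\gtrsim\; \frac{\log(1/\varepsilon)}{-\log\!\big(1 - C\,(\mu/L)^{5/7}\big)}
        \;\approx\; \frac{1}{C}\Big(\frac{L}{\mu}\Big)^{5/7}\log\frac{1}{\varepsilon},
        \]

    which sits between the $(\frac{L}{\mu})\log\frac{1}{\varepsilon}$ iteration count of non-accelerated gradient methods and the $(\frac{L}{\mu})^{1/2}\log\frac{1}{\varepsilon}$ of centralized Nesterov acceleration.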

    Improving Fast Dual Ascent for MPC - Part I: The Distributed Case

    In dual decomposition, the dual of an optimization problem with a specific structure is solved in a distributed fashion using (sub)gradient methods and, more recently, fast gradient methods. Traditional dual decomposition suffers from two main shortcomings. The first is that convergence is often slow, although fast gradient methods have significantly improved the situation. The second is that computation of the optimal step size requires centralized computations, which hinders a fully distributed implementation of the algorithm. In this paper, the first issue is addressed by providing a tighter characterization of the dual function than what has previously been reported in the literature. We then present a distributed and a parallel algorithm in which the provided dual function approximation is minimized in each step. Since the approximation is more accurate than the approximation used in standard and fast dual decomposition, the convergence properties are improved. For the second issue, we extend a recent result to allow for fully distributed parameter selection in the algorithm. Further, we show how to apply the proposed algorithms to optimization problems arising in distributed model predictive control (DMPC) and show that the proposed distributed algorithm enjoys distributed reconfiguration, i.e., plug-and-play, in the DMPC context.
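
    A minimal sketch of the plain dual decomposition baseline the paper builds on (toy problem, hypothetical data): two quadratic subproblems coupled by an equality constraint, with each node minimizing its Lagrangian term locally and gradient ascent on the dual variable.

        # Dual decomposition on: minimize f1(x1) + f2(x2) subject to x1 = x2,
        # with f_i(x) = 0.5 * a_i * (x - c_i)^2. Local minimizers are closed form.
        a1, c1 = 1.0, 0.0
        a2, c2 = 3.0, 4.0

        lam, step = 0.0, 0.5              # dual variable, dual ascent step size
        for _ in range(100):
            x1 = c1 - lam / a1            # argmin_x f1(x) + lam * x
            x2 = c2 + lam / a2            # argmin_x f2(x) - lam * x
            lam += step * (x1 - x2)       # dual gradient = constraint residual
        print(x1, x2)                     # both approach the optimum x* = 3.0

    The paper's point is that a good step size here normally depends on global problem data computed centrally, and that a tighter model of the dual function yields both faster convergence and fully distributed parameter selection.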

    Optimal Algorithms for Distributed Optimization

    In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(x) \triangleq \sum_{i=1}^{m} f_i(x)$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
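
    The affine-constraint model can be made concrete as follows (a standard consensus reformulation from this literature, not necessarily the paper's exact notation): stack local copies $x_i$ and encode agreement through a matrix whose kernel is the consensus subspace,

        \[
        \min_{x_1,\dots,x_m}\; \sum_{i=1}^{m} f_i(x_i)
        \quad \text{subject to} \quad \sqrt{W}\,\mathbf{x} = 0,
        \]

    where $W$ is built from the network (e.g., a graph Laplacian) so that $\sqrt{W}\,\mathbf{x} = 0$ if and only if $x_1 = \dots = x_m$. Running Nesterov's method on the dual of this constrained problem only requires neighbor-to-neighbor communication, and the spectral gap of $W$ enters through the conditioning of the dual.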

    Hybrid Conditional Gradient - Smoothing Algorithms with Applications to Sparse and Low Rank Regularization

    We study a hybrid conditional gradient - smoothing algorithm (HCGS) for solving composite convex optimization problems that contain several terms over a bounded set. Examples include regularization problems with several norms as penalties and a norm constraint. HCGS extends conditional gradient methods to cases with multiple nonsmooth terms, in which standard conditional gradient methods may be difficult to apply. The HCGS algorithm borrows techniques from smoothing proximal methods and requires only first-order computations (subgradients and proximity operations). Unlike proximal methods, HCGS benefits from the advantages of conditional gradient methods, which render it more efficient on certain large-scale optimization problems. We demonstrate these advantages with simulations on two matrix optimization problems: regularization of matrices with combined $\ell_1$ and trace norm penalties, and a convex relaxation of sparse PCA.
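
    A toy sketch of the hybrid idea (hedged: problem data and smoothing choice are illustrative, not the paper's experiments): the bounded set is handled through a linear minimization oracle as in conditional gradient methods, while an additional nonsmooth penalty is smoothed, here with a Huber-type (Moreau envelope) approximation of the $\ell_1$ norm.

        # Conditional gradient (Frank-Wolfe) with a smoothed l1 penalty over an
        # l2-norm ball. Illustrative only; HCGS is analyzed for general
        # composite problems with Moreau-type smoothing.
        import numpy as np

        rng = np.random.default_rng(1)
        A = rng.standard_normal((40, 15))
        b = rng.standard_normal(40)
        lam, R = 0.5, 2.0                      # l1 weight, l2-ball radius

        x = np.zeros(15)
        for k in range(1, 501):
            mu = 1.0 / np.sqrt(k)              # smoothing parameter, decreasing
            g = A.T @ (A @ x - b) + lam * np.clip(x / mu, -1.0, 1.0)
            s = -R * g / np.linalg.norm(g)     # linear minimization oracle on the ball
            x += (2.0 / (k + 1)) * (s - x)     # standard Frank-Wolfe step size
        print(np.round(x, 3))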

    Harnessing Smoothness to Accelerate Distributed Optimization

    There has been a growing effort in studying the distributed optimization problem over a network. The objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. The literature has developed consensus-based distributed (sub)gradient descent (DGD) methods and has shown that they have the same convergence rate $O(\frac{\log t}{\sqrt{t}})$ as centralized (sub)gradient methods (CGD) when the function is convex but possibly nonsmooth. However, when the function is convex and smooth, it has been unclear how to harness the smoothness within the DGD framework to obtain a convergence rate comparable to CGD's. In this paper, we propose a distributed algorithm that, despite using the same amount of communication per iteration as DGD, effectively harnesses the function's smoothness and converges to the optimum with a rate of $O(\frac{1}{t})$. If the objective function is further strongly convex, our algorithm has a linear convergence rate. Both rates match the convergence rate of CGD. The key step in our algorithm is a novel gradient estimation scheme that uses history information to achieve a fast and accurate estimate of the average gradient. To motivate the necessity of history information, we also show that it is impossible for a class of distributed algorithms like DGD to achieve a linear convergence rate without using history information, even if the objective function is strongly convex and smooth.
    Comment: 30 pages, 4 figures
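
    The gradient estimation scheme can be written compactly in the now-standard gradient-tracking form (consistent with the abstract; notation hedged): with a doubly stochastic mixing matrix $W$ and local iterates and trackers stacked as $x_t$, $y_t$,

        \[
        x_{t+1} = W x_t - \eta\, y_t, \qquad
        y_{t+1} = W y_t + \nabla f(x_{t+1}) - \nabla f(x_t), \qquad
        y_0 = \nabla f(x_0).
        \]

    Since $\mathbf{1}^\top W = \mathbf{1}^\top$, the average of the trackers satisfies $\frac{1}{n}\mathbf{1}^\top y_t = \frac{1}{n}\sum_i \nabla f_i(x_{i,t})$ at every iteration, so each $y_{i,t}$ serves as the history-based local estimate of the average gradient that the abstract refers to.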

    A Unification and Generalization of Exact Distributed First Order Methods

    Recently, there has been significant progress in the development of distributed first-order methods. (At least) two different types of methods, designed from very different perspectives, have been proposed that achieve both exact and linear convergence when a constant step size is used, a favorable feature that was not achievable by most prior methods. In this paper, we unify, generalize, and improve the convergence speed of these exact distributed first-order methods. We first carry out a novel unifying analysis that sheds light on how the different existing methods compare. The analysis reveals that a major difference between the methods lies in how a past dual gradient of an associated augmented Lagrangian dual function is weighted. We then capitalize on the insights from the analysis to derive a novel method, with a tuned past-gradient weighting, that improves upon the existing methods. For the proposed generalized method, we establish a global R-linear convergence rate under strongly convex costs with Lipschitz continuous gradients.
    Comment: revised Dec 17, 201
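
    The exact constant-step-size methods the abstract alludes to include, in the published literature, EXTRA-type and gradient-tracking (DIGing-type) updates. A sketch of the standard EXTRA recursion follows (toy problem; this is the baseline family, not this paper's unified method with tuned weighting):

        # EXTRA: an exact distributed first-order method with constant step size.
        import numpy as np

        n = 4
        targets = np.array([1.0, 2.0, 3.0, 4.0])   # f_i(x) = 0.5*(x - target_i)^2
        grad = lambda x: x - targets

        # Doubly stochastic mixing matrix for an undirected ring with self-loops.
        I = np.eye(n)
        W = 0.5 * I + 0.25 * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))
        W_tilde = 0.5 * (I + W)
        alpha = 0.2

        x_prev = np.zeros(n)
        x = W @ x_prev - alpha * grad(x_prev)       # first step
        for _ in range(500):
            x_next = (I + W) @ x - W_tilde @ x_prev - alpha * (grad(x) - grad(x_prev))
            x_prev, x = x, x_next
        print(x)                                     # agents agree near mean(targets) = 2.5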

    Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods

    In this paper, we consider a non-convex loss-minimization problem of learning Supervised PageRank models, which can account for properties not considered by classical approaches such as the classical PageRank model. We propose gradient-based and random gradient-free methods to solve this problem. Our algorithms are based on the concept of an inexact oracle, and unlike the state-of-the-art gradient-based method, we manage to provide theoretical convergence rate guarantees for both of them. In particular, under the assumption of local convexity of the loss function, our random gradient-free algorithm guarantees a decrease of the expected loss function value. At the same time, we theoretically justify that, without a convexity assumption on the loss function, our gradient-based algorithm finds a point where the stationarity condition is fulfilled with a given accuracy. For both proposed optimization algorithms, we find the settings of hyperparameters that give the lowest complexity (i.e., the number of arithmetic operations needed to achieve the given accuracy of the solution of the loss-minimization problem). The resulting complexity estimates are also provided. Finally, we apply the proposed optimization algorithms to the web page ranking problem and compare the proposed and state-of-the-art algorithms in terms of the considered loss function.
    Comment: 34 pages, 5 figures
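
    A generic sketch of the random gradient-free ingredient (a Gaussian-smoothing two-point estimator in the Nesterov-Spokoiny style; the stand-in loss and all parameters are hypothetical, and the paper's actual oracle is inexact and PageRank-specific):

        # Two-point random gradient-free step: estimate a directional derivative
        # by finite differences along a random Gaussian direction.
        import numpy as np

        rng = np.random.default_rng(2)
        f = lambda w: np.sum((w - 1.0) ** 2) + 0.1 * np.sum(np.abs(w))  # stand-in loss

        w = np.zeros(5)
        mu, lr = 1e-4, 0.02                       # smoothing radius, step size
        for _ in range(5000):
            u = rng.standard_normal(5)            # random search direction
            g = (f(w + mu * u) - f(w)) / mu * u   # directional finite difference
            w -= lr * g
        print(np.round(w, 2))                     # near the minimizer (~0.95 per coordinate)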