1,955 research outputs found

    Inertial Proximal Incremental Aggregated Gradient Method

    Full text link
    In this paper, we introduce an inertial version of the Proximal Incremental Aggregated Gradient method (PIAG) for minimizing the sum of smooth convex component functions and a possibly nonsmooth convex regularization function. Theoretically, we show that the inertial Proximal Incremental Aggregated Gradiend (iPIAG) method enjoys a global linear convergence under a quadratic growth condition, which is strictly weaker than strong convexity, provided that the stepsize is not larger than a constant. Moreover, we present two numerical expreiments which demonstrate that iPIAG outperforms the original PIAG.Comment: 17 pages, 3 figure

    Proximal-Like Incremental Aggregated Gradient Method with Linear Convergence under Bregman Distance Growth Conditions

    Full text link
    We introduce a unified algorithmic framework, called proximal-like incremental aggregated gradient (PLIAG) method, for minimizing the sum of a convex function that consists of additive relatively smooth convex components and a proper lower semi-continuous convex regularization function, over an abstract feasible set whose geometry can be captured by using the domain of a Legendre function. The PLIAG method includes many existing algorithms in the literature as special cases such as the proximal gradient method, the Bregman proximal gradient method (also called NoLips algorithm), the incremental aggregated gradient method, the incremental aggregated proximal method, and the proximal incremental aggregated gradient method. It also includes some novel interesting iteration schemes. First we show the PLIAG method is globally sublinearly convergent without requiring a growth condition, which extends the sublinear convergence result for the proximal gradient algorithm to incremental aggregated type first order methods. Then by embedding a so-called Bregman distance growth condition into a descent-type lemma to construct a special Lyapunov function, we show that the PLIAG method is globally linearly convergent in terms of both function values and Bregman distances to the optimal solution set, provided that the step size is not greater than some positive constant. These convergence results derived in this paper are all established beyond the standard assumptions in the literature (i.e., without requiring the strong convexity and the Lipschitz gradient continuity of the smooth part of the objective). When specialized to many existing algorithms, our results recover or supplement their convergence results under strictly weaker conditions.Comment: 28 page

    Linear Convergence of Cyclic SAGA

    Full text link
    In this work, we present and analyze C-SAGA, a (deterministic) cyclic variant of SAGA. C-SAGA is an incremental gradient method that minimizes a sum of differentiable convex functions by cyclically accessing their gradients. Even though the theory of stochastic algorithms is more mature than that of cyclic counterparts in general, practitioners often prefer cyclic algorithms. We prove C-SAGA converges linearly under the standard assumptions. Then, we compare the rate of convergence with the full gradient method, (stochastic) SAGA, and incremental aggregated gradient (IAG), theoretically and experimentally.Comment: Published in Optimization Letter

    Incremental Aggregated Proximal and Augmented Lagrangian Algorithms

    Full text link
    We consider minimization of the sum of a large number of convex functions, and we propose an incremental aggregated version of the proximal algorithm, which bears similarity to the incremental aggregated gradient and subgradient methods that have received a lot of recent attention. Under cost function differentiability and strong convexity assumptions, we show linear convergence for a sufficiently small constant stepsize. This result also applies to distributed asynchronous variants of the method, involving bounded interprocessor communication delays. We then consider dual versions of incremental proximal algorithms, which are incremental augmented Lagrangian methods for separable equality-constrained optimization problems. Contrary to the standard augmented Lagrangian method, these methods admit decomposition in the minimization of the augmented Lagrangian, and update the multipliers far more frequently. Our incremental aggregated augmented Lagrangian methods bear similarity to several known decomposition algorithms, including the alternating direction method of multipliers (ADMM) and more recent variations. We compare these methods in terms of their properties, and highlight their potential advantages and limitations. We also address the solution of separable inequality-constrained optimization problems through the use of nonquadratic augmented Lagrangiias such as the exponential, and we dually consider a corresponding incremental aggregated version of the proximal algorithm that uses nonquadratic regularization, such as an entropy function. We finally propose a closely related linearly convergent method for minimization of large differentiable sums subject to an orthant constraint, which may be viewed as an incremental aggregated version of the mirror descent method

    Curvature-aided Incremental Aggregated Gradient Method

    Full text link
    We propose a new algorithm for finite sum optimization which we call the curvature-aided incremental aggregated gradient (CIAG) method. Motivated by the problem of training a classifier for a d-dimensional problem, where the number of training data is mm and m≫d≫1m \gg d \gg 1, the CIAG method seeks to accelerate incremental aggregated gradient (IAG) methods using aids from the curvature (or Hessian) information, while avoiding the evaluation of matrix inverses required by the incremental Newton (IN) method. Specifically, our idea is to exploit the incrementally aggregated Hessian matrix to trace the full gradient vector at every incremental step, therefore achieving an improved linear convergence rate over the state-of-the-art IAG methods. For strongly convex problems, the fast linear convergence rate requires the objective function to be close to quadratic, or the initial point to be close to optimal solution. Importantly, we show that running one iteration of the CIAG method yields the same improvement to the optimality gap as running one iteration of the full gradient method, while the complexity is O(d2)O(d^2) for CIAG and O(md)O(md) for the full gradient. Overall, the CIAG method strikes a balance between the high computation complexity incremental Newton-type methods and the slow IAG method. Our numerical results support the theoretical findings and show that the CIAG method often converges with much fewer iterations than IAG, and requires much shorter running time than IN when the problem dimension is high.Comment: Final version submitted to Allerton Conference 2017 on Oct 8, 201

    An Inertial Parallel and Asynchronous Fixed-Point Iteration for Convex Optimization

    Full text link
    Two characteristics that make convex decomposition algorithms attractive are simplicity of operations and generation of parallelizable structures. In principle, these schemes require that all coordinates update at the same time, i.e., they are synchronous by construction. Introducing asynchronicity in the updates can resolve several issues that appear in the synchronous case, like load imbalances in the computations or failing communication links. However, and to the best of our knowledge, there are no instances of asynchronous versions of commonly-known algorithms combined with inertial acceleration techniques. In this work we propose an inertial asynchronous and parallel fixed-point iteration from which several new versions of existing convex optimization algorithms emanate. Departing from the norm that the frequency of the coordinates' updates should comply to some prior distribution, we propose a scheme where the only requirement is that the coordinates update within a bounded interval. We prove convergence of the sequence of iterates generated by the scheme at a linear rate. One instance of the proposed scheme is implemented to solve a distributed optimization load sharing problem in a smart grid setting and its superiority with respect to the non-accelerated version is illustrated

    POLO: a POLicy-based Optimization library

    Full text link
    We present POLO --- a C++ library for large-scale parallel optimization research that emphasizes ease-of-use, flexibility and efficiency in algorithm design. It uses multiple inheritance and template programming to decompose algorithms into essential policies and facilitate code reuse. With its clear separation between algorithm and execution policies, it provides researchers with a simple and powerful platform for prototyping ideas, evaluating them on different parallel computing architectures and hardware platforms, and generating compact and efficient production code. A C-API is included for customization and data loading in high-level languages. POLO enables users to move seamlessly from serial to multi-threaded shared-memory and multi-node distributed-memory executors. We demonstrate how POLO allows users to implement state-of-the-art asynchronous parallel optimization algorithms in just a few lines of code and report experiment results from shared and distributed-memory computing architectures. We provide both POLO and POLO.jl, a wrapper around POLO written in the Julia language, at https://github.com/pologrp under the permissive MIT license.Comment: 25 pages, 7 figure

    Achieving Geometric Convergence for Distributed Optimization over Time-Varying Graphs

    Full text link
    This paper considers the problem of distributed optimization over time-varying graphs. For the case of undirected graphs, we introduce a distributed algorithm, referred to as DIGing, based on a combination of a distributed inexact gradient method and a gradient tracking technique. The DIGing algorithm uses doubly stochastic mixing matrices and employs fixed step-sizes and, yet, drives all the agents' iterates to a global and consensual minimizer. When the graphs are directed, in which case the implementation of doubly stochastic mixing matrices is unrealistic, we construct an algorithm that incorporates the push-sum protocol into the DIGing structure, thus obtaining Push-DIGing algorithm. The Push-DIGing uses column stochastic matrices and fixed step-sizes, but it still converges to a global and consensual minimizer. Under the strong convexity assumption, we prove that the algorithms converge at R-linear (geometric) rates as long as the step-sizes do not exceed some upper bounds. We establish explicit estimates for the convergence rates. When the graph is undirected it shows that DIGing scales polynomially in the number of agents. We also provide some numerical experiments to demonstrate the efficacy of the proposed algorithms and to validate our theoretical findings

    An Asynchronous Distributed Framework for Large-scale Learning Based on Parameter Exchanges

    Full text link
    In many distributed learning problems, the heterogeneous loading of computing machines may harm the overall performance of synchronous strategies. In this paper, we propose an effective asynchronous distributed framework for the minimization of a sum of smooth functions, where each machine performs iterations in parallel on its local function and updates a shared parameter asynchronously. In this way, all machines can continuously work even though they do not have the latest version of the shared parameter. We prove the convergence of the consistency of this general distributed asynchronous method for gradient iterations then show its efficiency on the matrix factorization problem for recommender systems and on binary classification.Comment: 16 page

    Nonasymptotic convergence of stochastic proximal point algorithms for constrained convex optimization

    Full text link
    A very popular approach for solving stochastic optimization problems is the stochastic gradient descent method (SGD). Although the SGD iteration is computationally cheap and the practical performance of this method may be satisfactory under certain circumstances, there is recent evidence of its convergence difficulties and instability for unappropriate parameters choice. To avoid these drawbacks naturally introduced by the SGD scheme, the stochastic proximal point algorithms have been recently considered in the literature. We introduce a new variant of the stochastic proximal point method (SPP) for solving stochastic convex optimization problems subject to (in)finite intersection of constraints satisfying a linear regularity type condition. For the newly introduced SPP scheme we prove new nonasymptotic convergence results. In particular, for convex and Lipschitz continuous objective functions, we prove nonasymptotic estimates for the rate of convergence in terms of the expected value function gap of order O(1/k1/2)\mathcal{O}(1/k^{1/2}), where kk is the iteration counter. We also derive better nonasymptotic bounds for the rate of convergence in terms of expected quadratic distance from the iterates to the optimal solution for smooth strongly convex objective functions, which in the best case is of order O(1/k)\mathcal{O}(1/k). Since these convergence rates can be attained by our SPP algorithm only under some natural restrictions on the stepsize, we also introduce a restarting variant of SPP method that overcomes these difficulties and derive the corresponding nonasymptotic convergence rates. Numerical evidence supports the effectiveness of our methods in real-world problems
    • …
    corecore