
    A unified variance-reduced accelerated gradient method for convex optimization

    We propose a novel randomized incremental gradient algorithm, namely VAriance-Reduced Accelerated Gradient (Varag), for finite-sum optimization. Equipped with a unified step-size policy that adjusts itself to the value of the condition number, Varag attains unified optimal rates of convergence for smooth convex finite-sum problems directly, regardless of whether they are strongly convex. Moreover, Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence. It also achieves an optimal linear rate of convergence for a wide class of problems that satisfy only a certain error bound condition rather than strong convexity. Varag can also be extended to solve stochastic finite-sum problems. Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)
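The abstract does not spell out Varag's update rules. As a minimal sketch of variance reduction, a common ingredient of randomized incremental methods like this one, the code below implements a plain (non-accelerated) SVRG-style gradient estimator on a hypothetical scalar least-squares finite sum. This is not Varag: its acceleration and unified step-size policy are omitted, and the data, step size and all function names are assumptions for illustration.

```python
import random

def grad_i(x, a, b):
    """Gradient of a hypothetical component f_i(x) = 0.5 * (a*x - b)**2 (scalar x)."""
    return a * (a * x - b)

def full_grad(x, data):
    """Gradient of the finite sum f(x) = (1/n) * sum_i f_i(x)."""
    return sum(grad_i(x, a, b) for a, b in data) / len(data)

def svrg_estimator(x, snapshot, snapshot_grad, a, b):
    """Variance-reduced estimate grad_i(x) - grad_i(snapshot) + grad f(snapshot).
    Unbiased for grad f(x) when i is sampled uniformly."""
    return grad_i(x, a, b) - grad_i(snapshot, a, b) + snapshot_grad

def svrg(data, x0, step, epochs, inner_steps, seed=0):
    """Plain SVRG loop: one full gradient per epoch, cheap corrected steps inside."""
    rng = random.Random(seed)
    x = x0
    for _ in range(epochs):
        snapshot = x
        snapshot_grad = full_grad(snapshot, data)  # one full pass over the data
        for _ in range(inner_steps):
            a, b = rng.choice(data)  # sample one component uniformly
            x -= step * svrg_estimator(x, snapshot, snapshot_grad, a, b)
    return x
```

For `data = [(1.0, 2.0), (2.0, 2.0), (3.0, 9.0)]`, `svrg(data, 0.0, 0.05, 50, 30)` should approach the least-squares minimizer sum(a_i*b_i) / sum(a_i**2) = 33/14. Because the correction term vanishes at the optimum, a constant step suffices here, unlike plain stochastic gradient steps on the same inconsistent data.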

    Incremental Stochastic Subgradient Algorithms for Convex Optimization

    In this paper we study the effect of stochastic errors on two constrained incremental subgradient algorithms. We view the incremental subgradient algorithms as decentralized network optimization algorithms applied to minimizing a sum of functions, where each component function is known only to a particular agent of a distributed network. We first study the standard cyclic incremental subgradient algorithm, in which the agents form a ring structure and pass the iterate in a cycle. We consider the method with stochastic errors in the subgradient evaluations and provide sufficient conditions on the moments of the stochastic errors that guarantee almost sure convergence when a diminishing step-size is used. We also obtain almost sure bounds on the algorithm's performance when a constant step-size is used. We then consider the Markov randomized incremental subgradient method, a non-cyclic version of the incremental algorithm in which the sequence of computing agents is modeled as a time non-homogeneous Markov chain. Such a model is appropriate for mobile networks, since the network topology in these networks changes over time. We establish convergence results and error bounds for the Markov randomized method in the presence of stochastic errors for diminishing and constant step-sizes, respectively.
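A minimal sketch of the cyclic variant with noisy subgradients and a diminishing step-size, under purely illustrative assumptions: each agent i holds a hypothetical component f_i(x) = |x - c_i| on the scalar constraint set X = [lo, hi], every subgradient evaluation is corrupted by zero-mean Gaussian noise, and the step α_k = 1/(k+1) is held fixed within cycle k. The Markov randomized variant and the paper's moment conditions are not reproduced; all names are hypothetical.

```python
import random

def project(x, lo, hi):
    """Euclidean projection onto the interval [lo, hi] (the constraint set X)."""
    return min(max(x, lo), hi)

def subgrad_abs(x, c):
    """A subgradient of f_i(x) = |x - c| (0 is used at the kink x == c)."""
    return (x > c) - (x < c)

def cyclic_incremental_subgradient(centers, x0, lo, hi, cycles, noise_std, seed=0):
    """Agents in a ring pass the iterate in a fixed order; each applies one
    projected step using its own noisy subgradient."""
    rng = random.Random(seed)
    x = x0
    for k in range(cycles):
        alpha = 1.0 / (k + 1)  # diminishing: sum diverges, sum of squares converges
        for c in centers:  # one pass around the ring
            g = subgrad_abs(x, c) + rng.gauss(0.0, noise_std)  # zero-mean stochastic error
            x = project(x - alpha * g, lo, hi)
    return x
```

Since the sum of |x - c_i| is minimized at the median of the c_i, `cyclic_incremental_subgradient([1.0, 2.0, 7.0], 5.0, 0.0, 10.0, 5000, 0.2)` should end close to 2.0. Replacing the diminishing step with a constant one would instead leave the iterate oscillating in a neighborhood of the minimizer.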

    On the linear convergence of the stochastic gradient method with constant step-size

    The strong growth condition (SGC) is known to be a sufficient condition for linear convergence of the stochastic gradient method using a constant step-size γ (SGM-CS). In this paper, we provide a necessary condition for the linear convergence of SGM-CS that is weaker than SGC. Moreover, when this necessary condition is violated up to an additive perturbation σ, we show that both the projected stochastic gradient method using a constant step-size (PSGM-CS) and the proximal stochastic gradient method exhibit linear convergence to a noise-dominated region, whose distance to the optimal solution is proportional to γσ.
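To illustrate the two regimes described above, here is a hypothetical scalar least-squares example of SGM-CS (unprojected). When every component f_i(x) = 0.5*(a_i*x - b_i)**2 shares the minimizer x* (that is, b_i = a_i*x*), the SGC holds and constant-step iterates contract to x*; perturbing b slightly leaves the iterates in a neighborhood of the least-squares solution instead. The data, step size and function names are assumptions, and the example does not check the paper's γσ scaling.

```python
import random

def sgd_constant_step(a, b, x0, gamma, steps, seed=0):
    """SGM-CS on f(x) = (1/n) * sum_i 0.5 * (a_i*x - b_i)**2 with scalar x."""
    rng = random.Random(seed)
    x = x0
    n = len(a)
    for _ in range(steps):
        i = rng.randrange(n)  # sample one component uniformly
        x -= gamma * a[i] * (a[i] * x - b[i])  # constant step along grad f_i
    return x

def least_squares_solution(a, b):
    """Closed-form minimizer of f for scalar x: sum(a_i*b_i) / sum(a_i**2)."""
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
```

With `a = [1.0, 2.0, 3.0]`, γ = 0.05 and 500 steps, every step multiplies the error by a factor of at most 0.95 in the consistent case `b = [1.5, 3.0, 4.5]`, so the iterate reaches x* = 1.5 to high accuracy. With the perturbed `b = [1.6, 2.9, 4.5]` the iterate keeps moving, but it stays within a bounded region around `least_squares_solution(a, b)`.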

    Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice

    We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy by using the right stopping criterion and the right warm-start strategy. We give practical guidelines for using Catalyst and present a comprehensive analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, MISO/Finito, and their proximal variants. For all of these methods, we establish faster rates using Catalyst acceleration, for both strongly convex and non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems. Comment: link to publisher website: http://jmlr.org/papers/volume18/17-748/17-748.pd
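A minimal sketch of the Catalyst outer loop for a μ-strongly convex objective. It uses the constant extrapolation weight β = (1 - √q)/(1 + √q) with q = μ/(μ + κ), which is what the accelerated proximal point recursion gives when α_0 = √q. Several assumptions are made for illustration: the objective is a hypothetical ill-conditioned diagonal quadratic, the inner method is plain gradient descent run for a fixed number of steps (standing in for the paper's stopping criteria), each auxiliary problem is warm-started at the previous outer iterate, and κ = 0.1 is an arbitrary choice. All function names are hypothetical.

```python
def grad_f(x, d, c):
    """Gradient of the quadratic f(x) = 0.5 * sum_j d_j*x_j**2 - sum_j c_j*x_j."""
    return [dj * xj - cj for xj, dj, cj in zip(x, d, c)]

def inner_gd(y, x_start, d, c, kappa, L, steps):
    """Approximately minimize h(x) = f(x) + kappa/2 * ||x - y||^2 by gradient
    descent, warm-started at x_start; a fixed step count replaces a stopping rule."""
    x = list(x_start)
    step = 1.0 / (L + kappa)  # h is (L + kappa)-smooth
    for _ in range(steps):
        g = grad_f(x, d, c)
        x = [xj - step * (gj + kappa * (xj - yj)) for xj, gj, yj in zip(x, g, y)]
    return x

def catalyst(x0, d, c, mu, L, kappa, outer, inner):
    """Outer accelerated proximal point loop (constant-momentum variant)."""
    q = mu / (mu + kappa)
    alpha = q ** 0.5  # with alpha_0 = sqrt(q), alpha_k stays equal to sqrt(q)
    beta = (1.0 - alpha) / (1.0 + alpha)  # extrapolation weight
    x_prev = list(x0)
    y = list(x0)
    for _ in range(outer):
        x = inner_gd(y, x_prev, d, c, kappa, L, inner)  # auxiliary problem around y
        y = [xj + beta * (xj - pj) for xj, pj in zip(x, x_prev)]  # extrapolation
        x_prev = x
    return x_prev
```

For `d = [1.0, 0.01]`, `c = [1.0, 1.0]` (so μ = 0.01, L = 1 and the minimizer is (1, 100)), `catalyst([0.0, 0.0], d, c, 0.01, 1.0, 0.1, 100, 50)` should return a point very close to (1, 100). The auxiliary problems have condition number (L + κ)/(μ + κ) = 10 rather than L/μ = 100, which is the trade-off that the choice of κ controls.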