
    Minimizing Finite Sums with the Stochastic Average Gradient

    We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values, the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in general, and when the sum is strongly convex, the convergence rate is improved from the sub-linear O(1/k) to a linear rate of the form O(p^k) for p < 1. Further, in many cases the convergence rate of the new method is also faster than that of black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that performance may be further improved through the use of non-uniform sampling strategies.
    Comment: Revision of the January 2015 submission. Major changes: updated literature review and discussion of subsequent work, an additional lemma showing the validity of one of the formulas, a somewhat simplified presentation of the Lyapunov bound, inclusion of the code needed for checking the proofs rather than the polynomials generated by the code, and error regions added to the numerical experiments.
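    As a reading aid, here is a minimal Python sketch of the SAG update described above (the names sag and grad_i and the step-size handling are our own assumptions, not the paper's code): each iteration refreshes the stored gradient of one randomly chosen f_i and steps along the average of all stored gradients, so the per-iteration cost is one gradient evaluation regardless of the number of terms.

```python
import numpy as np

def sag(grad_i, n, x0, step, iters, seed=0):
    """Hypothetical SAG sketch: grad_i(i, x) returns the gradient of f_i at x."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    y = np.zeros((n, x.size))   # memory of the last gradient computed for each f_i
    d = np.zeros_like(x)        # running sum of the n stored gradients
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(i, x)
        d += g - y[i]           # swap the stale gradient of f_i for the fresh one
        y[i] = g
        x -= (step / n) * d     # step along the average of the stored gradients
    return x
```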

    Accelerating Incremental Gradient Optimization with Curvature Information

    This paper studies an acceleration technique for the incremental aggregated gradient (IAG) method through the use of curvature information for solving strongly convex finite-sum optimization problems. Such optimization problems arise in large-scale learning applications. Our technique uses a curvature-aided gradient tracking step to produce accurate gradient estimates incrementally using Hessian information. We propose and analyze two methods based on the new technique, the curvature-aided IAG (CIAG) method and the accelerated CIAG (A-CIAG) method, which are analogous to the gradient method and Nesterov's accelerated gradient method, respectively. Setting $\kappa$ to be the condition number of the objective function, we prove R-linear convergence rates of $1 - \frac{4 c_0 \kappa}{(\kappa+1)^2}$ for the CIAG method and $1 - \sqrt{\frac{c_1}{2\kappa}}$ for the A-CIAG method, where $c_0, c_1 \leq 1$ are constants inversely proportional to the distance between the initial point and the optimal solution. When the initial iterate is close to the optimal solution, these R-linear rates match those of the gradient and accelerated gradient methods, although CIAG and A-CIAG operate in an incremental setting with strictly lower computational complexity. Numerical experiments confirm our findings. The source code used for this paper can be found at http://github.com/hoitowai/ciag/.
    Comment: 22 pages, 3 figures, 3 tables. Accepted by Computational Optimization and Applications, to appear.
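    The curvature-aided tracking step can be sketched as follows (a rough Python illustration under our own naming; grad_i and hess_i are assumed oracles for the component gradients and Hessians, and the cyclic order is our choice): the gradient of each f_i is replaced by a first-order Taylor model anchored at the last point where f_i was visited, so summing the models yields a full-gradient estimate that stays accurate as the iterate moves.

```python
import numpy as np

def ciag(grad_i, hess_i, n, x0, step, iters):
    """Hypothetical CIAG-style sketch with affine gradient models
    grad f_i(x) ~= b[i] + H[i] @ x, anchored at each f_i's last visit."""
    x = x0.astype(float).copy()
    b = np.array([grad_i(i, x) - hess_i(i, x) @ x for i in range(n)])
    H = np.array([hess_i(i, x) for i in range(n)])
    b_sum, H_sum = b.sum(axis=0), H.sum(axis=0)
    for k in range(iters):
        i = k % n                              # cyclic incremental pass
        g, h = grad_i(i, x), hess_i(i, x)
        b_new = g - h @ x
        b_sum += b_new - b[i]                  # refresh f_i's model in the sums
        H_sum += h - H[i]
        b[i], H[i] = b_new, h
        x = x - step * (b_sum + H_sum @ x)     # curvature-aided gradient estimate
    return x
```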

    An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums

    Modern large-scale finite-sum optimization relies on two key aspects: distribution and stochastic updates. For smooth and strongly convex problems, existing decentralized algorithms are slower than modern accelerated variance-reduced stochastic algorithms when run on a single machine, and are therefore not efficient. Centralized algorithms are fast, but their scaling is limited by global aggregation steps that result in communication bottlenecks. In this work, we propose an efficient Accelerated Decentralized stochastic algorithm for Finite Sums named ADFS, which uses local stochastic proximal updates and randomized pairwise communications between nodes. On $n$ machines, ADFS learns from $nm$ samples in the same time it takes optimal algorithms to learn from $m$ samples on one machine. This scaling holds until a critical network size is reached, which depends on communication delays, on the number of samples $m$, and on the network topology. We provide a theoretical analysis based on a novel augmented-graph approach, combined with a precise evaluation of synchronization times and an extension of the accelerated proximal coordinate gradient algorithm to arbitrary sampling. We illustrate the improvement of ADFS over state-of-the-art decentralized approaches with experiments.
    Comment: Code available in source files. arXiv admin note: substantial text overlap with arXiv:1901.0986
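    ADFS itself runs on an augmented graph with accelerated proximal coordinate steps; purely as an illustration of the two update types it alternates, here is a toy, non-accelerated skeleton in Python (all names, the Bernoulli schedule, and the plain averaging step are our assumptions, not the actual ADFS iteration).

```python
import numpy as np

def decentralized_skeleton(local_prox, neighbors, x, p_comm, iters, seed=0):
    """Toy sketch, not ADFS itself: x is an (n_nodes, d) array of local
    parameters; local_prox(i, x_i) performs a stochastic proximal update
    at node i on one of its own samples."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    for _ in range(iters):
        i = rng.integers(n)
        if rng.random() < p_comm:
            # Randomized pairwise communication: one edge averages its endpoints.
            j = rng.choice(neighbors[i])
            x[i] = x[j] = 0.5 * (x[i] + x[j])
        else:
            # Local stochastic proximal update; no global aggregation needed.
            x[i] = local_prox(i, x[i])
    return x
```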