3,278 research outputs found
Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing the
sum of a finite number of smooth convex functions. Like stochastic gradient
(SG) methods, the SAG method's iteration cost is independent of the number of
terms in the sum. However, by incorporating a memory of previous gradient
values the SAG method achieves a faster convergence rate than black-box SG
methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in
general, and when the sum is strongly-convex the convergence rate is improved
from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for
p < 1. Further, in many cases the convergence rate of the new method
is also faster than black-box deterministic gradient methods, in terms of the
number of gradient evaluations. Numerical experiments indicate that the new
algorithm often dramatically outperforms existing SG and deterministic gradient
methods, and that the performance may be further improved through the use of
non-uniform sampling strategies.
Comment: Revision from January 2015 submission. Major changes: updated literature review and discussion of subsequent work, additional Lemma showing the validity of one of the formulas, somewhat simplified presentation of the Lyapunov bound, included the code needed for checking the proofs rather than the polynomials generated by the code, added error regions to the numerical experiments.
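As an illustration of the gradient-memory idea described in this abstract, the following is a minimal sketch of a SAG-style update for a generic per-term gradient oracle; the oracle name grad_i, the uniform sampling, and the constant step size are assumptions made for the example, not the paper's recommended implementation.

```python
import numpy as np

def sag_sketch(grad_i, n, x0, step, n_iters, seed=0):
    """Minimal SAG-style sketch for minimizing (1/n) * sum_i f_i(x).

    grad_i(i, x) returns the gradient of the i-th term at x. A memory of
    the last gradient seen for each term is kept, so each iteration costs
    a single gradient evaluation, like plain stochastic gradient.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    grad_memory = np.zeros((n, x.size))     # last gradient seen for each f_i
    grad_sum = np.zeros(x.size)             # running sum of the stored gradients
    for _ in range(n_iters):
        i = rng.integers(n)                 # sample one term uniformly at random
        g_new = grad_i(i, x)
        grad_sum += g_new - grad_memory[i]  # refresh the memory for term i
        grad_memory[i] = g_new
        x -= step * grad_sum / n            # step along the average stored gradient
    return x
```

For a least-squares term f_i(x) = (a_i^T x - b_i)^2 / 2, for instance, grad_i(i, x) would return (a_i @ x - b_i) * a_i; the non-uniform sampling mentioned at the end of the abstract would replace the uniform draw of i.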
Accelerating Incremental Gradient Optimization with Curvature Information
This paper studies an acceleration technique for the incremental aggregated
gradient ({\sf IAG}) method through the use of \emph{curvature} information for
solving strongly convex finite sum optimization problems. These optimization
problems of interest arise in large-scale learning applications. Our technique
utilizes a curvature-aided gradient tracking step to produce accurate gradient
estimates incrementally using Hessian information. We propose and analyze two
methods utilizing the new technique, the curvature-aided IAG ({\sf CIAG})
method and the accelerated CIAG ({\sf A-CIAG}) method, which are analogous to
the gradient method and Nesterov's accelerated gradient method, respectively.
Setting $\kappa$ to be the condition number of the objective function, we prove linear convergence rates of $1 - \frac{4c_0\kappa}{(\kappa+1)^2}$ for the {\sf CIAG} method, and $1 - \sqrt{\frac{c_1}{2\kappa}}$ for the {\sf A-CIAG} method, where $c_0, c_1 \leq 1$ are constants inversely proportional to the distance between the initial point and the optimal solution. When the
initial iterate is close to the optimal solution, the linear convergence
rates match those of the gradient and accelerated gradient methods, albeit {\sf CIAG} and {\sf A-CIAG} operate in an incremental setting with strictly lower computational complexity. Numerical experiments confirm our findings. The source
codes used for this paper can be found on
\url{http://github.com/hoitowai/ciag/}.
Comment: 22 pages, 3 figures, 3 tables. Accepted by Computational Optimization and Applications, to appear.
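As a rough sketch of the curvature-aided gradient-tracking idea in this abstract, the code below caches each component's gradient and Hessian at the point where that component was last visited and corrects the cached gradients with the cached Hessians; grad_i, hess_i, the cyclic component order, and the constant step size are assumptions for the example, and this is not the authors' exact recursion.

```python
import numpy as np

def ciag_sketch(grad_i, hess_i, n, x0, step, n_iters):
    """Sketch of a curvature-aided incremental aggregated gradient step.

    grad_i(i, x) and hess_i(i, x) return the gradient and Hessian of the
    i-th component at x. Each component is cached at the point where it
    was last visited; the full gradient at the current iterate is
    approximated by a first-order (Hessian) correction of the caches.
    """
    x = np.asarray(x0, dtype=float).copy()
    x_mem = np.tile(x, (n, 1))                          # last visit point per component
    g_mem = np.array([grad_i(i, x) for i in range(n)])  # cached gradients
    H_mem = np.array([hess_i(i, x) for i in range(n)])  # cached Hessians
    for k in range(n_iters):
        # curvature-aided estimate of sum_i grad f_i(x) at the current iterate
        g_est = sum(g_mem[i] + H_mem[i] @ (x - x_mem[i]) for i in range(n))
        x = x - (step / n) * g_est
        i = k % n                                       # refresh one component per iteration
        x_mem[i], g_mem[i], H_mem[i] = x, grad_i(i, x), hess_i(i, x)
    return x
```

The explicit sum over all n components is written out here only for clarity; an incremental implementation would maintain running sums of the cached gradients, Hessians, and Hessian-point products, so each iteration costs one gradient/Hessian evaluation plus O(d^2) work, independent of n.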
An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums
Modern large-scale finite-sum optimization relies on two key aspects:
distribution and stochastic updates. For smooth and strongly convex problems,
existing decentralized algorithms are slower than modern accelerated
variance-reduced stochastic algorithms when run on a single machine, and are
therefore not efficient. Centralized algorithms are fast, but their scaling is
limited by global aggregation steps that result in communication bottlenecks.
In this work, we propose an efficient \textbf{A}ccelerated
\textbf{D}ecentralized stochastic algorithm for \textbf{F}inite \textbf{S}ums
named ADFS, which uses local stochastic proximal updates and randomized
pairwise communications between nodes. On $n$ machines, ADFS learns from $nm$ samples in the same time it takes optimal algorithms to learn from $m$ samples on one machine. This scaling holds until a critical network size is reached, which depends on communication delays, on the number of samples $m$, and on the
network topology. We provide a theoretical analysis based on a novel augmented
graph approach combined with a precise evaluation of synchronization times and
an extension of the accelerated proximal coordinate gradient algorithm to
arbitrary sampling. We illustrate the improvement of ADFS over state-of-the-art
decentralized approaches with experiments.
Comment: Code available in source files. arXiv admin note: substantial text overlap with arXiv:1901.0986
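The sketch below only schematizes the two ingredients named in the abstract, local stochastic proximal updates at a node and randomized pairwise communications along an edge, using plain gossip averaging; prox_local, edges, and p_comm are assumptions for the example, and the acceleration, augmented-graph construction, and arbitrary-sampling analysis of ADFS are omitted.

```python
import numpy as np

def decentralized_sketch(prox_local, edges, x_init, n_steps, p_comm=0.5, seed=0):
    """Schematic alternation of local proximal work and pairwise gossip.

    prox_local(i, x_i) performs one local stochastic proximal update at node i.
    edges is a list of (i, j) pairs allowed to communicate; nodes are 0..n-1.
    This is a plain (non-accelerated) sketch, not the ADFS recursion.
    """
    rng = np.random.default_rng(seed)
    x = {i: xi.astype(float) for i, xi in x_init.items()}  # one parameter vector per node
    for _ in range(n_steps):
        if rng.random() < p_comm:
            i, j = edges[rng.integers(len(edges))]   # randomized pairwise communication
            avg = 0.5 * (x[i] + x[j])                # gossip averaging along the edge
            x[i], x[j] = avg.copy(), avg.copy()
        else:
            i = rng.integers(len(x))                 # a random node does local work
            x[i] = prox_local(i, x[i])
    return x
```

In ADFS itself, the split between local and communication steps and the form of the pairwise updates are derived from an accelerated proximal coordinate gradient method on the augmented graph, which is what yields the stated speedup over this plain alternation.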