Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing the
sum of a finite number of smooth convex functions. Like stochastic gradient
(SG) methods, the SAG method's iteration cost is independent of the number of
terms in the sum. However, by incorporating a memory of previous gradient
values, the SAG method achieves a faster convergence rate than black-box SG
methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in
general, and when the sum is strongly-convex the convergence rate is improved
from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for
p < 1. Further, in many cases the convergence rate of the new method
is also faster than black-box deterministic gradient methods, in terms of the
number of gradient evaluations. Numerical experiments indicate that the new
algorithm often dramatically outperforms existing SG and deterministic gradient
methods, and that the performance may be further improved through the use of
non-uniform sampling strategies.
Comment: Revision from January 2015 submission. Major changes: updated
literature review and discussion of subsequent work, an additional lemma
showing the validity of one of the formulas, a somewhat simplified
presentation of the Lyapunov bound, inclusion of the code needed for checking
the proofs rather than the polynomials generated by that code, and error
regions added to the numerical experiments.
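A minimal sketch of the memory-of-gradients idea described above, written in
Python, is given below. It is an illustration of the general SAG scheme, not
the authors' implementation; the callback grad_i(i, x) (gradient of the i-th
term at x), the constant step size alpha, and the dense gradient table are
assumptions made for the example.

    import numpy as np

    def sag(grad_i, x0, n, alpha, iters, seed=0):
        # Keep a table of the most recent gradient seen for each of the n terms
        # and step along the running average of that table.
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        table = np.zeros((n, x.size))   # memory of previous gradient values
        avg = np.zeros(x.size)          # average of the rows of the table
        for _ in range(iters):
            i = rng.integers(n)              # one term per iteration, as in SG
            g = grad_i(i, x)
            avg += (g - table[i]) / n        # update the average in O(dim) time
            table[i] = g                     # refresh the stored gradient
            x = x - alpha * avg              # step along the averaged gradient
        return x

Storing the full table costs O(n * dim) memory; the paper discusses how this
can be reduced for structured problems.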
Design of generalized fractional order gradient descent method
This paper focuses on the convergence problem of the emerging fractional
order gradient descent method and proposes three solutions to overcome it. In
fact, the general fractional gradient method cannot converge to the true
extreme point of the target function, which critically hampers its
application. Because of the long-memory characteristic of the fractional
derivative, the fixed memory principle is a natural first choice. Apart from
this truncation of the memory length, two new methods are developed to achieve
convergence: one is the truncation of the infinite series, and the other is
the modification of the constant fractional order. Finally, six illustrative
examples are presented to demonstrate the effectiveness and practicality of
the proposed methods.
Comment: 8 pages, 16 figures
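The abstract above only names the three fixes. As a hedged illustration of one
of them, the truncation of the infinite series, a common one-term truncation
(with the lower terminal of the fractional derivative moved to the previous
iterate) gives a step of roughly mu * f'(x_k) * |x_k - x_{k-1}|^(1-alpha) /
Gamma(2-alpha). The Python sketch below implements that form in one dimension;
the function names, default values, and the use of the absolute difference are
assumptions for illustration, not the paper's exact scheme.

    import math

    def frac_gd_truncated(f_prime, x0, x1, mu=0.1, alpha=0.9, iters=200,
                          eps=1e-12):
        # One-term truncation of the fractional-order series: keep only the
        # leading term, with the lower terminal of the fractional derivative
        # set to the previous iterate. x0 and x1 are two distinct start points.
        c = 1.0 / math.gamma(2.0 - alpha)
        x_prev, x = x0, x1
        for _ in range(iters):
            step = mu * f_prime(x) * c * (abs(x - x_prev) + eps) ** (1.0 - alpha)
            x_prev, x = x, x - step
        return x

    # Example: approaches the true minimizer x* = 3 of f(x) = (x - 3)^2
    x_star = frac_gd_truncated(lambda x: 2.0 * (x - 3.0), x0=0.0, x1=0.1)

Because the step is proportional to the integer-order gradient f'(x), the
iteration can reach the true extreme point, which is the convergence property
the paper is concerned with.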
Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields
We apply stochastic average gradient (SAG) algorithms for training
conditional random fields (CRFs). We describe a practical implementation that
uses structure in the CRF gradient to reduce the memory requirement of this
linearly-convergent stochastic gradient method, propose a non-uniform sampling
scheme that substantially improves practical performance, and analyze the rate
of convergence of the SAGA variant under non-uniform sampling. Our experimental
results reveal that our method often significantly outperforms existing methods
in terms of the training objective, and performs as well or better than
optimally-tuned stochastic gradient methods in terms of test error.
Comment: AISTATS 2015, 24 pages
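As context for the non-uniform sampling scheme mentioned above, one simple
family of such schemes for SAG/SAGA-type methods mixes uniform sampling with
sampling proportional to per-term Lipschitz estimates. The sketch below is an
assumed illustration of that idea (the names L and uniform_frac are
hypothetical), not necessarily the scheme analyzed in the paper.

    import numpy as np

    def sample_index(L, rng, uniform_frac=0.5):
        # With probability uniform_frac pick a term uniformly; otherwise pick a
        # term with probability proportional to its estimated Lipschitz
        # constant L[i], so that "harder" terms are visited more often.
        n = len(L)
        if rng.random() < uniform_frac:
            return int(rng.integers(n))
        p = np.asarray(L, dtype=float)
        return int(rng.choice(n, p=p / p.sum()))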
Accelerated gradient methods for total-variation-based CT image reconstruction
Total-variation (TV)-based Computed Tomography (CT) image reconstruction has
been shown experimentally to be capable of producing accurate reconstructions
from sparse-view data. In particular, TV-based reconstruction is very well
suited for
images with piecewise nearly constant regions. Computationally, however,
TV-based reconstruction is much more demanding, especially for 3D imaging, and
the reconstruction from clinical data sets is far from real-time. This is
undesirable from a clinical perspective, and thus there is
an incentive to accelerate the solution of the underlying optimization problem.
The TV reconstruction can in principle be found by any optimization method, but
in practice the large-scale systems arising in CT image reconstruction preclude
the use of memory-demanding methods such as Newton's method. The simple
gradient method has much lower memory requirements, but exhibits slow
convergence. In the present work we consider the use of two accelerated
gradient-based methods, GPBB and UPN, for reducing the number of gradient
method iterations needed to achieve a high-accuracy TV solution in CT image
reconstruction. The former incorporates several heuristics from the
optimization literature such as Barzilai-Borwein (BB) step size selection and
nonmonotone line search. The latter uses a cleverly chosen sequence of
auxiliary points to achieve a better convergence rate. The methods are memory
efficient and equipped with a stopping criterion to ensure that the TV
reconstruction has indeed been found. An implementation of the methods (in C
with interface to Matlab) is available for download from
http://www2.imm.dtu.dk/~pch/TVReg/. We compare the proposed methods with the
standard gradient method, applied to a 3D test problem with synthetic few-view
data. We find experimentally that for realistic parameters the proposed methods
significantly outperform the gradient method.
Comment: 4 pages, 2 figures
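For reference, the Barzilai-Borwein step size mentioned above is computed from
the two most recent iterates and gradients. The sketch below shows one standard
BB formula (alpha = s^T s / s^T y); it is not tied to the authors' GPBB
implementation, which additionally uses projection onto the feasible set and a
nonmonotone line search.

    import numpy as np

    def bb_step(x, x_prev, g, g_prev, fallback=1.0):
        # BB step length: s = x - x_prev, y = g - g_prev, alpha = s^T s / s^T y.
        # Falls back to a default step when the curvature estimate s^T y is
        # not positive.
        s = x - x_prev
        y = g - g_prev
        sy = float(s @ y)
        if sy <= 0.0:
            return fallback
        return float(s @ s) / sy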