Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing the
sum of a finite number of smooth convex functions. Like stochastic gradient
(SG) methods, the SAG method's iteration cost is independent of the number of
terms in the sum. However, by incorporating a memory of previous gradient
values, the SAG method achieves a faster convergence rate than black-box SG
methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in
general, and when the sum is strongly-convex the convergence rate is improved
from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for
p < 1. Further, in many cases the convergence rate of the new method
is also faster than black-box deterministic gradient methods, in terms of the
number of gradient evaluations. Numerical experiments indicate that the new
algorithm often dramatically outperforms existing SG and deterministic gradient
methods, and that the performance may be further improved through the use of
non-uniform sampling strategies.
Comment: Revision from January 2015 submission. Major changes: updated
literature review and discussion of subsequent work, an additional lemma
showing the validity of one of the formulas, a somewhat simplified
presentation of the Lyapunov bound, inclusion of the code needed for checking
the proofs rather than the polynomials generated by that code, and error
regions added to the numerical experiments.
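A minimal sketch of the memory-of-gradients idea described above, written in
Python, is given below. It is an illustration of the general SAG scheme, not
the authors' implementation; the callback grad_i(i, x) (gradient of the i-th
term at x), the constant step size alpha, and the dense gradient table are
assumptions made for the example.

    import numpy as np

    def sag(grad_i, x0, n, alpha, iters, seed=0):
        # Keep a table of the most recent gradient seen for each of the n terms
        # and step along the running average of that table.
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        table = np.zeros((n, x.size))   # memory of previous gradient values
        avg = np.zeros(x.size)          # average of the rows of the table
        for _ in range(iters):
            i = rng.integers(n)              # one term per iteration, as in SG
            g = grad_i(i, x)
            avg += (g - table[i]) / n        # update the average in O(dim) time
            table[i] = g                     # refresh the stored gradient
            x = x - alpha * avg              # step along the averaged gradient
        return x

Storing the full table costs O(n * dim) memory; the paper discusses how this
can be reduced for structured problems.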
Design of generalized fractional order gradient descent method
This paper focuses on the convergence problem of the emerging fractional
order gradient descent method and proposes three solutions to overcome it. In
fact, the general fractional gradient method cannot converge to the true
extreme point of the target function, which critically hampers its
application. Because of the long-memory characteristic of the fractional
derivative, the fixed memory principle is a natural first choice. Apart from
this truncation of the memory length, two new methods are developed to achieve
convergence: one is the truncation of the infinite series, and the other is
the modification of the constant fractional order. Finally, six illustrative
examples are presented to demonstrate the effectiveness and practicality of
the proposed methods.
Comment: 8 pages, 16 figures
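The abstract above only names the three fixes. As a hedged illustration of one
of them, the truncation of the infinite series, a common one-term truncation
(with the lower terminal of the fractional derivative moved to the previous
iterate) gives a step of roughly mu * f'(x_k) * |x_k - x_{k-1}|^(1-alpha) /
Gamma(2-alpha). The Python sketch below implements that form in one dimension;
the function names, default values, and the use of the absolute difference are
assumptions for illustration, not the paper's exact scheme.

    import math

    def frac_gd_truncated(f_prime, x0, x1, mu=0.1, alpha=0.9, iters=200,
                          eps=1e-12):
        # One-term truncation of the fractional-order series: keep only the
        # leading term, with the lower terminal of the fractional derivative
        # set to the previous iterate. x0 and x1 are two distinct start points.
        c = 1.0 / math.gamma(2.0 - alpha)
        x_prev, x = x0, x1
        for _ in range(iters):
            step = mu * f_prime(x) * c * (abs(x - x_prev) + eps) ** (1.0 - alpha)
            x_prev, x = x, x - step
        return x

    # Example: approaches the true minimizer x* = 3 of f(x) = (x - 3)^2
    x_star = frac_gd_truncated(lambda x: 2.0 * (x - 3.0), x0=0.0, x1=0.1)

Because the step is proportional to the integer-order gradient f'(x), the
iteration can reach the true extreme point, which is the convergence property
the paper is concerned with.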
Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields
We apply stochastic average gradient (SAG) algorithms for training
conditional random fields (CRFs). We describe a practical implementation that
uses structure in the CRF gradient to reduce the memory requirement of this
linearly-convergent stochastic gradient method, propose a non-uniform sampling
scheme that substantially improves practical performance, and analyze the rate
of convergence of the SAGA variant under non-uniform sampling. Our experimental
results reveal that our method often significantly outperforms existing methods
in terms of the training objective, and performs as well or better than
optimally-tuned stochastic gradient methods in terms of test error.
Comment: AISTATS 2015, 24 pages
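As context for the non-uniform sampling scheme mentioned above, one simple
family of such schemes for SAG/SAGA-type methods mixes uniform sampling with
sampling proportional to per-term Lipschitz estimates. The sketch below is an
assumed illustration of that idea (the names L and uniform_frac are
hypothetical), not necessarily the scheme analyzed in the paper.

    import numpy as np

    def sample_index(L, rng, uniform_frac=0.5):
        # With probability uniform_frac pick a term uniformly; otherwise pick a
        # term with probability proportional to its estimated Lipschitz
        # constant L[i], so that "harder" terms are visited more often.
        n = len(L)
        if rng.random() < uniform_frac:
            return int(rng.integers(n))
        p = np.asarray(L, dtype=float)
        return int(rng.choice(n, p=p / p.sum()))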
Accelerated gradient methods for total-variation-based CT image reconstruction
Total-variation (TV)-based Computed Tomography (CT) image reconstruction has
been shown experimentally to be capable of producing accurate reconstructions
from sparse-view data. In particular, TV-based reconstruction is very well
suited for
images with piecewise nearly constant regions. Computationally, however,
TV-based reconstruction is much more demanding, especially for 3D imaging, and
the reconstruction from clinical data sets is far from real-time. This is
undesirable from a clinical perspective, and thus there is
an incentive to accelerate the solution of the underlying optimization problem.
The TV reconstruction can in principle be found by any optimization method, but
in practice the large-scale systems arising in CT image reconstruction preclude
the use of memory-demanding methods such as Newton's method. The simple
gradient method has much lower memory requirements, but exhibits slow
convergence. In the present work we consider the use of two accelerated
gradient-based methods, GPBB and UPN, for reducing the number of gradient
method iterations needed to achieve a high-accuracy TV solution in CT image
reconstruction. The former incorporates several heuristics from the
optimization literature such as Barzilai-Borwein (BB) step size selection and
nonmonotone line search. The latter uses a cleverly chosen sequence of
auxiliary points to achieve a better convergence rate. The methods are memory
efficient and equipped with a stopping criterion to ensure that the TV
reconstruction has indeed been found. An implementation of the methods (in C
with interface to Matlab) is available for download from
http://www2.imm.dtu.dk/~pch/TVReg/. We compare the proposed methods with the
standard gradient method, applied to a 3D test problem with synthetic few-view
data. We find experimentally that for realistic parameters the proposed methods
significantly outperform the gradient method.
Comment: 4 pages, 2 figures
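For reference, the Barzilai-Borwein step size mentioned above is computed from
the two most recent iterates and gradients. The sketch below shows one standard
BB formula (alpha = s^T s / s^T y); it is not tied to the authors' GPBB
implementation, which additionally uses projection onto the feasible set and a
nonmonotone line search.

    import numpy as np

    def bb_step(x, x_prev, g, g_prev, fallback=1.0):
        # BB step length: s = x - x_prev, y = g - g_prev, alpha = s^T s / s^T y.
        # Falls back to a default step when the curvature estimate s^T y is
        # not positive.
        s = x - x_prev
        y = g - g_prev
        sy = float(s @ y)
        if sy <= 0.0:
            return fallback
        return float(s @ s) / sy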