Accelerating Incremental Gradient Optimization with Curvature Information
This paper studies an acceleration technique for incremental aggregated
gradient ({\sf IAG}) method through the use of \emph{curvature} information for
solving strongly convex finite sum optimization problems. These optimization
problems of interest arise in large-scale learning applications. Our technique
utilizes a curvature-aided gradient tracking step to produce accurate gradient
estimates incrementally using Hessian information. We propose and analyze two
methods utilizing the new technique, the curvature-aided IAG ({\sf CIAG})
method and the accelerated CIAG ({\sf A-CIAG}) method, which are analogous to
the gradient method and Nesterov's accelerated gradient method, respectively.
Setting \kappa to be the condition number of the objective function, we prove
linear convergence rates for the {\sf CIAG} and {\sf A-CIAG} methods whose
contraction factors depend on \kappa and on constants inversely proportional to
the distance between the initial point and the optimal solution. When the
initial iterate is close to the optimal solution, the linear convergence
rates match those of the gradient and accelerated gradient methods, even though
{\sf CIAG} and {\sf A-CIAG} operate in an incremental setting with strictly
lower per-iteration computation complexity. Numerical experiments confirm our
findings. The source
codes used for this paper can be found on
\url{http://github.com/hoitowai/ciag/}.

Comment: 22 pages, 3 figures, 3 tables. Accepted by Computational Optimization
and Applications, to appear.
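As a concrete illustration of the curvature-aided gradient tracking idea, the following sketch runs CIAG-style incremental updates on a least-squares finite sum, where each component's Hessian is available in closed form. The data, step size, and cyclic component ordering are illustrative assumptions, not taken from the paper; note that for quadratic components the Taylor-corrected aggregate is exactly the full gradient, which is the best case for curvature aiding.

```python
import numpy as np

# Finite sum f(x) = (1/2) * sum_i (a_i^T x - b_i)^2 (illustrative data).
rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x_star = np.linalg.solve(A.T @ A, A.T @ b)   # exact minimizer, for reference

x = np.zeros(d)
tau = np.zeros((n, d))        # x_{tau_i}: iterate where component i was last visited
g_sum = -A.T @ b              # sum_i grad f_i(x_{tau_i}), all taken at x = 0
H_sum = A.T @ A               # sum_i Hess f_i; constant for quadratic components
Hx_sum = np.zeros(d)          # sum_i Hess f_i(x_{tau_i}) @ x_{tau_i}

L = np.linalg.norm(A, 2) ** 2
gamma = 1.0 / L               # conservative step size (illustrative choice)

for k in range(30 * n):
    i = k % n                 # cyclic component selection
    a_i = A[i]
    H_i = np.outer(a_i, a_i)
    # Refresh the stored first-order information for component i only.
    g_sum += (a_i @ x - b[i]) * a_i - (a_i @ tau[i] - b[i]) * a_i
    Hx_sum += H_i @ (x - tau[i])
    tau[i] = x
    # Curvature-aided gradient estimate: each stored gradient is corrected
    # by a first-order Taylor term toward the current iterate.
    g_hat = g_sum + H_sum @ x - Hx_sum
    x = x - gamma * g_hat

assert np.linalg.norm(x - x_star) < 1e-6
```

Only one component's gradient and Hessian are evaluated per iteration, so each step costs a single incremental update plus the aggregate correction.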
Semistochastic Quadratic Bound Methods
Partition functions arise in a variety of settings, including conditional
random fields, logistic regression, and latent Gaussian models. In this paper,
we consider semistochastic quadratic bound (SQB) methods for maximum likelihood
inference based on partition function optimization. Batch methods based on the
quadratic bound were recently proposed for this class of problems, and
performed favorably in comparison to state-of-the-art techniques.
Semistochastic methods fall in between batch algorithms, which use all the
data, and stochastic gradient type methods, which use small random selections
at each iteration. We build semistochastic quadratic bound-based methods, and
prove both global convergence (to a stationary point) under very weak
assumptions, and linear convergence rate under stronger assumptions on the
objective. To make the proposed methods faster and more stable, we consider
inexact subproblem minimization and batch-size selection schemes. The efficacy
of SQB methods is demonstrated via comparison with several state-of-the-art
techniques on commonly used datasets.

Comment: 11 pages, 1 figure.
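To illustrate the semistochastic idea of interpolating between stochastic and batch regimes, here is a minimal sketch that grows the mini-batch geometrically while taking quadratic-bound (majorize-minimize) steps for binary logistic regression, using Böhning's fixed curvature bound. The bound choice, ridge term, and growth schedule are illustrative assumptions, not the paper's SQB construction for general partition functions.

```python
import numpy as np

# Illustrative logistic-regression data (not from the paper).
rng = np.random.default_rng(1)
n, d = 400, 3
X = rng.standard_normal((n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def grad(w, idx):
    """Mini-batch gradient of the average logistic loss."""
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    return X[idx].T @ (p - y[idx]) / len(idx)

# Fixed quadratic-bound curvature (Bohning): Hessian <= (1/4n) X^T X,
# plus a small ridge term for numerical stability (illustrative).
C = X.T @ X / (4 * n) + 1e-3 * np.eye(d)
C_inv = np.linalg.inv(C)

w = np.zeros(d)
batch = 20
for k in range(60):
    idx = rng.choice(n, size=min(batch, n), replace=False)
    w = w - C_inv @ grad(w, idx)      # bound-minimization step on the mini-batch
    batch = int(batch * 1.3)          # geometric growth: stochastic -> batch

full_grad = grad(w, np.arange(n))
assert np.linalg.norm(full_grad) < 1e-2
```

Early iterations behave like cheap stochastic steps; once the batch covers the whole dataset, the iteration reduces to the deterministic bound method, so noise does not prevent convergence to a stationary point.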
A Lower Bound for the Optimization of Finite Sums
This paper presents a lower bound for optimizing a finite sum of n functions,
where each function is smooth and the sum is strongly convex. We show that no
algorithm can reach a prescribed error in minimizing all functions from this
class in fewer than a number of iterations that scales with n and with a
surrogate condition number. We then compare this lower bound to upper bounds
for recently developed methods specializing to this setting. When the functions
involved in this sum are not arbitrary, but based on i.i.d. random data, then
we further contrast these complexity results with those for optimal first-order
methods to directly optimize the sum. We conclude that considerable caution is
necessary for an accurate comparison, and we identify machine learning
scenarios where the new methods help computationally.

Comment: Added an erratum; we are currently working on extending the result to
randomized algorithms.
Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Our goal is to improve variance reducing stochastic methods through better
control variates. We first propose a modification of SVRG which uses the
Hessian to track gradients over time, rather than to recondition, increasing
the correlation of the control variates and leading to faster theoretical
convergence close to the optimum. We then propose accurate and computationally
efficient approximations to the Hessian, both using a diagonal and a low-rank
matrix. Finally, we demonstrate the effectiveness of our method on a wide range
of problems.

Comment: 17 pages, 2 figures, 1 table.
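A minimal sketch of the Hessian-corrected control variate follows, applied to logistic regression in an SVRG-style loop. For clarity it stores exact per-component Hessians at the snapshot (the paper instead proposes diagonal and low-rank approximations to make this affordable); the data, step size, and epoch counts are illustrative assumptions.

```python
import numpy as np

# Illustrative logistic-regression data (not from the paper).
rng = np.random.default_rng(2)
n, d = 200, 4
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_i(w, i):
    return (sigmoid(X[i] @ w) - y[i]) * X[i]

def hess_i(w, i):
    p = sigmoid(X[i] @ w)
    return p * (1 - p) * np.outer(X[i], X[i])

step = 0.25
w = np.zeros(d)
for epoch in range(15):
    w_snap = w.copy()
    mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)      # full gradient
    H_bar = np.mean([hess_i(w_snap, i) for i in range(n)], axis=0)   # full Hessian
    for _ in range(n):
        i = rng.integers(n)
        # Control variate: first-order Taylor model of grad f_i around the
        # snapshot, which tracks grad f_i(w) more closely than the plain
        # SVRG variate grad f_i(w_snap) as w drifts away from w_snap.
        cv = grad_i(w_snap, i) + hess_i(w_snap, i) @ (w - w_snap)
        g = grad_i(w, i) - cv + (mu + H_bar @ (w - w_snap))
        w = w - step * g

full_grad = np.mean([grad_i(w, i) for i in range(n)], axis=0)
assert np.linalg.norm(full_grad) < 1e-3
```

Because the variate's error is second order in the distance to the snapshot, the variance of the gradient estimate shrinks faster near the optimum than with the plain SVRG variate, matching the faster local convergence claimed in the abstract.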