Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Our goal is to improve variance reducing stochastic methods through better
control variates. We first propose a modification of SVRG which uses the
Hessian to track gradients over time, rather than to recondition, increasing
the correlation of the control variates and leading to faster theoretical
convergence close to the optimum. We then propose accurate and computationally
efficient approximations to the Hessian, both using a diagonal and a low-rank
matrix. Finally, we demonstrate the effectiveness of our method on a wide range
of problems.
Comment: 17 pages, 2 figures, 1 table
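The Hessian-tracked control variate can be illustrated on a least-squares problem, where each per-sample Hessian is rank-one. This is a minimal sketch assuming the natural form of the estimator (correct the SVRG snapshot gradient with a per-sample Hessian term, recentered by the full Hessian); the paper's exact estimator and its diagonal/low-rank Hessian approximations may differ. On a quadratic, the tracked control variate is exact, so the estimator's variance collapses:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    # gradient of f_i(x) = 0.5 * (a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    return A.T @ (A @ x - b) / n

x_ref = rng.standard_normal(d)                 # SVRG snapshot point
x = x_ref + 0.1 * rng.standard_normal(d)       # current iterate near snapshot

g_full = full_grad(x)
H_full = A.T @ A / n                           # full Hessian (constant for a quadratic)

svrg_est, hess_est = [], []
for i in range(n):
    # plain SVRG estimator: grad_i(x) - grad_i(x_ref) + full_grad(x_ref)
    svrg_est.append(grad_i(x, i) - grad_i(x_ref, i) + full_grad(x_ref))
    # Hessian-tracked control variate: also track the movement x - x_ref
    H_i = np.outer(A[i], A[i])                 # per-sample Hessian (rank-one)
    cv = grad_i(x_ref, i) + H_i @ (x - x_ref)
    hess_est.append(grad_i(x, i) - cv + full_grad(x_ref) + H_full @ (x - x_ref))

# both estimators are unbiased; compare their variance around the true gradient
var_svrg = np.mean([np.sum((g - g_full) ** 2) for g in svrg_est])
var_hess = np.mean([np.sum((g - g_full) ** 2) for g in hess_est])
```

On a quadratic `var_hess` is zero up to floating point, since each per-sample gradient is exactly its snapshot gradient plus a Hessian-times-displacement term; on general smooth objectives the correction is approximate but the correlation of the control variate still improves as the iterate nears the snapshot.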
Linearly Convergent Randomized Iterative Methods for Computing the Pseudoinverse
We develop the first stochastic incremental method for calculating the Moore-Penrose pseudoinverse of a real matrix. By leveraging three alternative characterizations of the pseudoinverse, we design three methods for calculating it: two general-purpose methods and one specialized to symmetric matrices. The two general-purpose methods are proven to converge linearly to the pseudoinverse of any given matrix. For calculating the pseudoinverse of full-rank matrices we present two additional specialized methods which enjoy a faster convergence rate than the general-purpose methods. We also indicate how to develop randomized methods for calculating approximate range-space projections, a much-needed tool in inexact Newton-type methods and in quadratic solvers with linear constraints. Finally, we present numerical experiments of our general-purpose methods for calculating pseudoinverses and show that they greatly outperform the Newton-Schulz method on large-dimensional matrices.
Comment: 28 pages, 10 figures
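For context, the Newton-Schulz baseline mentioned at the end is the classical iteration X_{k+1} = X_k (2I - A X_k); with a suitably scaled start X_0 = A^T / sigma_max(A)^2 it converges quadratically to the Moore-Penrose pseudoinverse. A minimal NumPy sketch of that baseline (not the paper's stochastic method):

```python
import numpy as np

def newton_schulz_pinv(A, iters=50):
    """Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k).

    Starting from X_0 = A^T / sigma_max(A)^2, the nonzero singular
    values of A X_k converge quadratically to 1, so X_k converges
    to the Moore-Penrose pseudoinverse A^+.
    """
    X = A.T / np.linalg.norm(A, 2) ** 2   # scale by the spectral norm squared
    I = np.eye(A.shape[0])
    for _ in range(iters):
        X = X @ (2 * I - A @ X)
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))           # tall, full-rank with probability 1
X = newton_schulz_pinv(A)
err = np.linalg.norm(X - np.linalg.pinv(A))
```

Each iteration costs two dense matrix products, which is what makes the method expensive on large-dimensional matrices and motivates cheaper stochastic alternatives.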
Handbook of Convergence Theorems for (Stochastic) Gradient Methods
This is a handbook of simple proofs of the convergence of gradient and
stochastic gradient descent type methods. We consider functions that are
Lipschitz, smooth, convex, strongly convex, and/or Polyak-{\L}ojasiewicz
functions. Our focus is on ``good proofs'' that are also simple. Each section
can be consulted separately. We start with proofs for gradient descent, then
move on to stochastic variants, including minibatching and momentum. We then
turn to nonsmooth problems, covering the subgradient method, proximal gradient
descent, and their stochastic variants. Our focus is on global convergence rates and
complexity rates. Some slightly less common proofs found here include that of
SGD (Stochastic gradient descent) with a proximal step, with momentum, and with
mini-batching without replacement.
Comment: From v2 to v3: Added new sections about SPP (Stochastic Proximal
Point) and SPS (Stochastic Polyak Stepsize). Added a proof for SGD for
nonconvex functions. Simplified some statements for SGD. Corrected various
errors and misprints.
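One of the less common methods mentioned, the Stochastic Polyak Stepsize, sets the stepsize to (f_i(x_k) - f_i^*) / ||grad f_i(x_k)||^2 for the sampled function f_i. A minimal sketch on a consistent least-squares problem, where interpolation holds so each f_i^* = 0 (practical refinements such as stepsize caps are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star                       # consistent system: every f_i attains 0

x = np.zeros(d)
for _ in range(3000):
    i = rng.integers(n)
    r = A[i] @ x - b[i]              # residual of the sampled row
    f_i = 0.5 * r ** 2               # f_i(x) = 0.5 (a_i^T x - b_i)^2, with f_i^* = 0
    g = A[i] * r                     # gradient of f_i at x
    gnorm2 = g @ g
    if gnorm2 > 0:
        # Polyak step: (f_i(x) - f_i^*) / ||grad f_i(x)||^2
        x = x - (f_i / gnorm2) * g
```

On this problem the SPS step reduces to half a randomized Kaczmarz projection onto the sampled equation, so the iterates converge linearly to the interpolating solution without any tuned stepsize.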
Factorial Powers for Stochastic Optimization
The convergence rates for convex and non-convex optimization methods depend
on the choice of a host of constants, including step sizes, Lyapunov function
constants and momentum constants. In this work we propose the use of factorial
powers as a flexible tool for defining constants that appear in convergence
proofs. We list a number of remarkable properties that these sequences enjoy,
and show how they can be applied to convergence proofs to simplify or improve
the convergence rates of the momentum method, accelerated gradient descent, and
the stochastic variance reduced gradient method (SVRG).
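As a concrete instance, the falling factorial power k^(r) = k (k-1) ... (k-r+1) obeys a discrete analogue of the power rule, the kind of telescoping identity that lets such sequences slot into convergence proofs where ordinary powers would leave error terms. Whether the paper uses rising or falling powers (and real-valued exponents) is not stated in the abstract, so the integer falling-factorial convention below is an assumption:

```python
def falling(k, r):
    """Falling factorial power k^(r) = k * (k-1) * ... * (k-r+1)."""
    out = 1
    for j in range(r):
        out *= k - j
    return out

# Discrete power rule: (k+1)^(r) - k^(r) = r * k^(r-1),
# mirroring d/dk k^r = r k^(r-1).  Summing both sides over k
# telescopes exactly, with no leftover lower-order terms.
for k in range(1, 10):
    for r in range(1, 5):
        assert falling(k + 1, r) - falling(k, r) == r * falling(k, r - 1)
```

Because the identity is exact rather than asymptotic, weighted averages built from factorial powers can be summed in closed form inside a convergence proof, which is what makes them a flexible replacement for hand-picked constants.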