Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Our goal is to improve variance reducing stochastic methods through better
control variates. We first propose a modification of SVRG which uses the
Hessian to track gradients over time, rather than to recondition, increasing
the correlation of the control variates and leading to faster theoretical
convergence close to the optimum. We then propose accurate and computationally
efficient approximations to the Hessian, both using a diagonal and a low-rank
matrix. Finally, we demonstrate the effectiveness of our method on a wide range
of problems.
Comment: 17 pages, 2 figures, 1 table
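The Hessian-tracked control variate can be illustrated on a least-squares problem, where each per-sample Hessian is rank-one. This is a minimal sketch assuming the natural form of the estimator (correct the SVRG snapshot gradient with a per-sample Hessian term, recentered by the full Hessian); the paper's exact estimator and its diagonal/low-rank Hessian approximations may differ. On a quadratic, the tracked control variate is exact, so the estimator's variance collapses:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    # gradient of f_i(x) = 0.5 * (a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    return A.T @ (A @ x - b) / n

x_ref = rng.standard_normal(d)                 # SVRG snapshot point
x = x_ref + 0.1 * rng.standard_normal(d)       # current iterate near snapshot

g_full = full_grad(x)
H_full = A.T @ A / n                           # full Hessian (constant for a quadratic)

svrg_est, hess_est = [], []
for i in range(n):
    # plain SVRG estimator: grad_i(x) - grad_i(x_ref) + full_grad(x_ref)
    svrg_est.append(grad_i(x, i) - grad_i(x_ref, i) + full_grad(x_ref))
    # Hessian-tracked control variate: also track the movement x - x_ref
    H_i = np.outer(A[i], A[i])                 # per-sample Hessian (rank-one)
    cv = grad_i(x_ref, i) + H_i @ (x - x_ref)
    hess_est.append(grad_i(x, i) - cv + full_grad(x_ref) + H_full @ (x - x_ref))

# both estimators are unbiased; compare their variance around the true gradient
var_svrg = np.mean([np.sum((g - g_full) ** 2) for g in svrg_est])
var_hess = np.mean([np.sum((g - g_full) ** 2) for g in hess_est])
```

On a quadratic `var_hess` is zero up to floating point, since each per-sample gradient is exactly its snapshot gradient plus a Hessian-times-displacement term; on general smooth objectives the correction is approximate but the correlation of the control variate still improves as the iterate nears the snapshot.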
Linearly Convergent Randomized Iterative Methods for Computing the Pseudoinverse
We develop the first stochastic incremental method for calculating the Moore-Penrose pseudoinverse of a real matrix. By leveraging three alternative characterizations of the pseudoinverse, we design three methods for calculating it: two general-purpose methods and one specialized to symmetric matrices. The two general-purpose methods are proven to converge linearly to the pseudoinverse of any given matrix. For calculating the pseudoinverse of full-rank matrices we present two additional specialized methods which enjoy a faster convergence rate than the general-purpose methods. We also indicate how to develop randomized methods for calculating approximate range-space projections, a much-needed tool in inexact Newton-type methods and in quadratic solvers with linear constraints. Finally, we present numerical experiments of our general-purpose methods for calculating pseudoinverses and show that they greatly outperform the Newton-Schulz method on large-dimensional matrices.
Comment: 28 pages, 10 figures
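For context, the Newton-Schulz baseline mentioned at the end is the classical iteration X_{k+1} = X_k (2I - A X_k); with a suitably scaled start X_0 = A^T / sigma_max(A)^2 it converges quadratically to the Moore-Penrose pseudoinverse. A minimal NumPy sketch of that baseline (not the paper's stochastic method):

```python
import numpy as np

def newton_schulz_pinv(A, iters=50):
    """Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k).

    Starting from X_0 = A^T / sigma_max(A)^2, the nonzero singular
    values of A X_k converge quadratically to 1, so X_k converges
    to the Moore-Penrose pseudoinverse A^+.
    """
    X = A.T / np.linalg.norm(A, 2) ** 2   # scale by the spectral norm squared
    I = np.eye(A.shape[0])
    for _ in range(iters):
        X = X @ (2 * I - A @ X)
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))           # tall, full-rank with probability 1
X = newton_schulz_pinv(A)
err = np.linalg.norm(X - np.linalg.pinv(A))
```

Each iteration costs two dense matrix products, which is what makes the method expensive on large-dimensional matrices and motivates cheaper stochastic alternatives.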
Handbook of Convergence Theorems for (Stochastic) Gradient Methods
This is a handbook of simple proofs of the convergence of gradient and
stochastic gradient descent type methods. We consider functions that are
Lipschitz, smooth, convex, strongly convex, and/or Polyak-{\L}ojasiewicz
functions. Our focus is on ``good proofs'' that are also simple. Each section
can be consulted separately. We start with proofs for gradient descent, then
move on to stochastic variants, including minibatching and momentum. We then
turn to nonsmooth problems, covering the subgradient method, proximal gradient
descent, and their stochastic variants. Our focus is on global convergence rates and
complexity rates. Some slightly less common proofs found here include that of
SGD (Stochastic gradient descent) with a proximal step, with momentum, and with
mini-batching without replacement.
Comment: From v2 to v3: Added new sections about SPP (Stochastic Proximal
Point) and SPS (Stochastic Polyak Stepsize). Added a proof for SGD for
nonconvex functions. Simplified some statements for SGD. Corrected various
errors and misprints.
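One of the less common methods mentioned, the Stochastic Polyak Stepsize, sets the stepsize to (f_i(x_k) - f_i^*) / ||grad f_i(x_k)||^2 for the sampled function f_i. A minimal sketch on a consistent least-squares problem, where interpolation holds so each f_i^* = 0 (practical refinements such as stepsize caps are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star                       # consistent system: every f_i attains 0

x = np.zeros(d)
for _ in range(3000):
    i = rng.integers(n)
    r = A[i] @ x - b[i]              # residual of the sampled row
    f_i = 0.5 * r ** 2               # f_i(x) = 0.5 (a_i^T x - b_i)^2, with f_i^* = 0
    g = A[i] * r                     # gradient of f_i at x
    gnorm2 = g @ g
    if gnorm2 > 0:
        # Polyak step: (f_i(x) - f_i^*) / ||grad f_i(x)||^2
        x = x - (f_i / gnorm2) * g
```

On this problem the SPS step reduces to half a randomized Kaczmarz projection onto the sampled equation, so the iterates converge linearly to the interpolating solution without any tuned stepsize.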
Factorial Powers for Stochastic Optimization
The convergence rates for convex and non-convex optimization methods depend
on the choice of a host of constants, including step sizes, Lyapunov function
constants and momentum constants. In this work we propose the use of factorial
powers as a flexible tool for defining constants that appear in convergence
proofs. We list a number of remarkable properties that these sequences enjoy,
and show how they can be applied to convergence proofs to simplify or improve
the convergence rates of the momentum method, accelerated gradient descent, and
the stochastic variance reduced gradient method (SVRG).
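As a concrete instance, the falling factorial power k^(r) = k (k-1) ... (k-r+1) obeys a discrete analogue of the power rule, the kind of telescoping identity that lets such sequences slot into convergence proofs where ordinary powers would leave error terms. Whether the paper uses rising or falling powers (and real-valued exponents) is not stated in the abstract, so the integer falling-factorial convention below is an assumption:

```python
def falling(k, r):
    """Falling factorial power k^(r) = k * (k-1) * ... * (k-r+1)."""
    out = 1
    for j in range(r):
        out *= k - j
    return out

# Discrete power rule: (k+1)^(r) - k^(r) = r * k^(r-1),
# mirroring d/dk k^r = r k^(r-1).  Summing both sides over k
# telescopes exactly, with no leftover lower-order terms.
for k in range(1, 10):
    for r in range(1, 5):
        assert falling(k + 1, r) - falling(k, r) == r * falling(k, r - 1)
```

Because the identity is exact rather than asymptotic, weighted averages built from factorial powers can be summed in closed form inside a convergence proof, which is what makes them a flexible replacement for hand-picked constants.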