SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

Authors: Francis Bach, Aaron Defazio, Simon Lacoste-Julien
Publication date: 01/11/2014

In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates. SAGA improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser. Unlike SDCA, SAGA supports non-strongly convex problems directly, and is adaptive to any inherent strong convexity of the problem. We give experimental results showing the effectiveness of our method.

Comment: Advances In Neural Information Processing Systems, Nov 2014, Montreal, Canada
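
The abstract mentions a stored-gradient update combined with a proximal step on the regulariser. A minimal sketch of that idea is given below, assuming the commonly stated SAGA update (new gradient minus stored gradient plus the running table average, followed by a proximal step) and a step size of 1/(3L). The lasso objective, the function names `saga_lasso` and `soft_threshold`, and all parameters are illustrative assumptions, not taken from this listing.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (the regulariser assumed here).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def saga_lasso(A, b, lam, n_epochs=50, seed=0):
    """Sketch: minimise (1/n) * sum_i 0.5*(a_i.x - b_i)^2 + lam*||x||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    # Each f_i has Lipschitz gradient constant ||a_i||^2; use the largest.
    L = np.max(np.sum(A ** 2, axis=1))
    step = 1.0 / (3.0 * L)  # assumed step size
    # Table of the most recently evaluated gradient of each f_i.
    grad_table = A * (A @ x - b)[:, None]
    grad_avg = grad_table.mean(axis=0)
    for _ in range(n_epochs * n):
        j = rng.integers(n)
        g_new = A[j] * (A[j] @ x - b[j])
        # Variance-reduced direction: new gradient, minus stored, plus average.
        v = g_new - grad_table[j] + grad_avg
        x = soft_threshold(x - step * v, step * lam)
        # Keep the table and its running average in sync.
        grad_avg += (g_new - grad_table[j]) / n
        grad_table[j] = g_new
    return x
```

Storing one gradient per data point costs O(nd) memory in this sketch; for linear models a single scalar per point would suffice, which is omitted here for readability.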

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Reducing Reparameterization Gradient Variance

Authors: Ryan P. Adams, Alexander D'Amour, Nicholas J. Foti, Andrew C. Miller
Publication date: 01/01/2017

Optimization with noisy gradients has become ubiquitous in statistics and machine learning. Reparameterization gradients, or gradient estimates computed via the "reparameterization trick," represent a class of noisy gradients often used in Monte Carlo variational inference (MCVI). However, when these gradient estimators are too noisy, the optimization procedure can be slow or fail to converge. One way to reduce noise is to use more samples for the gradient estimate, but this can be computationally expensive. Instead, we view the noisy gradient as a random variable, and form an inexpensive approximation of the generating procedure for the gradient sample. This approximation has high correlation with the noisy gradient by construction, making it a useful control variate for variance reduction. We demonstrate our approach on non-conjugate multi-level hierarchical models and a Bayesian neural net, where we observed gradient variance reductions of multiple orders of magnitude (20-2,000x).
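
The control-variate idea in the abstract can be illustrated with a toy one-dimensional example. This is a sketch under stated assumptions, not the paper's estimator: the integrand f(z) = log cosh(z), the Gaussian variational family z = m + s * eps, and the function name `reparam_grads` are all hypothetical. The control variate is a first-order Taylor approximation of the gradient sample around the mean, whose expectation is known in closed form.

```python
import numpy as np

def reparam_grads(m, s, eps):
    """Gradient samples of E[f(m + s*eps)] w.r.t. m for f(z) = log cosh(z).

    Returns (plain, controlled): the raw reparameterization samples and
    the same samples with a linearised control variate subtracted.
    """
    # Raw sample: f'(z) = tanh(z) evaluated at z = m + s*eps.
    plain = np.tanh(m + s * eps)
    # Cheap approximation of the sample: first-order Taylor expansion
    # of f' around m, using f''(m) = 1 - tanh(m)^2.
    approx = np.tanh(m) + (1.0 - np.tanh(m) ** 2) * s * eps
    # Since E[eps] = 0, the approximation has known mean tanh(m).
    approx_mean = np.tanh(m)
    # Subtract the centred approximation; the expectation is unchanged.
    controlled = plain - (approx - approx_mean)
    return plain, controlled

# Usage: compare empirical variances of the two estimators.
eps = np.random.default_rng(0).standard_normal(100_000)
plain, controlled = reparam_grads(m=0.5, s=0.1, eps=eps)
variance_ratio = plain.var() / controlled.var()
```

The reduction depends on how well the linearisation tracks the true gradient sample, so it is largest when s is small relative to the curvature of f.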

arXiv.org e-Print Archive

Princeton University Open Access Repository