Variance Reduced Stochastic Gradient Descent with Neighbors
Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its
slow convergence can be a computational bottleneck. Variance reduction
techniques such as SAG, SVRG and SAGA have been proposed to overcome this
weakness, achieving linear convergence. However, these methods are either based
on computations of full gradients at pivot points, or on keeping per data point
corrections in memory. Therefore speed-ups relative to SGD may need a minimal
number of epochs in order to materialize. This paper investigates algorithms
that can exploit neighborhood structure in the training data to share and
re-use information about past stochastic gradients across data points, which
offers advantages in the transient optimization phase. As a by-product, we
provide a unified convergence analysis for a family of variance reduction
algorithms, which we call memorization algorithms. We provide experimental
results supporting our theory.
Comment: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 13 pages.
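To make the "per data point corrections in memory" idea concrete, the following is a minimal NumPy sketch of a SAGA-style memorization update (illustrative only, not the paper's neighborhood-sharing variant; grad_fi is an assumed user-supplied per-example gradient oracle):

    import numpy as np

    def saga_sketch(grad_fi, x0, n, steps, lr=0.01, rng=None):
        """Sketch of a SAGA-style 'memorization' method: one stored gradient
        per data point plus their running average (assumed interface:
        grad_fi(x, i) returns the gradient of the i-th loss term at x)."""
        rng = rng or np.random.default_rng(0)
        x = np.asarray(x0, dtype=float).copy()
        memory = np.array([grad_fi(x, i) for i in range(n)])  # per-point corrections
        avg = memory.mean(axis=0)                             # their running average
        for _ in range(steps):
            i = rng.integers(n)
            g = grad_fi(x, i)
            # variance-reduced direction: fresh gradient minus stored one, plus average
            x -= lr * (g - memory[i] + avg)
            avg += (g - memory[i]) / n                        # keep the average consistent
            memory[i] = g
        return x

The stored table is what costs memory but removes the need for full-gradient passes at pivot points, which is the trade-off the abstract refers to.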
SVAG: Stochastic Variance Adjusted Gradient Descent and Biased Stochastic Gradients
We examine biased gradient updates in variance reduced stochastic gradient
methods. For this purpose we introduce SVAG, a SAG/SAGA-like method with
adjustable bias. SVAG is analyzed under smoothness assumptions and we provide
step-size conditions for convergence that match or improve on previously known
conditions for SAG and SAGA. The analysis highlights a step-size requirement
difference between when SVAG is applied to cocoercive operators and when
applied to gradients of smooth functions, a difference not present in ordinary
gradient descent. This difference is verified with numerical experiments. A
variant of SVAG that adaptively selects the bias is presented and compared
numerically to SVAG on a set of classification problems. The adaptive SVAG
frequently performs among the best and always improves on the worst-case
performance of the non-adaptive variant.
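A sketch of what a SAG/SAGA-like update with adjustable bias could look like, assuming the bias enters as an innovation weight theta (theta = n giving a SAGA-like unbiased step, theta = 1 a SAG-like biased one); this parametrization and the names are assumptions for illustration, not taken from the paper:

    import numpy as np

    def svag_like_sketch(grad_fi, x0, n, steps, lr=0.01, theta=None, rng=None):
        """Illustrative SAG/SAGA-like update with an adjustable innovation
        weight theta (assumed parametrization: theta = n ~ SAGA, theta = 1 ~ SAG)."""
        rng = rng or np.random.default_rng(0)
        theta = n if theta is None else theta
        x = np.asarray(x0, dtype=float).copy()
        memory = np.array([grad_fi(x, i) for i in range(n)])  # stored per-example gradients
        avg = memory.mean(axis=0)
        for _ in range(steps):
            i = rng.integers(n)
            g = grad_fi(x, i)
            # the innovation is scaled by theta/n, which controls the bias of the step
            direction = (theta / n) * (g - memory[i]) + avg
            x -= lr * direction
            avg += (g - memory[i]) / n
            memory[i] = g
        return x

Choosing theta between these extremes trades bias against the variance of the update direction, which is the dial the adaptive variant in the abstract tunes automatically.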
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Stochastic optimization algorithms with variance reduction have proven
successful for minimizing large finite sums of functions. Unfortunately, these
techniques are unable to deal with stochastic perturbations of input data,
induced for example by data augmentation. In such cases, the objective is no
longer a finite sum, and the main candidate for optimization is the stochastic
gradient descent method (SGD). In this paper, we introduce a variance reduction
approach for these settings when the objective is composite and strongly
convex. The convergence rate outperforms SGD with a typically much smaller
constant factor, which depends only on the variance of gradient estimates induced by perturbations of a single example.
Comment: Advances in Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, CA, United States.
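A hedged sketch of the setting, not the paper's exact algorithm: a SAGA/MISO-style memory scheme in which each access to example i draws a fresh random perturbation (e.g. a data augmentation), so the stored gradients cancel the finite-sum part of the variance and the residual variance comes only from perturbing a single example. Here perturb and grad_fi are hypothetical user-supplied callables:

    import numpy as np

    def vr_with_perturbations_sketch(grad_fi, perturb, x0, n, steps, lr=0.01, rng=None):
        """Sketch of variance reduction when each example is observed only
        through random perturbations (assumed interface: perturb(i, rng)
        returns an augmented copy of example i, grad_fi(x, i, xi) its gradient)."""
        rng = rng or np.random.default_rng(0)
        x = np.asarray(x0, dtype=float).copy()
        memory = np.array([grad_fi(x, i, perturb(i, rng)) for i in range(n)])
        avg = memory.mean(axis=0)
        for _ in range(steps):
            i = rng.integers(n)
            xi = perturb(i, rng)              # fresh augmentation of example i
            g = grad_fi(x, i, xi)
            x -= lr * (g - memory[i] + avg)   # variance-reduced direction
            avg += (g - memory[i]) / n
            memory[i] = g
        return x

Because the stored gradient for example i was computed on a different perturbation than the fresh one, the correction cannot cancel the augmentation noise; that residual single-example variance is what the constant factor in the abstract depends on.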