163 research outputs found

    Variance Reduction for Faster Non-Convex Optimization

    We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, over the long history of this basic problem the only known theoretical results on first-order non-convex optimization have been full gradient descent, which converges in $O(1/\varepsilon)$ iterations for smooth objectives, and stochastic gradient descent, which converges in $O(1/\varepsilon^2)$ iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of smooth functions, our first-order minibatch stochastic method converges at an $O(1/\varepsilon)$ rate and is faster than full gradient descent by $\Omega(n^{1/3})$. We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
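    A minimal sketch of the minibatch variance-reduction idea described above, in the SVRG style: keep a periodic snapshot point, compute its full gradient once per epoch, and correct each minibatch gradient with the snapshot terms. The function names, step size, epoch lengths, and the toy non-convex loss below are illustrative assumptions, not the paper's exact algorithm or constants.

```python
# Sketch of an SVRG-style variance-reduced gradient step for a finite-sum
# objective f(x) = (1/n) * sum_i f_i(x).  The epoch length, step size, and
# minibatch size are illustrative placeholders, not the paper's constants.
import numpy as np

def svrg_nonconvex(grad_i, x0, n, step=0.05, epochs=20, inner_iters=100, batch=10, rng=None):
    """grad_i(x, idx) returns the average gradient of f_i over indices idx."""
    rng = np.random.default_rng(rng)
    x = x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = grad_i(snapshot, np.arange(n))   # full gradient at the snapshot
        for _ in range(inner_iters):
            idx = rng.integers(0, n, size=batch)     # sample a minibatch
            # Variance-reduced estimator: unbiased, with variance shrinking
            # as the iterate x stays close to the snapshot point.
            g = grad_i(x, idx) - grad_i(snapshot, idx) + full_grad
            x -= step * g
    return x

# Toy usage: a non-convex squared loss through a sigmoid link (illustrative only).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, y = rng.normal(size=(200, 5)), rng.integers(0, 2, size=200).astype(float)

    def grad_i(x, idx):
        z = A[idx] @ x
        p = 1.0 / (1.0 + np.exp(-z))
        # gradient of (1/|idx|) * sum (sigmoid(a_i . x) - y_i)^2
        return A[idx].T @ ((p - y[idx]) * p * (1 - p)) * (2.0 / len(idx))

    x_hat = svrg_nonconvex(grad_i, np.zeros(5), n=200)
    print("gradient norm at output:", np.linalg.norm(grad_i(x_hat, np.arange(200))))
```

    Each inner step uses an estimator whose variance shrinks as the iterate stays near the snapshot, which is the mechanism the abstract credits for the improved rate on finite sums.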

    Momentum-Based Variance Reduction in Non-Convex SGD

    Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and a willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses $F$, STORM finds a point $\boldsymbol{x}$ with $\mathbb{E}[\|\nabla F(\boldsymbol{x})\|]\le O(1/\sqrt{T}+\sigma^{1/3}/T^{1/3})$ in $T$ iterations with $\sigma^2$ variance in the gradients, matching the optimal rate but without requiring knowledge of $\sigma$.
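    A minimal sketch of a STORM-style recursive momentum estimator, assuming the update $d_t = \nabla f(x_t,\xi_t) + (1-a)(d_{t-1} - \nabla f(x_{t-1},\xi_t))$ with the same sample $\xi_t$ evaluated at both iterates. The fixed step size and momentum weight are simplifying placeholders (the algorithm described above sets them adaptively from observed gradient magnitudes), and `sample`/`grad` are hypothetical oracles.

```python
# Sketch of a STORM-style update: the gradient estimate mixes a fresh stochastic
# gradient with a momentum correction evaluated at the previous iterate on the
# SAME sample, so no "mega-batches" are needed.  Step size and momentum weight
# are placeholders, not the paper's adaptive choices.
import numpy as np

def storm_sketch(sample, grad, x0, iters=1000, step=0.01, momentum=0.1, rng=None):
    """sample(rng) draws one data point; grad(x, xi) is the gradient of the loss on xi at x."""
    rng = np.random.default_rng(rng)
    x_prev = x0.copy()
    d = grad(x_prev, sample(rng))          # initial estimate: a plain stochastic gradient
    x = x_prev - step * d
    for _ in range(iters):
        xi = sample(rng)                   # one fresh sample, reused at both points
        # Variance-reduced momentum:
        #   d_t = grad(x_t, xi_t) + (1 - a) * (d_{t-1} - grad(x_{t-1}, xi_t))
        d = grad(x, xi) + (1.0 - momentum) * (d - grad(x_prev, xi))
        x_prev, x = x, x - step * d
    return x

# Toy usage with a noisy gradient oracle (illustrative only).
if __name__ == "__main__":
    rng0 = np.random.default_rng(0)
    target = rng0.normal(size=10)
    sample = lambda rng: rng.normal(scale=0.5, size=10)   # additive gradient noise
    grad = lambda x, xi: np.tanh(x - target) + xi         # toy gradient field
    x_out = storm_sketch(sample, grad, np.zeros(10))
    print("gradient norm at output:", np.linalg.norm(np.tanh(x_out - target)))
```

    Setting the momentum weight to 1 recovers plain SGD, while setting it to 0 gives a purely recursive estimator; the adaptive choice described in the abstract interpolates between the two without large batches or knowledge of $\sigma$.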

    Efficient Regret Minimization in Non-Convex Games

    We consider regret minimization in repeated games with non-convex loss functions. Minimizing the standard notion of regret is computationally intractable. Thus, we define a natural notion of regret which permits efficient optimization and generalizes offline guarantees for convergence to an approximate local optimum. We give gradient-based methods that achieve optimal regret, which in turn guarantee convergence to equilibrium in this framework.
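    One gradient-based scheme consistent with this description is to play, at each round, a point obtained by descending on the average of the last $w$ observed losses (a "time-smoothed" objective). The sketch below assumes that windowed formulation; the window size, step size, inner-step count, and oracle interface are illustrative assumptions rather than the paper's exact algorithm.

```python
# Sketch of windowed ("time-smoothed") online gradient descent for repeated
# games with non-convex losses: at each round, take a few gradient steps on
# the average of the last w observed loss gradients.  All parameters here are
# illustrative placeholders.
from collections import deque
import numpy as np

def time_smoothed_ogd(loss_grads, x0, window=5, step=0.05, inner_steps=20):
    """loss_grads is an iterable of per-round gradient oracles g_t(x)."""
    x = x0.copy()
    recent = deque(maxlen=window)          # gradient oracles of the last w rounds
    plays = []
    for g_t in loss_grads:
        plays.append(x.copy())             # commit to x_t before updating on the t-th loss
        recent.append(g_t)
        for _ in range(inner_steps):       # descend on the window-averaged loss
            avg_grad = sum(g(x) for g in recent) / len(recent)
            x = x - step * avg_grad
    return plays

# Toy usage: each round's loss is a shifted non-convex function (illustrative only).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    centers = rng.normal(size=(30, 3))
    rounds = [lambda x, c=c: np.sin(x - c) for c in centers]   # per-round gradient oracles
    trajectory = time_smoothed_ogd(rounds, x0=np.zeros(3))
    print("rounds played:", len(trajectory))
```

    Averaging over a sliding window is one natural way to make a local, gradient-based notion of regret attainable when per-round losses are non-convex, which is how the approximate-local-optimum guarantee in the abstract can be read.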