Logistic Regression: Tight Bounds for Stochastic and Online Optimization
The logistic loss function is often advocated in machine learning and
statistics as a smooth and strictly convex surrogate for the 0-1 loss. In this
paper we investigate the question of whether these smoothness and convexity
properties make the logistic loss preferable to other widely considered options
such as the hinge loss. We show that in contrast to known asymptotic bounds, as
long as the number of prediction/optimization iterations is sub-exponential,
the logistic loss provides no improvement over a generic non-smooth loss
function such as the hinge loss. In particular we show that the convergence
rate of stochastic logistic optimization is bounded from below by a polynomial
in the diameter of the decision set and the number of prediction iterations,
and provide a matching tight upper bound. This resolves the COLT open problem
of McMahan and Streeter (2012).
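For reference, the two surrogate losses being compared have the following standard definitions (standard background, not quoted from the paper), written for a margin $z = y\langle w, x\rangle$:

    \ell_{\mathrm{log}}(z) = \log\bigl(1 + e^{-z}\bigr), \qquad \ell_{\mathrm{hinge}}(z) = \max\{0,\; 1 - z\}.

The logistic loss is smooth and strictly convex while the hinge loss is non-smooth at $z = 1$; the lower bound above says this extra smoothness brings no benefit unless the number of iterations grows beyond sub-exponential.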
Variance Reduction for Faster Non-Convex Optimization
We consider the fundamental problem in non-convex optimization of efficiently
reaching a stationary point. In contrast to the convex case, in the long
history of this basic problem, the only known theoretical results on
first-order non-convex optimization remain to be full gradient descent that
converges in $O(1/\varepsilon)$ iterations for smooth objectives, and
stochastic gradient descent that converges in $O(1/\varepsilon^2)$ iterations
for objectives that are sums of smooth functions.
We provide the first improvement in this line of research. Our result is
based on the variance reduction trick recently introduced to convex
optimization, as well as a brand new analysis of variance reduction that is
suitable for non-convex optimization. For objectives that are sums of $n$
smooth functions, our first-order minibatch stochastic method converges with an
$O(1/\varepsilon)$ rate and is faster than full gradient descent by a factor of
$\Omega(n^{1/3})$.
We demonstrate the effectiveness of our methods on empirical risk
minimizations with non-convex loss functions and training neural nets.
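As a concrete illustration of the variance-reduction trick the abstract refers to, the sketch below implements an SVRG-style gradient estimator for a finite-sum objective $f(x) = \frac{1}{n}\sum_i f_i(x)$. It is a minimal sketch under assumed interfaces (the function name svrg_nonconvex, the grad_i callback, and all step-size and epoch constants are illustrative), not the authors' exact method.

    import numpy as np

    def svrg_nonconvex(grad_i, x0, n, step=0.01, epochs=10, inner_iters=None, rng=None):
        """SVRG-style variance-reduced SGD sketch for f(x) = (1/n) * sum_i f_i(x).

        grad_i(i, x) returns the gradient of the i-th component f_i at x.
        Illustrative sketch of the variance-reduction trick, not the paper's
        exact method or constants.
        """
        rng = np.random.default_rng() if rng is None else rng
        inner_iters = n if inner_iters is None else inner_iters
        x = x0.copy()
        for _ in range(epochs):
            snapshot = x.copy()
            # Full gradient at the snapshot point, computed once per epoch.
            full_grad = np.mean([grad_i(i, snapshot) for i in range(n)], axis=0)
            for _ in range(inner_iters):
                i = rng.integers(n)
                # Variance-reduced stochastic gradient: unbiased for the full
                # gradient, with variance shrinking as x approaches the snapshot.
                g = grad_i(i, x) - grad_i(i, snapshot) + full_grad
                x = x - step * g
        return x

The key point is that the correction term grad_i(i, x) - grad_i(i, snapshot) + full_grad remains an unbiased estimate of the full gradient while its variance shrinks as the iterate nears the snapshot, which is what allows rates better than plain SGD even without convexity.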
Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval
In recent literature, a general two step procedure has been formulated for
solving the problem of phase retrieval. First, a spectral technique is used to
obtain a constant-error initial estimate, following which, the estimate is
refined to arbitrary precision by first-order optimization of a non-convex loss
function. Numerical experiments, however, seem to suggest that simply running
the iterative schemes from a random initialization may also lead to
convergence, albeit at the cost of slightly higher sample complexity. In this
paper, we prove that, in fact, constant step size online stochastic gradient
descent (SGD) converges from arbitrary initializations for the non-smooth,
non-convex amplitude squared loss objective. In this setting, online SGD is
also equivalent to the randomized Kaczmarz algorithm from numerical analysis.
Our analysis can easily be generalized to other single index models. It also
makes use of new ideas from stochastic process theory, including the notion of
a summary state space, which we believe will be of use for the broader field of
non-convex optimization.
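A minimal sketch of the online SGD update on the amplitude loss, and of its equivalence to a randomized Kaczmarz step, is given below. The function name, step size, and uniform sampling of measurements are illustrative assumptions; only the form of the update reflects the abstract.

    import numpy as np

    def online_sgd_phase_retrieval(A, b, x0, step=1.0, iters=5000, rng=None):
        """Online SGD on the amplitude loss (1/2)(|<a_i, x>| - b_i)^2.

        A : (m, d) measurement vectors a_i; b : (m,) magnitudes |<a_i, x*>|.
        Step size and sampling scheme are illustrative, not the paper's
        exact constants.
        """
        rng = np.random.default_rng() if rng is None else rng
        m, d = A.shape
        x = x0.copy()
        for _ in range(iters):
            i = rng.integers(m)
            a = A[i]
            r = a @ x
            s = 1.0 if r >= 0 else -1.0          # sign(<a_i, x>); the choice at 0 is arbitrary
            grad = (abs(r) - b[i]) * s * a        # a.e. gradient of the amplitude loss
            x = x - step * grad / (a @ a)         # normalized step = Kaczmarz-style projection
        return x

With step=1.0 the update projects the iterate onto the hyperplane $\langle a_i, x\rangle = \mathrm{sign}(\langle a_i, x\rangle)\, b_i$, which is precisely a Kaczmarz step for the linear system implied by the current sign pattern; this is the equivalence the abstract points to.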
Perturbed Iterate SGD for Lipschitz Continuous Loss Functions
This paper presents an extension of stochastic gradient descent for the
minimization of Lipschitz continuous loss functions. Using the Clarke
$\epsilon$-subdifferential, we prove non-asymptotic convergence bounds to an
approximate stationary point in expectation. Our results hold under the
assumption that the stochastic loss function is a Carathéodory function which
is almost everywhere Lipschitz continuous in the decision variables. To the
best of our knowledge this is the first non-asymptotic convergence analysis
under these minimal assumptions. Our motivation is for use in non-convex
non-smooth stochastic optimization problems, which are frequently encountered
in applications such as machine learning. We present numerical results from
training a feedforward neural network, comparing our algorithm to stochastic
gradient descent.
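The sketch below shows one way the perturbed-iterate idea can be realized for a Lipschitz, possibly non-smooth and non-convex, stochastic loss: the stochastic gradient is queried at a small random perturbation of the current iterate, which is what connects the analysis to the Clarke $\epsilon$-subdifferential. All names, the uniform-ball perturbation, and the constants are illustrative assumptions rather than the paper's exact algorithm.

    import numpy as np

    def perturbed_iterate_sgd(stoch_grad, x0, step=1e-3, perturb=1e-2, iters=10000, rng=None):
        """Sketch of SGD with perturbed iterates for Lipschitz, possibly
        non-smooth and non-convex, stochastic losses.

        stoch_grad(x) returns a stochastic (a.e.) gradient sample at x.
        Step size, perturbation radius, and the uniform-ball perturbation are
        illustrative assumptions, not the paper's exact choices.
        """
        rng = np.random.default_rng() if rng is None else rng
        x = x0.copy()
        d = x.size
        for _ in range(iters):
            # Sample a perturbation uniformly from a ball of radius `perturb`.
            u = rng.standard_normal(d)
            u *= perturb * rng.random() ** (1.0 / d) / np.linalg.norm(u)
            g = stoch_grad(x + u)      # gradient sampled at the perturbed point
            x = x - step * g
        return x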