Logistic Regression: Tight Bounds for Stochastic and Online Optimization
The logistic loss function is often advocated in machine learning and
statistics as a smooth and strictly convex surrogate for the 0-1 loss. In this
paper we investigate the question of whether these smoothness and convexity
properties make the logistic loss preferable to other widely considered options
such as the hinge loss. We show that in contrast to known asymptotic bounds, as
long as the number of prediction/optimization iterations is sub-exponential,
the logistic loss provides no improvement over a generic non-smooth loss
function such as the hinge loss. In particular we show that the convergence
rate of stochastic logistic optimization is bounded from below by a polynomial
in the diameter of the decision set and the number of prediction iterations,
and provide a matching tight upper bound. This resolves the COLT open problem
of McMahan and Streeter (2012).
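For reference, the two surrogate losses being compared have the following standard definitions (standard background, not quoted from the paper), written for a margin $z = y\langle w, x\rangle$:

    \ell_{\mathrm{log}}(z) = \log\bigl(1 + e^{-z}\bigr), \qquad \ell_{\mathrm{hinge}}(z) = \max\{0,\; 1 - z\}.

The logistic loss is smooth and strictly convex while the hinge loss is non-smooth at $z = 1$; the lower bound above says this extra smoothness brings no benefit unless the number of iterations grows beyond sub-exponential.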
Variance Reduction for Faster Non-Convex Optimization
We consider the fundamental problem in non-convex optimization of efficiently
reaching a stationary point. In contrast to the convex case, in the long
history of this basic problem, the only known theoretical results on
first-order non-convex optimization remain to be full gradient descent that
converges in $O(1/\varepsilon)$ iterations for smooth objectives, and
stochastic gradient descent that converges in $O(1/\varepsilon^2)$ iterations
for objectives that are sums of smooth functions.
We provide the first improvement in this line of research. Our result is
based on the variance reduction trick recently introduced to convex
optimization, as well as a brand new analysis of variance reduction that is
suitable for non-convex optimization. For objectives that are sums of $n$
smooth functions, our first-order minibatch stochastic method converges with an
$O(1/\varepsilon)$ rate and is faster than full gradient descent by a factor of
$\Omega(n^{1/3})$.
We demonstrate the effectiveness of our methods on empirical risk
minimizations with non-convex loss functions and training neural nets.
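As a concrete illustration of the variance-reduction trick the abstract refers to, the sketch below implements an SVRG-style gradient estimator for a finite-sum objective $f(x) = \frac{1}{n}\sum_i f_i(x)$. It is a minimal sketch under assumed interfaces (the function name svrg_nonconvex, the grad_i callback, and all step-size and epoch constants are illustrative), not the authors' exact method.

    import numpy as np

    def svrg_nonconvex(grad_i, x0, n, step=0.01, epochs=10, inner_iters=None, rng=None):
        """SVRG-style variance-reduced SGD sketch for f(x) = (1/n) * sum_i f_i(x).

        grad_i(i, x) returns the gradient of the i-th component f_i at x.
        Illustrative sketch of the variance-reduction trick, not the paper's
        exact method or constants.
        """
        rng = np.random.default_rng() if rng is None else rng
        inner_iters = n if inner_iters is None else inner_iters
        x = x0.copy()
        for _ in range(epochs):
            snapshot = x.copy()
            # Full gradient at the snapshot point, computed once per epoch.
            full_grad = np.mean([grad_i(i, snapshot) for i in range(n)], axis=0)
            for _ in range(inner_iters):
                i = rng.integers(n)
                # Variance-reduced stochastic gradient: unbiased for the full
                # gradient, with variance shrinking as x approaches the snapshot.
                g = grad_i(i, x) - grad_i(i, snapshot) + full_grad
                x = x - step * g
        return x

The key point is that the correction term grad_i(i, x) - grad_i(i, snapshot) + full_grad remains an unbiased estimate of the full gradient while its variance shrinks as the iterate nears the snapshot, which is what allows rates better than plain SGD even without convexity.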
Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval
In recent literature, a general two step procedure has been formulated for
solving the problem of phase retrieval. First, a spectral technique is used to
obtain a constant-error initial estimate, following which, the estimate is
refined to arbitrary precision by first-order optimization of a non-convex loss
function. Numerical experiments, however, seem to suggest that simply running
the iterative schemes from a random initialization may also lead to
convergence, albeit at the cost of slightly higher sample complexity. In this
paper, we prove that, in fact, constant step size online stochastic gradient
descent (SGD) converges from arbitrary initializations for the non-smooth,
non-convex amplitude squared loss objective. In this setting, online SGD is
also equivalent to the randomized Kaczmarz algorithm from numerical analysis.
Our analysis can easily be generalized to other single index models. It also
makes use of new ideas from stochastic process theory, including the notion of
a summary state space, which we believe will be of use for the broader field of
non-convex optimization.
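A minimal sketch of the online SGD update on the amplitude loss, and of its equivalence to a randomized Kaczmarz step, is given below. The function name, step size, and uniform sampling of measurements are illustrative assumptions; only the form of the update reflects the abstract.

    import numpy as np

    def online_sgd_phase_retrieval(A, b, x0, step=1.0, iters=5000, rng=None):
        """Online SGD on the amplitude loss (1/2)(|<a_i, x>| - b_i)^2.

        A : (m, d) measurement vectors a_i; b : (m,) magnitudes |<a_i, x*>|.
        Step size and sampling scheme are illustrative, not the paper's
        exact constants.
        """
        rng = np.random.default_rng() if rng is None else rng
        m, d = A.shape
        x = x0.copy()
        for _ in range(iters):
            i = rng.integers(m)
            a = A[i]
            r = a @ x
            s = 1.0 if r >= 0 else -1.0          # sign(<a_i, x>); the choice at 0 is arbitrary
            grad = (abs(r) - b[i]) * s * a        # a.e. gradient of the amplitude loss
            x = x - step * grad / (a @ a)         # normalized step = Kaczmarz-style projection
        return x

With step=1.0 the update projects the iterate onto the hyperplane $\langle a_i, x\rangle = \mathrm{sign}(\langle a_i, x\rangle)\, b_i$, which is precisely a Kaczmarz step for the linear system implied by the current sign pattern; this is the equivalence the abstract points to.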
Perturbed Iterate SGD for Lipschitz Continuous Loss Functions
This paper presents an extension of stochastic gradient descent for the
minimization of Lipschitz continuous loss functions. Using the Clarke
$\epsilon$-subdifferential, we prove non-asymptotic convergence bounds to an
approximate stationary point in expectation. Our results hold under the
assumption that the stochastic loss function is a Carathéodory function which
is almost everywhere Lipschitz continuous in the decision variables. To the
best of our knowledge this is the first non-asymptotic convergence analysis
under these minimal assumptions. Our motivation is for use in non-convex
non-smooth stochastic optimization problems, which are frequently encountered
in applications such as machine learning. We present numerical results from
training a feedforward neural network, comparing our algorithm to stochastic
gradient descent.
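The sketch below shows one way the perturbed-iterate idea can be realized for a Lipschitz, possibly non-smooth and non-convex, stochastic loss: the stochastic gradient is queried at a small random perturbation of the current iterate, which is what connects the analysis to the Clarke $\epsilon$-subdifferential. All names, the uniform-ball perturbation, and the constants are illustrative assumptions rather than the paper's exact algorithm.

    import numpy as np

    def perturbed_iterate_sgd(stoch_grad, x0, step=1e-3, perturb=1e-2, iters=10000, rng=None):
        """Sketch of SGD with perturbed iterates for Lipschitz, possibly
        non-smooth and non-convex, stochastic losses.

        stoch_grad(x) returns a stochastic (a.e.) gradient sample at x.
        Step size, perturbation radius, and the uniform-ball perturbation are
        illustrative assumptions, not the paper's exact choices.
        """
        rng = np.random.default_rng() if rng is None else rng
        x = x0.copy()
        d = x.size
        for _ in range(iters):
            # Sample a perturbation uniformly from a ball of radius `perturb`.
            u = rng.standard_normal(d)
            u *= perturb * rng.random() ** (1.0 / d) / np.linalg.norm(u)
            g = stoch_grad(x + u)      # gradient sampled at the perturbed point
            x = x - step * g
        return x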