
    Logistic Regression: Tight Bounds for Stochastic and Online Optimization

    The logistic loss function is often advocated in machine learning and statistics as a smooth and strictly convex surrogate for the 0-1 loss. In this paper we investigate whether these smoothness and convexity properties make the logistic loss preferable to other widely considered options such as the hinge loss. We show that, in contrast to known asymptotic bounds, as long as the number of prediction/optimization iterations is sub-exponential, the logistic loss provides no improvement over a generic non-smooth loss function such as the hinge loss. In particular, we show that the convergence rate of stochastic logistic optimization is bounded from below by a polynomial in the diameter of the decision set and the number of prediction iterations, and we provide a matching tight upper bound. This resolves the COLT open problem of McMahan and Streeter (2012).
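For readers unfamiliar with the two surrogates being compared, the following is a minimal sketch of their standard textbook definitions as functions of the margin z = y * score. The function names are illustrative and do not come from the paper, and nothing here reproduces the paper's bounds.

```python
import math

def logistic_loss(margin: float) -> float:
    """Logistic loss log(1 + exp(-margin)), computed in a numerically stable way."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite to avoid overflow in exp(-margin).
    return -margin + math.log1p(math.exp(margin))

def hinge_loss(margin: float) -> float:
    """Hinge loss max(0, 1 - margin); non-smooth, with a kink at margin = 1."""
    return max(0.0, 1.0 - margin)

def zero_one_loss(margin: float) -> float:
    """0-1 loss: 1 for a non-positive margin (misclassified), else 0."""
    return 1.0 if margin <= 0 else 0.0
```

The logistic loss is differentiable everywhere and strictly positive, while the hinge loss is exactly zero for margins of at least 1 and has no derivative at margin 1.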

    Variance Reduction for Faster Non-Convex Optimization

    We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain full gradient descent, which converges in $O(1/\varepsilon)$ iterations for smooth objectives, and stochastic gradient descent, which converges in $O(1/\varepsilon^2)$ iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of smooth functions, our first-order minibatch stochastic method converges with an $O(1/\varepsilon)$ rate, and is faster than full gradient descent by $\Omega(n^{1/3})$. We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
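The variance reduction trick mentioned in this abstract can be illustrated with a generic SVRG-style update. This is a sketch, not the paper's minibatch method or its analysis. As an assumption made only so the result is easy to check, it uses a convex least-squares toy objective rather than a non-convex one. All names (`svrg`, `make_data`, `grad_i`) and hyperparameters are hypothetical.

```python
import random

def grad_i(w, x, y):
    """Gradient of one smooth component f_i(w) = 0.5 * (w.x - y)^2."""
    r = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [r * xj for xj in x]

def full_grad(w, data):
    """Average of the component gradients over the whole dataset."""
    g = [0.0] * len(w)
    for x, y in data:
        g = [a + b / len(data) for a, b in zip(g, grad_i(w, x, y))]
    return g

def loss(w, data):
    """Average objective (1/n) * sum_i f_i(w)."""
    return sum(0.5 * (sum(wj * xj for wj, xj in zip(w, x)) - y) ** 2
               for x, y in data) / len(data)

def svrg(data, dim, epochs=30, inner_steps=50, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        snapshot = list(w)
        mu = full_grad(snapshot, data)  # full gradient at the snapshot point
        for _ in range(inner_steps):
            x, y = data[rng.randrange(len(data))]
            g_w = grad_i(w, x, y)
            g_s = grad_i(snapshot, x, y)
            # Variance-reduced estimator: stochastic gradient at w, corrected
            # by the same sample's gradient at the snapshot plus the full gradient.
            w = [wj - lr * (a - b + m) for wj, a, b, m in zip(w, g_w, g_s, mu)]
    return w

def make_data(n=40, seed=1):
    """Noise-free synthetic regression data with hypothetical true weights."""
    rng = random.Random(seed)
    true_w = [2.0, -1.0]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        data.append((x, sum(a * b for a, b in zip(true_w, x))))
    return data
```

Calling `svrg(make_data(), dim=2)` returns a weight vector. The estimator `g_w - g_s + mu` is unbiased for the full gradient, and its variance shrinks as the iterate approaches the snapshot, which is the property a variance-reduction analysis relies on.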

    Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval

    In recent literature, a general two-step procedure has been formulated for solving the problem of phase retrieval. First, a spectral technique is used to obtain a constant-error initial estimate; then the estimate is refined to arbitrary precision by first-order optimization of a non-convex loss function. Numerical experiments, however, seem to suggest that simply running the iterative schemes from a random initialization may also lead to convergence, albeit at the cost of slightly higher sample complexity. In this paper, we prove that, in fact, constant step size online stochastic gradient descent (SGD) converges from arbitrary initializations for the non-smooth, non-convex amplitude squared loss objective. In this setting, online SGD is also equivalent to the randomized Kaczmarz algorithm from numerical analysis. Our analysis can easily be generalized to other single index models. It also makes use of new ideas from stochastic process theory, including the notion of a summary state space, which we believe will be of use for the broader field of non-convex optimization.
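The equivalence between online SGD and randomized Kaczmarz noted in this abstract can be sketched for the real-valued case. The assumptions here are not taken from the paper: Gaussian measurement vectors drawn fresh at every step, the loss 0.5 * (|<a, x>| - b)^2, and a step normalized by ||a||^2, which turns the SGD step into a Kaczmarz projection. Function names and the step count are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kaczmarz_phase_retrieval(x_true, steps=5000, seed=0):
    """Online SGD on the amplitude loss, written as a randomized Kaczmarz update."""
    rng = random.Random(seed)
    d = len(x_true)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]      # random initialization
    for _ in range(steps):
        a = [rng.gauss(0.0, 1.0) for _ in range(d)]  # fresh measurement vector
        b = abs(dot(a, x_true))                      # phaseless measurement
        p = dot(a, x)
        sign = 1.0 if p >= 0 else -1.0
        # Gradient of 0.5 * (|p| - b)^2 with respect to x is (p - sign * b) * a;
        # dividing by ||a||^2 projects x onto the hyperplane <a, x> = sign * b.
        scale = (p - sign * b) / dot(a, a)
        x = [xi - scale * ai for xi, ai in zip(x, a)]
    return x

def sign_invariant_error(x, x_true):
    """Distance to the closer of x_true and -x_true (the global sign is unrecoverable)."""
    plus = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_true)))
    minus = math.sqrt(sum((a + b) ** 2 for a, b in zip(x, x_true)))
    return min(plus, minus)
```

Because only |<a, x_true>| is observed, the estimate can match the target only up to a global sign, which is why the error is measured against both `x_true` and its negation.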

    Perturbed Iterate SGD for Lipschitz Continuous Loss Functions

    This paper presents an extension of stochastic gradient descent for the minimization of Lipschitz continuous loss functions. Using the Clarke $\epsilon$-subdifferential, we prove non-asymptotic convergence bounds to an approximate stationary point in expectation. Our results hold under the assumption that the stochastic loss function is a Carathéodory function which is almost everywhere Lipschitz continuous in the decision variables. To the best of our knowledge this is the first non-asymptotic convergence analysis under these minimal assumptions. Our motivation is for use in non-convex non-smooth stochastic optimization problems, which are frequently encountered in applications such as machine learning. We present numerical results from training a feedforward neural network, comparing our algorithm to stochastic gradient descent.