1,893 research outputs found

    On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Optimization

    Full text link
    Extrapolation is a well-known technique for solving convex optimization and variational inequalities and recently attracts some attention for non-convex optimization. Several recent works have empirically shown its success in some machine learning tasks. However, it has not been analyzed for non-convex minimization and there still remains a gap between the theory and the practice. In this paper, we analyze gradient descent and stochastic gradient descent with extrapolation for finding an approximate first-order stationary point in smooth non-convex optimization problems. Our convergence upper bounds show that the algorithms with extrapolation can be accelerated than without extrapolation

    Catalyst Acceleration for Gradient-Based Non-Convex Optimization

    Get PDF
    We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may originally require convexity to operate, the proposed approach allows one to use them on weakly convex objectives, which covers a large class of non-convex functions typically appearing in machine learning and signal processing. In general, the scheme is guaranteed to produce a stationary point with a worst-case efficiency typical of first-order methods, and when the objective turns out to be convex, it automatically accelerates in the sense of Nesterov and achieves near-optimal convergence rate in function values. These properties are achieved without assuming any knowledge about the convexity of the objective, by automatically adapting to the unknown weak convexity constant. We conclude the paper by showing promising experimental results obtained by applying our approach to incremental algorithms such as SVRG and SAGA for sparse matrix factorization and for learning neural networks

    Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice

    Full text link
    We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieve acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy by using the right stopping criterion and the right warm-start strategy. We give practical guidelines to use Catalyst and present a comprehensive analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, MISO/Finito, and their proximal variants. For all of these methods, we establish faster rates using the Catalyst acceleration, for strongly convex and non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems.Comment: link to publisher website: http://jmlr.org/papers/volume18/17-748/17-748.pd

    Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization

    Get PDF
    We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variable. An extrapolation step on the primal variable is performed to obtain accelerated convergence rate. We also develop a mini-batch version of the SPDC method which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has a better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods

    Bregman Proximal Gradient Algorithm with Extrapolation for a class of Nonconvex Nonsmooth Minimization Problems

    Get PDF
    In this paper, we consider an accelerated method for solving nonconvex and nonsmooth minimization problems. We propose a Bregman Proximal Gradient algorithm with extrapolation(BPGe). This algorithm extends and accelerates the Bregman Proximal Gradient algorithm (BPG), which circumvents the restrictive global Lipschitz gradient continuity assumption needed in Proximal Gradient algorithms (PG). The BPGe algorithm has higher generality than the recently introduced Proximal Gradient algorithm with extrapolation(PGe), and besides, due to the extrapolation step, BPGe converges faster than BPG algorithm. Analyzing the convergence, we prove that any limit point of the sequence generated by BPGe is a stationary point of the problem by choosing parameters properly. Besides, assuming Kurdyka-{\'L}ojasiewicz property, we prove the whole sequences generated by BPGe converges to a stationary point. Finally, to illustrate the potential of the new method BPGe, we apply it to two important practical problems that arise in many fundamental applications (and that not satisfy global Lipschitz gradient continuity assumption): Poisson linear inverse problems and quadratic inverse problems. In the tests the accelerated BPGe algorithm shows faster convergence results, giving an interesting new algorithm.Comment: Preprint submitted for publication, February 14, 201
    • …
    corecore