Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may originally require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves a near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural networks.
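The abstract above describes the mechanism only at a high level, so here is a minimal, editorial sketch of a Catalyst-style wrapper under assumed details: a quadratic proximal term with parameter `kappa` is added so that the subproblem is convex whenever the (unknown) weak convexity constant is below `kappa`, an inner first-order solver (plain gradient descent here, standing in for SVRG or SAGA) approximately minimizes it, and a Nesterov-style extrapolation step follows. All names, parameter choices, and the toy objective are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def catalyst_style_wrapper(grad_f, x0, kappa=1.0, n_outer=50,
                           inner_steps=100, inner_lr=0.01):
    # At each outer iteration, approximately minimize the proximal model
    # f(x) + (kappa / 2) * ||x - y_k||^2, which is convex whenever the
    # (unknown) weak convexity constant of f is below kappa, then extrapolate.
    x, y = x0.copy(), x0.copy()
    alpha = 1.0
    for _ in range(n_outer):
        # Inner solver: plain gradient descent stands in for SVRG or SAGA.
        z = y.copy()
        for _ in range(inner_steps):
            z = z - inner_lr * (grad_f(z) + kappa * (z - y))
        x_prev, x = x, z
        # Nesterov-style extrapolation of the outer sequence.
        alpha_prev = alpha
        alpha = (1.0 + np.sqrt(1.0 + 4.0 * alpha_prev ** 2)) / 2.0
        y = x + ((alpha_prev - 1.0) / alpha) * (x - x_prev)
    return x

# Toy usage on a smooth non-convex objective f(x) = sum_i x_i^2 / (1 + x_i^2).
grad_f = lambda x: 2.0 * x / (1.0 + x ** 2) ** 2
x_out = catalyst_style_wrapper(grad_f, x0=np.ones(5))
```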
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Stochastic optimization algorithms with variance reduction have proven
successful for minimizing large finite sums of functions. Unfortunately, these
techniques are unable to deal with stochastic perturbations of input data,
induced for example by data augmentation. In such cases, the objective is no
longer a finite sum, and the main candidate for optimization is the stochastic
gradient descent method (SGD). In this paper, we introduce a variance reduction
approach for these settings when the objective is composite and strongly
convex. The resulting convergence rate improves on that of SGD, with a typically
much smaller constant factor that depends only on the variance of gradient
estimates induced by perturbations of a single example.
Comment: Advances in Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, CA, United States.
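As an editorial illustration of how such a variance-reduction scheme can be organized, the sketch below keeps a SAGA-style table of per-example gradients and, at each step, samples an example together with a fresh perturbation; the variance of the resulting estimate then stems only from the perturbation on that single example. This is a sketch under assumptions (it omits the composite/proximal part and uses made-up names such as `grad_i` and `perturb`), not the paper's exact method.

```python
import numpy as np

def saga_like_with_perturbations(grad_i, perturb, x0, n, lr=0.01, n_iters=2000):
    # Variance-reduced update for objectives of the form
    # (1/n) * sum_i E_rho[ f_i(x, rho) ], where rho models a random data
    # perturbation (e.g. data augmentation) applied to example i.
    x = x0.copy()
    # Per-example gradient memory, initialized with one perturbation draw each.
    memory = np.stack([grad_i(i, x, perturb()) for i in range(n)])
    mean_mem = memory.mean(axis=0)
    for _ in range(n_iters):
        i = np.random.randint(n)
        g_new = grad_i(i, x, perturb())    # fresh perturbed gradient for example i
        g = g_new - memory[i] + mean_mem   # variance-reduced gradient estimate
        x = x - lr * g
        # Update the memory table and its running mean.
        mean_mem = mean_mem + (g_new - memory[i]) / n
        memory[i] = g_new
    return x

# Toy usage: least squares over 20 examples with additive Gaussian input noise.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
grad_i = lambda i, x, rho: (A[i] + rho) * ((A[i] + rho) @ x - b[i])
perturb = lambda: 0.1 * rng.normal(size=5)
x_hat = saga_like_with_perturbations(grad_i, perturb, np.zeros(5), n=20)
```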
A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconvex
problems is the presence of saddle points. First-order methods often get stuck
at saddle points, greatly deteriorating their performance. Typically, to escape
from saddles one has to use second-order methods. However, most works on
second-order methods rely extensively on expensive Hessian-based computations,
making them impractical in large-scale settings. To tackle this challenge, we
introduce a generic framework that minimizes Hessian based computations while
at the same time provably converging to second-order critical points. Our
framework carefully alternates between a first-order and a second-order
subroutine, using the latter only close to saddle points, and yields
convergence results competitive to the state-of-the-art. Empirical results
suggest that our strategy also enjoys good practical performance.
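To make the alternation concrete, here is an editorial sketch under assumed details: it takes cheap gradient steps while the gradient is large and calls a second-order check only when the gradient becomes small, stepping along a direction of sufficiently negative curvature to escape a saddle or stopping at an approximate second-order critical point. For brevity the sketch uses a dense Hessian eigendecomposition, whereas a framework aiming to minimize Hessian-based computations would replace this with cheaper Hessian-vector-product routines; all thresholds, names, and the toy objective are assumptions.

```python
import numpy as np

def escape_saddles(grad, hess, x0, lr=0.1, eps_g=1e-4, eps_h=1e-3,
                   nc_step=0.1, max_iters=100000):
    x = x0.copy()
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) > eps_g:
            # First-order phase: cheap gradient steps while progress is easy.
            x = x - lr * g
            continue
        # Gradient is small: possible saddle, invoke the second-order subroutine.
        eigvals, eigvecs = np.linalg.eigh(hess(x))
        if eigvals[0] < -eps_h:
            # Sufficient negative curvature: step along it to escape the saddle.
            v = eigvecs[:, 0]
            if g @ v > 0:
                v = -v
            x = x + nc_step * v
        else:
            # Approximate second-order critical point: small gradient and
            # (nearly) positive semidefinite Hessian.
            return x
    return x

# Toy usage: f(x, y) = x^2 + y^4/4 - y^2/2 has a saddle at the origin.
grad = lambda z: np.array([2.0 * z[0], z[1] ** 3 - z[1]])
hess = lambda z: np.array([[2.0, 0.0], [0.0, 3.0 * z[1] ** 2 - 1.0]])
x_min = escape_saddles(grad, hess, np.zeros(2))  # ends near (0, +/-1)
```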