On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Optimization
Extrapolation is a well-known technique for solving convex optimization
problems and variational inequalities, and it has recently attracted attention
for non-convex optimization. Several recent works have empirically shown its
success in some machine learning tasks. However, it has not been analyzed for
non-convex minimization, and a gap remains between theory and practice.
In this paper, we analyze gradient descent and stochastic gradient descent with
extrapolation for finding an approximate first-order stationary point in smooth
non-convex optimization problems. Our convergence upper bounds show that the
algorithms with extrapolation can converge faster than those without extrapolation.
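As an illustration of the kind of iteration this entry analyzes, a minimal sketch of gradient descent with an extrapolation step follows. The abstract does not state the exact update rule, step size, or extrapolation weight, so the form used here (extrapolate, then take a gradient step at the extrapolated point), the name `gd_with_extrapolation`, the parameters `step` and `beta`, and the test objective are all assumptions.

```python
import math

def gd_with_extrapolation(grad, x0, step=0.1, beta=0.5, tol=1e-6, max_iter=10_000):
    """Gradient descent with an extrapolation step (illustrative sketch).

    Stops once the gradient norm is at most `tol`, i.e. at an approximate
    first-order stationary point. Returns (x, number of iterations used).
    """
    x_prev = list(x0)
    x = list(x0)
    for k in range(max_iter):
        # Extrapolated point: y = x + beta * (x - x_prev)
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = grad(y)
        # Gradient step taken from the extrapolated point
        x_prev, x = x, [yi - step * gi for yi, gi in zip(y, g)]
        # Check approximate first-order stationarity at the new iterate
        grad_norm = math.sqrt(sum(v * v for v in grad(x)))
        if grad_norm <= tol:
            return x, k + 1
    return x, max_iter

# Hypothetical smooth non-convex objective f(x) = x^2 + 3*sin(x)^2,
# whose gradient is 2x + 3*sin(2x).
grad = lambda x: [2 * x[0] + 3 * math.sin(2 * x[0])]
x_star, iters = gd_with_extrapolation(grad, [2.0])
```

Setting `beta=0` recovers plain gradient descent, which makes it easy to compare iteration counts with and without extrapolation on the same objective.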
Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may originally require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves a near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural networks.
Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice
We introduce a generic scheme for accelerating gradient-based optimization
methods in the sense of Nesterov. The approach, called Catalyst, builds upon
the inexact accelerated proximal point algorithm for minimizing a convex
objective function, and consists of approximately solving a sequence of
well-chosen auxiliary problems, leading to faster convergence. One of the keys
to achieving acceleration in theory and in practice is to solve these
sub-problems with appropriate accuracy by using the right stopping criterion
and the right warm-start strategy. We give practical guidelines to use Catalyst
and present a comprehensive analysis of its global complexity. We show that
Catalyst applies to a large class of algorithms, including gradient descent,
block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG,
MISO/Finito, and their proximal variants. For all of these methods, we
establish faster rates using the Catalyst acceleration, for strongly convex and
non-strongly convex objectives. We conclude with extensive experiments showing
that acceleration is useful in practice, especially for ill-conditioned
problems.
Comment: link to publisher website:
http://jmlr.org/papers/volume18/17-748/17-748.pd
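The outer loop described in this entry (approximately solve a sequence of auxiliary problems, then extrapolate) can be sketched as follows for a strongly convex objective. This is a simplified illustration, not the authors' implementation: the fixed number of inner gradient steps stands in for the stopping criterion, the warm start at the previous iterate is one simple choice, and the names `catalyst`, `kappa`, `outer_iters`, `inner_iters`, and the quadratic test problem are all assumptions.

```python
import math

def catalyst(grad_f, x0, mu, L, kappa, outer_iters=60, inner_iters=50):
    """Catalyst-style acceleration of gradient descent (illustrative sketch).

    Each outer iteration approximately minimizes the auxiliary problem
        h(z) = f(z) + (kappa / 2) * ||z - y||^2
    with a fixed number of gradient steps, then extrapolates.
    Assumes f is mu-strongly convex with L-Lipschitz gradient.
    """
    q = mu / (mu + kappa)
    # Constant extrapolation weight used in the strongly convex case
    beta = (1 - math.sqrt(q)) / (1 + math.sqrt(q))
    step = 1.0 / (L + kappa)  # h has an (L + kappa)-Lipschitz gradient
    x_prev, x, y = list(x0), list(x0), list(x0)
    for _ in range(outer_iters):
        z = list(x)  # warm start the inner solver at the previous iterate
        for _ in range(inner_iters):
            g = grad_f(z)
            # Gradient of h at z is grad_f(z) + kappa * (z - y)
            z = [zi - step * (gi + kappa * (zi - yi))
                 for zi, gi, yi in zip(z, g, y)]
        x_prev, x = x, z
        # Extrapolation step defining the next prox-center
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
    return x

# Hypothetical ill-conditioned quadratic f(x) = 0.5 * (100 * x1^2 + x2^2)
grad_f = lambda x: [100.0 * x[0], 1.0 * x[1]]
x_hat = catalyst(grad_f, [1.0, 1.0], mu=1.0, L=100.0, kappa=10.0)
f_hat = 0.5 * (100.0 * x_hat[0] ** 2 + x_hat[1] ** 2)
```

The inner solver here is plain gradient descent only for brevity; the entry states that the scheme also applies to block coordinate descent and incremental methods such as SAG, SAGA, SDCA, and SVRG.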
Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization
We consider a generic convex optimization problem associated with regularized
empirical risk minimization of linear predictors. The problem structure allows
us to reformulate it as a convex-concave saddle point problem. We propose a
stochastic primal-dual coordinate (SPDC) method, which alternates between
maximizing over a randomly chosen dual variable and minimizing over the primal
variable. An extrapolation step on the primal variable is performed to obtain
an accelerated convergence rate. We also develop a mini-batch version of the SPDC
method which facilitates parallel computing, and an extension with weighted
sampling probabilities on the dual variables, which has a better complexity
than uniform sampling on unnormalized data. Both theoretically and empirically,
we show that the SPDC method has comparable or better performance than several
state-of-the-art optimization methods.
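The saddle-point reformulation mentioned in this entry can be written out as below. The abstract gives no notation, so the symbols are assumptions: $a_i$ are the feature vectors, $\phi_i$ the convex, closed loss functions, $g$ the regularizer, and $\phi_i^*$ the convex conjugate of $\phi_i$. The equality follows from $\phi_i(t) = \max_{y_i} \{ y_i t - \phi_i^*(y_i) \}$.

```latex
\min_{x \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \phi_i\!\left(a_i^\top x\right) + g(x)
\;=\;
\min_{x \in \mathbb{R}^d} \, \max_{y \in \mathbb{R}^n} \;
\frac{1}{n} \sum_{i=1}^{n} \Bigl( y_i \, \langle a_i, x \rangle - \phi_i^*(y_i) \Bigr) + g(x)
```

In this form each dual variable $y_i$ is attached to a single data point, which is what makes it possible to update one randomly chosen dual coordinate per iteration.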
Bregman Proximal Gradient Algorithm with Extrapolation for a class of Nonconvex Nonsmooth Minimization Problems
In this paper, we consider an accelerated method for solving nonconvex and
nonsmooth minimization problems. We propose a Bregman Proximal Gradient
algorithm with extrapolation (BPGe). This algorithm extends and accelerates the
Bregman Proximal Gradient algorithm (BPG), which circumvents the restrictive
global Lipschitz gradient continuity assumption needed in Proximal Gradient
algorithms (PG). The BPGe algorithm is more general than the recently
introduced Proximal Gradient algorithm with extrapolation (PGe) and, thanks to
the extrapolation step, converges faster than the BPG algorithm.
Analyzing the convergence, we prove that, with properly chosen parameters, any
limit point of the sequence generated by BPGe is a stationary point of the
problem. Moreover, assuming the Kurdyka-Łojasiewicz property, we prove that the
whole sequence generated by BPGe converges to a stationary point. Finally, to
illustrate the potential of the new method BPGe, we apply it to two important
practical problems that arise in many fundamental applications (and that do not
satisfy the global Lipschitz gradient continuity assumption): Poisson linear
inverse problems and quadratic inverse problems. In the tests, the accelerated
BPGe algorithm shows faster convergence, making it an interesting new
algorithm.
Comment: Preprint submitted for publication, February 14, 201
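For orientation, one common way to write a Bregman proximal gradient step with extrapolation for minimizing $f + g$ is sketched below. The abstract does not give the update, so this form and its notation are assumptions: $h$ is the kernel generating the Bregman distance $D_h$, $\lambda > 0$ is a step size, and $\beta_k \ge 0$ is the extrapolation weight.

```latex
D_h(x, y) = h(x) - h(y) - \langle \nabla h(y), \, x - y \rangle

y^k = x^k + \beta_k \left( x^k - x^{k-1} \right)

x^{k+1} \in \operatorname*{arg\,min}_{x}
\left\{ g(x) + \langle \nabla f(y^k), \, x - y^k \rangle
+ \frac{1}{\lambda} D_h(x, y^k) \right\}
```

With $h(x) = \tfrac{1}{2}\|x\|^2$ the Bregman distance becomes $\tfrac{1}{2}\|x - y\|^2$ and the step reduces to a proximal gradient step with extrapolation; with $\beta_k = 0$ it reduces to a Bregman proximal gradient step without extrapolation.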