Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice
We introduce a generic scheme for accelerating gradient-based optimization
methods in the sense of Nesterov. The approach, called Catalyst, builds upon
the inexact accelerated proximal point algorithm for minimizing a convex
objective function, and consists of approximately solving a sequence of
well-chosen auxiliary problems, leading to faster convergence. One of the keys
to achieve acceleration in theory and in practice is to solve these
sub-problems with appropriate accuracy by using the right stopping criterion
and the right warm-start strategy. We give practical guidelines to use Catalyst
and present a comprehensive analysis of its global complexity. We show that
Catalyst applies to a large class of algorithms, including gradient descent,
block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG,
MISO/Finito, and their proximal variants. For all of these methods, we
establish faster rates using the Catalyst acceleration, for strongly convex and
non-strongly convex objectives. We conclude with extensive experiments showing
that acceleration is useful in practice, especially for ill-conditioned
problems.
Comment: link to publisher website:
http://jmlr.org/papers/volume18/17-748/17-748.pd
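To make the outer loop concrete, here is a minimal Python sketch of a Catalyst-style acceleration wrapper under assumptions of our own: `solve_subproblem` stands in for any inner solver applied to the auxiliary problem f(x) + (kappa/2)||x - y||^2, and the parameter names (`kappa`, `mu`, `n_outer`) are illustrative rather than the paper's notation.

```python
import numpy as np

def catalyst(solve_subproblem, x0, kappa, n_outer=100, mu=0.0):
    """Sketch of a Catalyst-style outer loop (not the authors' reference code).

    `solve_subproblem(y, kappa, x_init)` is an assumed black-box routine that
    approximately minimizes  f(x) + (kappa/2) * ||x - y||^2, warm-started at x_init.
    """
    x_prev = x0.copy()
    y = x0.copy()
    # q drives the extrapolation schedule: q = mu / (mu + kappa) when the
    # objective is mu-strongly convex, and q = 0 otherwise.
    q = mu / (mu + kappa) if mu > 0 else 0.0
    alpha = np.sqrt(q) if mu > 0 else 1.0
    for _ in range(n_outer):
        # Approximately solve the auxiliary problem, warm-started at the previous iterate.
        x = solve_subproblem(y, kappa, x_init=x_prev)
        # Nesterov-style recursion: alpha_next solves a_{+}^2 = (1 - a_{+}) * a^2 + q * a_{+}.
        alpha_next = 0.5 * ((q - alpha**2) + np.sqrt((q - alpha**2) ** 2 + 4 * alpha**2))
        beta = alpha * (1 - alpha) / (alpha**2 + alpha_next)
        # Extrapolation step: the new point y is the prox-center of the next sub-problem.
        y = x + beta * (x - x_prev)
        x_prev, alpha = x, alpha_next
    return x_prev
```

The two ingredients the abstract highlights, warm-starting each sub-problem and stopping it at an appropriate accuracy, correspond here to the `x_init` argument and to whatever stopping criterion the inner solver itself uses.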
Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may originally require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural networks.
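A one-line calculation, stated here for the twice-differentiable case and with rho denoting the (unknown) weak-convexity constant, clarifies why convex machinery can be applied to each sub-problem:

```latex
% If f is rho-weakly convex, i.e. x -> f(x) + (rho/2)||x||^2 is convex,
% then for any prox parameter kappa > rho the auxiliary objective is
% (kappa - rho)-strongly convex, so solvers designed for convex problems apply.
\[
  h_y(x) \;=\; f(x) + \frac{\kappa}{2}\,\lVert x - y \rVert^2
  \quad\Longrightarrow\quad
  \nabla^2 h_y(x) \;\succeq\; (\kappa - \rho)\, I
  \quad \text{whenever } \kappa > \rho .
\]
```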
Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems
Recent advances in optimization theory have shown that smooth strongly convex
finite sums can be minimized faster than by treating them as a black box
"batch" problem. In this work we introduce a new method in this class with a
theoretical convergence rate four times faster than existing methods, for sums
with sufficiently many terms. This method is also amenable to a sampling-without-replacement scheme that in practice gives further speed-ups. We give empirical results showing state-of-the-art performance.
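The following Python sketch illustrates the general shape of such an incremental method, keeping per-term tables of iterates and gradients; the constant in front of the gradient average is left as a generic tunable step parameter rather than the value prescribed by the paper's theory, and the function names are illustrative.

```python
import numpy as np

def finito_style(grads, x0, n, step, n_epochs=50, rng=None):
    """Sketch of a Finito-style incremental update (illustrative, not the
    authors' exact parameterization).

    `grads[i](phi)` returns the gradient of the i-th term f_i at phi.
    Per-term tables of iterates phi[i] and their gradients are kept in memory,
    and each step refreshes a single entry while the new point is formed from
    the table averages.
    """
    rng = np.random.default_rng() if rng is None else rng
    phi = np.tile(x0, (n, 1))                        # per-term iterate table
    g = np.array([grads[i](x0) for i in range(n)])   # per-term gradient table
    for _ in range(n_epochs):
        # Sampling without replacement (a fresh permutation per epoch) is the
        # variant reported to give further speed-ups in practice.
        for i in rng.permutation(n):
            # New point: average of stored iterates minus a scaled average of stored gradients.
            x = phi.mean(axis=0) - step * g.mean(axis=0)
            # Refresh only the i-th table entries.
            phi[i] = x
            g[i] = grads[i](x)
    return phi.mean(axis=0) - step * g.mean(axis=0)
```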
On the linear convergence of the stochastic gradient method with constant step-size
The strong growth condition (SGC) is known to be a sufficient condition for
linear convergence of the stochastic gradient method using a constant step-size
(SGM-CS). In this paper, we provide a necessary condition for the linear convergence of SGM-CS that is weaker than the SGC. Moreover, when this necessary condition is violated up to an additive perturbation, we show that both the projected stochastic gradient method using a constant step-size (PSGM-CS) and the proximal stochastic gradient method exhibit linear convergence to a noise-dominated region, whose distance to the optimal solution is proportional to the size of the perturbation.
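For context, the strong growth condition is commonly stated as a uniform bound of the stochastic gradients by the full gradient; the notation below (component functions f_i, constant M, perturbation level sigma) is a common formulation and may differ slightly from the paper's.

```latex
% Strong growth condition (SGC): stochastic gradients vanish wherever the
% full gradient does, uniformly up to a constant M.
\[
  \mathbb{E}_{i}\,\bigl\lVert \nabla f_i(x) \bigr\rVert^2
  \;\le\; M\,\bigl\lVert \nabla f(x) \bigr\rVert^2
  \qquad \text{for all } x .
\]
% Relaxed version with an additive perturbation sigma^2; when sigma > 0 the
% constant step-size method converges linearly only to a noise-dominated region.
\[
  \mathbb{E}_{i}\,\bigl\lVert \nabla f_i(x) \bigr\rVert^2
  \;\le\; M\,\bigl\lVert \nabla f(x) \bigr\rVert^2 + \sigma^2 .
\]
```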
Stochastic Block Mirror Descent Methods for Nonsmooth and Stochastic Optimization
In this paper, we present a new stochastic algorithm, namely the stochastic
block mirror descent (SBMD) method for solving large-scale nonsmooth and
stochastic optimization problems. The basic idea of this algorithm is to
incorporate the block-coordinate decomposition and an incremental block
averaging scheme into the classic (stochastic) mirror-descent method, in order
to significantly reduce the cost per iteration of the latter algorithm. We
establish the rate of convergence of the SBMD method along with its associated
large-deviation results for solving general nonsmooth and stochastic
optimization problems. We also introduce different variants of this method and
establish their rate of convergence for solving strongly convex, smooth, and
composite optimization problems, as well as certain nonconvex optimization
problems. To the best of our knowledge, all these developments related to the
SBMD methods are new in the stochastic optimization literature. Moreover, some
of our results also seem to be new for block coordinate descent methods for
deterministic optimization.
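The following Python sketch shows the structure of one such block update in the Euclidean special case (the method itself allows a general Bregman prox-function per block and a projection onto the feasible set, both omitted here); the names and signatures are illustrative, not the paper's.

```python
import numpy as np

def sbmd_euclidean(stoch_grad, x0, blocks, steps, rng=None):
    """Sketch of stochastic block mirror descent with the Euclidean prox-function.

    `stoch_grad(x)` returns an unbiased stochastic gradient of the objective at x,
    `blocks` is a list of index arrays partitioning the coordinates, and `steps`
    is the sequence of step-sizes gamma_k.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    iterates = []
    for gamma in steps:
        g = stoch_grad(x)
        b = blocks[rng.integers(len(blocks))]   # pick one block uniformly at random
        # Mirror-descent step restricted to the sampled block; with the Euclidean
        # prox-function this reduces to a partial gradient step (a projection onto
        # the feasible set would follow in the constrained case).
        x = x.copy()
        x[b] = x[b] - gamma * g[b]
        iterates.append(x)
    # Averaging the iterates is what the convergence guarantees for nonsmooth /
    # stochastic problems are typically stated for.
    return np.mean(iterates, axis=0)
```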