Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
Majorization-minimization algorithms consist of iteratively minimizing a
majorizing surrogate of an objective function. Because of its simplicity and
its wide applicability, this principle has been very popular in statistics and
in signal processing. In this paper, we intend to make this principle scalable.
We introduce a stochastic majorization-minimization scheme which is able to
deal with large-scale or possibly infinite data sets. When applied to convex
optimization problems under suitable assumptions, we show that it achieves an
expected convergence rate of O(1/√n) after n iterations, and of O(1/n)
for strongly convex functions. Equally important, our scheme almost
surely converges to stationary points for a large class of non-convex problems.
We develop several efficient algorithms based on our framework. First, we
propose a new stochastic proximal gradient method, which experimentally matches
state-of-the-art solvers for large-scale ℓ1-logistic regression. Second,
we develop an online DC programming algorithm for non-convex sparse estimation.
Finally, we demonstrate the effectiveness of our approach for solving
large-scale structured matrix factorization problems.
Comment: accepted for publication at Neural Information Processing Systems
(NIPS) 2013. This is the 9-page version followed by 16 pages of appendices.
The title has changed compared to the first technical report.
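
As a rough illustration (not taken from the paper itself), the sketch below shows how the stochastic majorization-minimization principle can specialize to the ℓ1-regularized logistic setting mentioned above: each sampled loss is majorized by a quadratic first-order surrogate, the surrogates are averaged with decreasing weights, and the aggregate surrogate plus the ℓ1 penalty is minimized in closed form by soft-thresholding. The function names, the 1/t averaging weights, and the Lipschitz bound are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (closed-form minimizer of the l1 part).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def stochastic_mm_l1_logistic(X, y, lam=0.1, L=None, n_passes=1, seed=0):
    """Toy stochastic MM loop for l1-regularized logistic regression (sketch).

    Each step builds a quadratic first-order surrogate of one sampled loss,
    averages it into a running aggregate surrogate, and minimizes the
    aggregate surrogate plus the l1 penalty in closed form.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if L is None:
        # Crude Lipschitz bound for the gradient of a single logistic loss.
        L = 0.25 * np.max(np.sum(X ** 2, axis=1))
    w = np.zeros(d)        # current iterate
    a_bar = np.zeros(d)    # linear coefficient of the aggregate surrogate
    t = 0
    for _ in range(n_passes):
        for i in rng.permutation(n):
            t += 1
            xi, yi = X[i], y[i]                              # yi in {-1, +1}
            grad = -yi * xi / (1.0 + np.exp(yi * (xi @ w)))  # gradient of sampled loss
            a_t = grad - L * w        # linear part of this step's quadratic surrogate
            rho = 1.0 / t             # decreasing averaging weight (assumed choice)
            a_bar = (1.0 - rho) * a_bar + rho * a_t
            # Minimize (L/2)||x||^2 + a_bar^T x + lam * ||x||_1 in closed form.
            w = soft_threshold(-a_bar / L, lam / L)
    return w

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
y = np.sign(X[:, :3] @ np.array([1.0, -2.0, 1.5]) + 0.1 * rng.standard_normal(200))
w_hat = stochastic_mm_l1_logistic(X, y, lam=0.05)
```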
Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may originally require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural
networks.
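
To make the idea concrete, here is a heavily simplified sketch of the outer loop only, under the assumptions that a bound on the weak-convexity constant is available and that plain gradient descent stands in for the inner convex solver. The extrapolation step that yields Nesterov-type acceleration and the automatic adaptation to the unknown weak-convexity constant are omitted, and all names and parameter values are illustrative rather than the paper's algorithm.

```python
import numpy as np

def catalyst_nonconvex_sketch(grad_f, x0, kappa=2.0, n_outer=50,
                              inner_steps=100, inner_lr=0.1):
    """Toy outer loop in the spirit of a Catalyst-style scheme (sketch).

    At each outer iteration, the possibly weakly convex objective f is
    regularized by a proximal term (kappa/2)*||x - y||^2, which makes the
    subproblem convex once kappa exceeds the weak-convexity constant; the
    subproblem is then handed to a convex solver (plain gradient descent here).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        y = x.copy()          # prox-center of the current subproblem
        z = x.copy()
        for _ in range(inner_steps):
            # Gradient of the regularized subproblem f(z) + (kappa/2)||z - y||^2.
            g = grad_f(z) + kappa * (z - y)
            z = z - inner_lr * g
        x = z                 # accept the approximate proximal point
    return x

# Usage on a small weakly convex toy objective f(x) = sum(cos(x)) + 0.1*||x||^2.
grad_f = lambda x: -np.sin(x) + 0.2 * x
x_hat = catalyst_nonconvex_sketch(grad_f, x0=np.full(5, 3.0), kappa=2.0)
```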