Majorization-minimization algorithms consist of iteratively minimizing a
majorizing surrogate of an objective function. Because of its simplicity and
its wide applicability, this principle has been very popular in statistics and
in signal processing. In this paper, we intend to make this principle scalable.
We introduce a stochastic majorization-minimization scheme which is able to
deal with large-scale or possibly infinite data sets. When applied to convex
optimization problems under suitable assumptions, we show that it achieves an
expected convergence rate of O(1/n) after n iterations, and of
O(1/n) for strongly convex functions. Equally important, our scheme almost
surely converges to stationary points for a large class of non-convex problems.
We develop several efficient algorithms based on our framework. First, we
propose a new stochastic proximal gradient method, which experimentally matches
state-of-the-art solvers for large-scale ℓ1-logistic regression. Second,
we develop an online DC programming algorithm for non-convex sparse estimation.
Finally, we demonstrate the effectiveness of our approach for solving
large-scale structured matrix factorization problems.Comment: accepted for publication for Neural Information Processing Systems
(NIPS) 2013. This is the 9-pages version followed by 16 pages of appendices.
The title has changed compared to the first technical repor