In this paper we study several classes of stochastic optimization algorithms
enriched with heavy ball momentum. The methods studied include stochastic
gradient descent, stochastic Newton, stochastic proximal point and stochastic
dual subspace ascent. This is the first time momentum variants of several of
these methods are studied. We choose to perform our analysis in a setting in
which all of the above methods are equivalent. We prove global nonasymptotic
linear convergence rates for all methods and various measures of success,
including primal function values, primal iterates (in the L2 sense), and dual
function values. We also show that the primal iterates converge at an
accelerated linear rate in the L1 sense. This is the first time a linear rate
is shown for the stochastic heavy ball method (i.e., stochastic gradient
descent method with momentum). Under somewhat weaker conditions, we establish a
sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we
propose a novel concept, which we call stochastic momentum, aimed at decreasing
the cost of performing the momentum step. We prove linear convergence of
several stochastic methods with stochastic momentum, and show that in some
sparse data regimes and for sufficiently small momentum parameters, these
methods enjoy better overall complexity than methods with deterministic
momentum. Finally, we perform extensive numerical testing on artificial and
real datasets, including data coming from average consensus problems.
Comment: 47 pages, 7 figures, 7 tables
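To make the stochastic heavy ball update (stochastic gradient descent with momentum) concrete, here is a minimal Python sketch under stated assumptions. The abstract does not spell out the common setting, so the sketch assumes a consistent linear system A x = b in which one row is sampled uniformly per step. Under that assumption the stochastic gradient step is a randomized Kaczmarz projection with relaxation omega, and beta scales the heavy ball term beta (x_k - x_{k-1}). The function name shb_kaczmarz and all default values are hypothetical and chosen only for illustration. The sketch also returns a running Cesàro average of the iterates.

```python
import random


def shb_kaczmarz(A, b, omega=1.0, beta=0.3, iters=2000, seed=0):
    """Hypothetical sketch of the stochastic heavy ball method on A x = b.

    Assumption (not stated in the abstract): each step samples one row i
    uniformly and takes a stochastic gradient step on
    f_i(x) = (a_i^T x - b_i)^2 / (2 ||a_i||^2), i.e. a randomized Kaczmarz
    step, plus the heavy ball term beta * (x_k - x_{k-1}).
    Returns the last iterate and the Cesaro average of x_1, ..., x_iters.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x_prev = [0.0] * n  # x_{k-1}
    x = [0.0] * n       # x_k
    avg = [0.0] * n     # running Cesaro average
    for k in range(iters):
        i = rng.randrange(m)
        a = A[i]
        residual = sum(aj * xj for aj, xj in zip(a, x)) - b[i]
        scale = omega * residual / sum(aj * aj for aj in a)
        # gradient step on f_i plus the deterministic (dense) momentum term
        x_next = [xj - scale * aj + beta * (xj - pj)
                  for xj, aj, pj in zip(x, a, x_prev)]
        x_prev, x = x, x_next
        # running mean: avg now equals (x_1 + ... + x_{k+1}) / (k + 1)
        avg = [v + (xj - v) / (k + 1) for v, xj in zip(avg, x)]
    return x, avg
```

For example, on the system with rows (2, 1), (1, 3), (1, -1) and right-hand side b = (4, 7, -1), whose solution is (1, 2), the last iterate approaches (1, 2) quickly. The Cesàro average, which still carries the early iterates, closes the gap more slowly.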
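The stochastic momentum idea can also be illustrated with a minimal Python sketch. It assumes a consistent system with sparse rows stored as {column: value} dicts. The dense momentum vector beta (x_k - x_{k-1}) is replaced by a single coordinate j drawn uniformly at random, scaled by gamma = n * beta, so that its expectation over j equals the deterministic term. The scaling, the function name shb_stochastic_momentum, and the defaults are assumptions for illustration, not the paper's exact algorithm. Because the previous step is stored sparsely, each iteration touches only the sampled row's nonzeros plus one coordinate, which is where savings in sparse data regimes would come from.

```python
import random


def shb_stochastic_momentum(rows, b, n, omega=1.0, beta=0.1, iters=20000, seed=0):
    """Hypothetical sketch: Kaczmarz-type SGD step plus stochastic momentum.

    Assumptions (not stated in the abstract): rows[i] is a sparse row
    {col: value}; per step one row i and one coordinate j are drawn uniformly;
    the momentum term beta * (x_k - x_{k-1}) is replaced by
    gamma * (x_k - x_{k-1})_j * e_j with gamma = n * beta (unbiased in j).
    """
    rng = random.Random(seed)
    gamma = n * beta
    x = [0.0] * n
    last_step = {}  # sparse x_k - x_{k-1}: only coordinates changed last step
    for _ in range(iters):
        i = rng.randrange(len(rows))
        j = rng.randrange(n)
        row = rows[i]
        # gradient step costs O(nnz(row)), residual evaluated at x_k
        residual = sum(v * x[c] for c, v in row.items()) - b[i]
        scale = omega * residual / sum(v * v for v in row.values())
        step = {c: -scale * v for c, v in row.items()}
        # one-coordinate momentum: O(1) work instead of an O(n) dense update
        step[j] = step.get(j, 0.0) + gamma * last_step.get(j, 0.0)
        for c, delta in step.items():
            x[c] += delta
        last_step = step  # becomes x_{k+1} - x_k for the next iteration
    return x
```

The small default beta reflects the abstract's caveat that the complexity advantage is claimed only for sufficiently small momentum parameters and sparse data; the extra variance from sampling a single coordinate grows with gamma.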