Importance Sampling for Minibatches
Minibatching is a very well studied and highly popular technique in
supervised learning, used by practitioners due to its ability to accelerate
training through better utilization of parallel processing power and reduction
of stochastic variance. Another popular technique is importance sampling -- a
strategy for preferential sampling of more important examples also capable of
accelerating the training process. However, despite considerable effort by the
community in both areas, and owing to the inherent technical difficulty of the
problem, no existing work combines the power of importance sampling with the
strength of minibatching. In this paper we propose the first {\em importance
sampling scheme for minibatches} and give a simple and rigorous complexity
analysis of its performance.
We illustrate on synthetic problems that, for
training data with certain properties, our sampling can lead to several orders
of magnitude improvement in training time. We then test the new sampling on
several popular datasets and show that the improvement can reach an order of
magnitude.
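
To make the idea concrete, here is a minimal sketch of importance-sampled minibatch SGD. It is not the paper's specific sampling scheme: per-example smoothness constants L[i] are an assumed proxy for importance, and each sampled gradient is reweighted so the minibatch estimate of the full gradient stays unbiased.

```python
import numpy as np

def importance_minibatch_step(w, grads_fn, L, batch_size, lr, rng):
    """One SGD step on an importance-sampled minibatch.

    Illustrative only: L[i] (an assumed per-example smoothness constant) is
    used as the importance score, and sampled gradients are reweighted by
    1 / (n * p_i) so the minibatch gradient remains unbiased.
    """
    n = len(L)
    p = L / L.sum()                                  # p_i proportional to L_i
    idx = rng.choice(n, size=batch_size, replace=True, p=p)
    g = grads_fn(w, idx)                             # per-example gradients, shape (batch_size, dim)
    weights = 1.0 / (n * p[idx])                     # importance weights
    grad_est = (weights[:, None] * g).mean(axis=0)   # unbiased estimate of the full gradient
    return w - lr * grad_est
```

The sketch only shows the unbiased-reweighting mechanics; the contribution of the paper lies in how the minibatch sampling distribution itself is constructed.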
Convergence analysis of stochastic higher-order majorization-minimization algorithms
Majorization-minimization schemes are a broad class of iterative methods for
general optimization problems, including nonconvex, nonsmooth, and stochastic
ones. These algorithms successively minimize a sequence of upper bounds of the
objective function, so that the objective value decreases along the
iterations. We present a stochastic higher-order algorithmic framework for
minimizing the average of a very large number of sufficiently smooth functions.
Our stochastic framework is based on the notion of stochastic higher-order
upper bound approximations of the finite-sum objective function and
minibatching. We derive convergence results for nonconvex and convex
optimization problems when the higher-order approximation of the objective
function yields an error that is p times differentiable and has a Lipschitz
continuous p-th derivative. More precisely, for general nonconvex problems we
present asymptotic stationary-point guarantees, and under the
Kurdyka-{\L}ojasiewicz property we derive local convergence rates ranging from
sublinear to linear.
For convex problems with uniformly convex objective function we derive local
(super)linear convergence results for our algorithm. Compared to existing
stochastic (first-order) methods, our algorithm adapts to the problem's
curvature and allows using any batch size. Preliminary numerical tests support
the effectiveness of our algorithmic framework.
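
As a rough illustration of the framework's structure for p = 2, the sketch below builds a minibatch cubic-regularized Taylor model (an upper bound when the constant M dominates the Hessian's Lipschitz constant) and minimizes it approximately with a few inner gradient steps. The functions grad_fn and hess_fn, the constant M, and the inner solver are all assumptions for illustration, not the authors' method.

```python
import numpy as np

def stochastic_cubic_mm_step(w, grad_fn, hess_fn, idx, M, inner_steps=50, inner_lr=0.1):
    """One step of a stochastic second-order (p = 2) majorization-minimization scheme.

    Schematic sketch: the minibatch cubic-regularized model
        m(d) = g^T d + 0.5 d^T H d + (M/6) ||d||^3
    upper-bounds the minibatch objective change when M bounds the Hessian's
    Lipschitz constant; here it is minimized only approximately by a few
    gradient steps (a placeholder inner solver).
    """
    g = grad_fn(w, idx)                  # minibatch gradient at w
    H = hess_fn(w, idx)                  # minibatch Hessian at w
    d = np.zeros_like(w)
    for _ in range(inner_steps):
        # Gradient of the model: g + H d + (M/2) ||d|| d
        model_grad = g + H @ d + 0.5 * M * np.linalg.norm(d) * d
        d -= inner_lr * model_grad
    return w + d
```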
Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
Nonconvex optimization is central in solving many machine learning problems,
in which block-wise structure is commonly encountered. In this work, we propose
cyclic block coordinate methods for nonconvex optimization problems with
non-asymptotic gradient norm guarantees. Our convergence analysis is based on a
gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by
recent progress on cyclic block coordinate methods. In deterministic settings,
our convergence guarantee matches the guarantee of (full-gradient) gradient
descent, but with the gradient Lipschitz constant being defined w.r.t.~the
Mahalanobis norm. In stochastic settings, we use recursive variance reduction
to decrease the per-iteration cost and match the arithmetic operation
complexity of current optimal stochastic full-gradient methods, with a unified
analysis for both finite-sum and infinite-sum cases. We further prove the
faster, linear convergence of our methods when a Polyak-{\L}ojasiewicz (P{\L})
condition holds for the objective function. To the best of our knowledge, our
work is the first to provide variance-reduced convergence guarantees for a
cyclic block coordinate method. Our experimental results demonstrate the
efficacy of the proposed variance-reduced cyclic scheme in training deep neural
networks.
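
The following sketch shows the overall loop shape of a cyclic block coordinate method with variance reduction. For simplicity it uses an SVRG-style control variate rather than the recursive estimator analyzed in the paper, and grad_block_fn, the block partition, and the step size are assumed inputs.

```python
import numpy as np

def cyclic_block_vr_epoch(w, grad_block_fn, blocks, n, batch_size, lr, rng):
    """One epoch of cyclic block coordinate descent with an SVRG-style
    variance-reduced block gradient (a simplified stand-in for the paper's
    recursive estimator).

    grad_block_fn(w, idx, block) is assumed to return the gradient of the
    average loss over examples `idx` (idx=None meaning all n examples),
    restricted to the coordinates in `block`.
    """
    w_ref = w.copy()
    # Full-gradient snapshot for each block, taken at the reference point.
    full_grads = [grad_block_fn(w_ref, None, block) for block in blocks]
    for block, full_g in zip(blocks, full_grads):
        idx = rng.choice(n, size=batch_size, replace=False)
        # Control-variate estimate of the block gradient at the current w.
        g = grad_block_fn(w, idx, block) - grad_block_fn(w_ref, idx, block) + full_g
        w[block] = w[block] - lr * g     # update only this coordinate block
    return w
```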