Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains
We consider the minimization of an objective function given access to
unbiased estimates of its gradient through stochastic gradient descent (SGD)
with constant step-size. While the detailed analysis was only performed for
quadratic functions, we provide an explicit asymptotic expansion of the moments
of the averaged SGD iterates that outlines the dependence on initial
conditions, the effect of noise and the step-size, as well as the lack of
convergence in the general (non-quadratic) case. For this analysis, we bring
tools from Markov chain theory into the analysis of stochastic gradient. We
then show that Richardson-Romberg extrapolation may be used to get closer to
the global optimum and we show empirical improvements of the new extrapolation
scheme.
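The Richardson-Romberg idea can be sketched numerically: averaged constant-step-size SGD settles at a point whose bias is first order in the step-size, so combining runs at steps gamma and 2*gamma cancels the leading bias term. The test function, noise level, and step sizes below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def averaged_sgd(step, n_iters=100_000, noise=0.5):
    """Polyak-Ruppert averaged SGD on f(x) = exp(x) - x, an illustrative
    non-quadratic test function with minimizer x* = 0."""
    x, avg = 0.0, 0.0
    for t in range(n_iters):
        grad = np.exp(x) - 1.0 + noise * rng.normal()  # unbiased gradient estimate
        x -= step * grad
        avg += (x - avg) / (t + 1)                     # running average of iterates
    return avg

gamma = 0.05
xbar_g  = averaged_sgd(gamma)      # asymptotic bias ~ c * gamma for some constant c
xbar_2g = averaged_sgd(2 * gamma)  # asymptotic bias ~ 2c * gamma
x_rr = 2 * xbar_g - xbar_2g        # Richardson-Romberg: leading bias term cancels
```

Each individual average inherits an O(gamma) offset from the true minimizer; the extrapolated combination removes that first-order term, leaving only higher-order and Monte Carlo error.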
Probabilistic Line Searches for Stochastic Optimization
In deterministic optimization, line searches are a standard tool ensuring
stability and efficiency. Where only stochastic gradients are available, no
direct equivalent has so far been formulated, because uncertain gradients do
not allow for a strict sequence of decisions collapsing the search space. We
construct a probabilistic line search by combining the structure of existing
deterministic methods with notions from Bayesian optimization. Our method
retains a Gaussian process surrogate of the univariate optimization objective,
and uses a probabilistic belief over the Wolfe conditions to monitor the
descent. The algorithm has very low computational cost, and no user-controlled
parameters. Experiments show that it effectively removes the need to define a
learning rate for stochastic gradient descent.
Comment: Extended version of the NIPS '15 conference paper, includes detailed
pseudo-code, 59 pages, 35 figures
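A key ingredient is the probabilistic treatment of the Wolfe conditions: both the sufficient-decrease and the curvature condition are linear in (f(t), f'(t)), so under a Gaussian belief each becomes a Gaussian event whose probability has a closed form. The sketch below evaluates a simplified acceptance probability, treating the two conditions as independent; the actual method uses the joint bivariate Gaussian supplied by a GP surrogate, and all constants here are illustrative.

```python
import math

def prob_wolfe(mu_f0, mu_df0, mu_ft, mu_dft,
               var_ft, var_dft, t, c1=1e-4, c2=0.9):
    """Simplified probability that the weak Wolfe conditions hold at step t,
    given Gaussian beliefs over f(t) and f'(t) (means mu_*, variances var_*),
    with f(0), f'(0) treated as known. Treats the two conditions as
    independent; the paper's method uses their joint Gaussian."""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    # Sufficient decrease: f(t) <= f(0) + c1 * t * f'(0)
    mu_a = (mu_f0 + c1 * t * mu_df0) - mu_ft
    p_armijo = Phi(mu_a / math.sqrt(var_ft)) if var_ft > 0 else float(mu_a >= 0)
    # Curvature: f'(t) >= c2 * f'(0)
    mu_b = mu_dft - c2 * mu_df0
    p_curv = Phi(mu_b / math.sqrt(var_dft)) if var_dft > 0 else float(mu_b >= 0)
    return p_armijo * p_curv
```

In the full algorithm, a candidate step is accepted once this probability exceeds a fixed threshold, which replaces the hard accept/reject decision of a deterministic line search.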
Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization
We consider a generic convex optimization problem associated with regularized
empirical risk minimization of linear predictors. The problem structure allows
us to reformulate it as a convex-concave saddle point problem. We propose a
stochastic primal-dual coordinate (SPDC) method, which alternates between
maximizing over a randomly chosen dual variable and minimizing over the primal
variable. An extrapolation step on the primal variable is performed to obtain
an accelerated convergence rate. We also develop a mini-batch version of the SPDC
method which facilitates parallel computing, and an extension with weighted
sampling probabilities on the dual variables, which has a better complexity
than uniform sampling on unnormalized data. Both theoretically and empirically,
we show that the SPDC method has comparable or better performance than several
state-of-the-art optimization methods.
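A minimal sketch of the SPDC iteration on ridge regression, where the dual maximization has a closed form because the conjugate of the squared loss is quadratic. The step-size formulas loosely follow the paper's recipe for smooth losses, but the constants here are illustrative assumptions, not tuned values.

```python
import numpy as np

def spdc_ridge(A, b, lam=0.1, n_iters=20_000, seed=0):
    """Sketch of SPDC on ridge regression:
        min_x (1/2n) ||A x - b||^2 + (lam/2) ||x||^2,
    via its saddle-point form with one dual variable y_i per example
    (the conjugate of the squared loss)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Step sizes roughly following the paper's recipe (constants are assumptions)
    R = np.max(np.linalg.norm(A, axis=1))
    tau = np.sqrt(1.0 / (n * lam)) / (2 * R)
    sigma = np.sqrt(n * lam) / (2 * R)
    theta = 1.0 - 1.0 / (n + 2 * R * np.sqrt(n / lam))
    x = np.zeros(d)
    y = np.zeros(n)
    xbar = x.copy()
    u = A.T @ y / n                          # running average (1/n) * sum_i y_i a_i
    for _ in range(n_iters):
        i = rng.integers(n)
        a_i = A[i]
        # Dual ascent on a random coordinate (closed form for the squared loss)
        y_new = (sigma * (a_i @ xbar - b[i]) + y[i]) / (sigma + 1.0)
        w = u + (y_new - y[i]) * a_i         # unbiased surrogate for the new average
        u += (y_new - y[i]) * a_i / n        # maintain (1/n) * A^T y exactly
        y[i] = y_new
        # Primal prox step for g(x) = (lam/2) ||x||^2
        x_new = (x / tau - w) / (lam + 1.0 / tau)
        # Extrapolation step on the primal variable
        xbar = x_new + theta * (x_new - x)
        x = x_new
    return x
```

At a fixed point the updates give y_i = a_i^T x - b_i and lam * x + (1/n) A^T (A x - b) = 0, which is exactly the ridge optimality condition, so the iteration targets the true minimizer rather than a biased neighborhood.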