On the Effectiveness of Richardson Extrapolation in Machine Learning
Richardson extrapolation is a classical technique from numerical analysis that can improve the approximation error of an estimation method by linearly combining several estimates obtained from different values of one of its hyperparameters, without the need to know in detail the inner structure of the original estimation method. The main goal of this paper is to study when Richardson extrapolation can be used within machine learning, beyond the existing applications to step-size adaptations in stochastic gradient descent. We identify two situations where Richardson extrapolation can be useful: (1) when the hyperparameter is the number of iterations of an existing iterative optimization algorithm, with applications to averaged gradient descent and Frank-Wolfe algorithms (where we obtain asymptotically rates of $o(1/k^2)$ on polytopes, where $k$ is the number of iterations), and (2) when it is a regularization parameter, with applications to Nesterov smoothing techniques for minimizing non-smooth functions (where we obtain asymptotically rates close to $O(1/k^2)$ for non-smooth functions), and ridge regression. In all these cases, we show that extrapolation techniques come with no significant loss in performance, but sometimes with strong gains, and we provide theoretical justifications based on asymptotic developments for such gains, as well as empirical illustrations on classical problems from machine learning.
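The idea of combining estimates at two values of a regularization parameter can be illustrated with a minimal sketch on ridge regression. It is not taken from the paper: the helper names `ridge` and `richardson`, the synthetic data, and the choice `lam = 0.1` are assumptions made for illustration, and the combination 2θ(λ) − θ(2λ) is the standard first-order Richardson form, which cancels the term of the bias that is linear in λ. The sketch only measures the distance to the unregularized least-squares solution, not the statistical performance studied in the abstract.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimate solving (X^T X / n + lam * I) theta = X^T y / n."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def richardson(estimator, lam):
    """First-order Richardson combination of estimates at lam and 2 * lam.

    If estimator(lam) = theta_0 + lam * c + O(lam^2), the combination
    2 * estimator(lam) - estimator(2 * lam) equals theta_0 + O(lam^2).
    """
    return 2.0 * estimator(lam) - estimator(2.0 * lam)


# Synthetic, well-conditioned regression problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
theta_true = np.arange(1.0, 6.0)
y = X @ theta_true + 0.1 * rng.standard_normal(200)

lam = 0.1
theta_ols = ridge(X, y, 0.0)  # unregularized reference solution
theta_plain = ridge(X, y, lam)  # biased by the regularization
theta_extra = richardson(lambda l: ridge(X, y, l), lam)

# Regularization-induced error with and without extrapolation.
err_plain = np.linalg.norm(theta_plain - theta_ols)
err_extra = np.linalg.norm(theta_extra - theta_ols)
print(f"plain ridge error:        {err_plain:.4e}")
print(f"extrapolated ridge error: {err_extra:.4e}")
```

In this setting the extrapolated estimate costs one extra linear solve and lies closer to the unregularized solution than the plain ridge estimate, because along each eigendirection of X^T X / n with eigenvalue s the error factor drops from λ/(s + λ) to 2λ²/((s + λ)(s + 2λ)).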
On Riemannian Stochastic Approximation Schemes with Fixed Step-Size
This paper studies fixed step-size stochastic approximation (SA) schemes,
including stochastic gradient schemes, in a Riemannian framework. It is
motivated by several applications, where geodesics can be computed explicitly,
and their use accelerates crude Euclidean methods. A fixed step-size scheme
defines a family of time-homogeneous Markov chains, parametrized by the
step-size. Here, using this formulation, non-asymptotic performance bounds are
derived, under Lyapunov conditions. Then, for any step-size, the corresponding
Markov chain is proved to admit a unique stationary distribution, and to be
geometrically ergodic. This result gives rise to a family of stationary
distributions indexed by the step-size, which is further shown to converge to a
Dirac measure, concentrated at the solution of the problem at hand, as the
step-size goes to 0. Finally, the asymptotic rate of this convergence is
established, through an asymptotic expansion of the bias, and a central limit
theorem.
Comment: 37 pages, 4 figures, to appear in AISTAT2