We consider the minimization of an objective function given access to
unbiased estimates of its gradient through stochastic gradient descent (SGD)
with constant step-size. While the detailed analysis was only performed for
quadratic functions, we provide an explicit asymptotic expansion of the moments
of the averaged SGD iterates that outlines the dependence on initial
conditions, the effect of noise and the step-size, as well as the lack of
convergence in the general (non-quadratic) case. For this analysis, we bring
tools from Markov chain theory into the analysis of stochastic gradient. We
then show that Richardson-Romberg extrapolation may be used to get closer to
the global optimum and we show empirical improvements of the new extrapolation
scheme