Stochastic Gradient Descent (SGD) is one of the simplest and most popular
algorithms in modern statistical and machine learning due to its computational
and memory efficiency. Various averaging schemes have been proposed to
accelerate the convergence of SGD in different settings. In this paper, we
explore a general averaging scheme for SGD. Specifically, we establish the
asymptotic normality of a broad range of weighted averaged SGD solutions and
provide asymptotically valid online inference approaches. Furthermore, we
propose an adaptive averaging scheme that attains both the optimal statistical
rate and favorable non-asymptotic convergence, drawing on insights from the
weight sequence that minimizes the non-asymptotic mean squared error (MSE) in
the linear model.
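For concreteness, a weighted averaged SGD estimator typically takes the following form (the notation here is illustrative and may differ from the paper's):
\[
\bar{x}_n^{w} \;=\; \frac{\sum_{i=1}^{n} w_i\, x_i}{\sum_{i=1}^{n} w_i},
\]
where $x_i$ denotes the $i$-th SGD iterate and $w_i \ge 0$ is a weight sequence; the uniform choice $w_i \equiv 1$ recovers the standard Polyak-Ruppert average.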