Recent studies have provided both empirical and theoretical evidence
illustrating that heavy tails can emerge in stochastic gradient descent (SGD)
in various scenarios. Such heavy tails potentially result in iterates with
diverging variance, which hinders the use of conventional convergence analysis
techniques that rely on the existence of second-order moments. In this
paper, we provide convergence guarantees for SGD under state-dependent,
heavy-tailed noise with potentially infinite variance, for a class of
strongly convex objectives. In the case where the p-th moment of the noise
exists for some p∈[1,2), we first identify a condition on the Hessian,
coined 'p-positive (semi-)definiteness', that leads to an interesting
interpolation between positive semi-definite matrices (p=2) and diagonally
dominant matrices with non-negative diagonal entries (p=1). Under this
condition, we then provide a convergence rate for the distance to the global
optimum in Lp. Furthermore, we provide a generalized central limit theorem,
which shows that the properly scaled Polyak-Ruppert average converges weakly
to a multivariate α-stable random vector. Our results indicate that even
under heavy-tailed noise with infinite variance, SGD can converge to the global
optimum without necessitating any modification to either the loss function or
to the algorithm itself, as is typically required in robust statistics. We
demonstrate the implications of our results for applications such as linear
regression and generalized linear models subject to heavy-tailed data.
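To make the setting concrete, the following is a minimal sketch of plain SGD with Polyak-Ruppert averaging on a toy linear-regression problem, where the additive noise has a tail index α ∈ (1, 2) and hence infinite variance. The model dimension, step-size schedule, and symmetric Pareto noise generator are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression y = x^T theta* + noise, with a noise tail index
# alpha in (1, 2): finite mean but infinite variance (hypothetical setup).
d, n_steps = 5, 50_000
theta_star = np.ones(d)
alpha = 1.5

def heavy_noise(rng, alpha):
    # Symmetric Pareto-type sample: P(|Z| > t) ~ t^(-alpha), so Var(Z) = inf.
    sign = rng.choice([-1.0, 1.0])
    return sign * (rng.pareto(alpha) + 1.0)

theta = np.zeros(d)  # SGD iterate
avg = np.zeros(d)    # Polyak-Ruppert running average of the iterates
for k in range(1, n_steps + 1):
    x = rng.standard_normal(d)
    y = x @ theta_star + heavy_noise(rng, alpha)
    grad = (x @ theta - y) * x           # stochastic gradient of 0.5*(x^T theta - y)^2
    theta = theta - 0.5 / k**0.7 * grad  # decaying step size (illustrative choice)
    avg += (theta - avg) / k             # running mean: avg_k = (1/k) * sum_i theta_i

print("error of averaged iterate:", np.linalg.norm(avg - theta_star))
```

Despite individual gradient samples with infinite variance, no clipping or loss modification is applied here; the averaged iterate `avg` still lands near `theta_star`, in line with the convergence behavior the abstract describes.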