
    Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling

    We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent (SGD) algorithms. We leverage insights from time-series regression in econometrics and construct asymptotically pivotal statistics via random scaling. Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem. Our proposed inference method has two key advantages over existing methods. First, the test statistic is computed in an online fashion using only SGD iterates, and the critical values can be obtained without any resampling methods, allowing for efficient implementation suitable for massive online data. Second, there is no need to estimate the asymptotic variance, and our inference method is shown to be robust to changes in the tuning parameters of SGD algorithms in simulation experiments with synthetic data.
    Comment: 16 pages, 5 figures, 5 tables
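    As a rough illustration of the random-scaling idea (my own toy setup, not the paper's code), the sketch below runs Polyak-Ruppert averaged SGD on a one-dimensional mean-estimation problem and computes the random-scaling variance online via running sums, so no past averages need to be stored. The step-size schedule and the toy model are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): estimate the mean theta* = 2.0 of a data
# stream by SGD on the loss (x - theta)^2 / 2.
theta_star, n = 2.0, 100_000

theta, theta_bar = 0.0, 0.0
A = B = C = 0.0  # running sums for the online random-scaling variance
for t in range(1, n + 1):
    x = theta_star + rng.standard_normal()
    theta -= 0.5 * t ** -0.505 * (theta - x)  # SGD step, gamma_t ~ t^{-0.505}
    theta_bar += (theta - theta_bar) / t      # Polyak-Ruppert average
    A += t**2 * theta_bar**2                  # sum_s s^2 bar_s^2
    B += t**2 * theta_bar                     # sum_s s^2 bar_s
    C += t**2                                 # sum_s s^2

# Random-scaling variance V_n = n^{-2} sum_s s^2 (bar_s - bar_n)^2,
# expanded as (A - 2 bar_n B + bar_n^2 C) / n^2 so it is fully online.
V = (A - 2 * theta_bar * B + theta_bar**2 * C) / n**2

# Asymptotically pivotal statistic for H0: theta = 2.0; per the paper,
# its critical values come from a known non-standard distribution
# (not the normal), so no resampling is needed.
t_stat = np.sqrt(n) * (theta_bar - 2.0) / np.sqrt(V)
```

    Note that no asymptotic-variance estimate appears anywhere: the statistic is self-normalized by the random-scaling quantity V.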

    Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

    Recent studies have provided both empirical and theoretical evidence that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails can produce iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of second-order moments. In this paper, we provide convergence guarantees for SGD under state-dependent, heavy-tailed noise with potentially infinite variance, for a class of strongly convex objectives. In the case where the p-th moment of the noise exists for some p ∈ [1, 2), we first identify a condition on the Hessian, coined 'p-positive (semi-)definiteness', that leads to an interesting interpolation between positive semi-definite matrices (p = 2) and diagonally dominant matrices with non-negative diagonal entries (p = 1). Under this condition, we then provide a convergence rate for the distance to the global optimum in L^p. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert average converges weakly to a multivariate α-stable random vector. Our results indicate that, even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum without any modification to either the loss function or the algorithm itself, as is typically required in robust statistics. We demonstrate the implications of our results for applications such as linear regression and generalized linear models subject to heavy-tailed data.
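    A minimal simulation of the phenomenon (my own toy setup, not the paper's experiments, and with i.i.d. additive noise rather than the paper's state-dependent noise): SGD on a strongly convex quadratic driven by symmetrized Pareto noise with tail index α = 1.5, so the noise has finite p-th moments only for p < 1.5 and infinite variance. The Polyak-Ruppert average still settles near the optimum, consistent with the theory.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (illustrative): minimize the strongly convex quadratic
# (theta - 2)^2 / 2 under gradient noise with infinite variance.
theta_star, n = 2.0, 200_000
alpha = 1.5  # tail index: p-th moments of the noise exist only for p < alpha

theta, theta_bar = 0.0, 0.0
for t in range(1, n + 1):
    # Symmetrized Pareto noise: zero mean (alpha > 1), infinite variance.
    noise = rng.pareto(alpha) * (1.0 if rng.random() < 0.5 else -1.0)
    grad = (theta - theta_star) + noise
    theta -= 0.1 * t ** -0.7 * grad       # SGD with decreasing step size
    theta_bar += (theta - theta_bar) / t  # Polyak-Ruppert average
```

    Under the generalized central limit theorem described in the abstract, the scaled error n^(1-1/α)(θ̄_n − θ*) converges weakly to an α-stable law rather than a Gaussian, so confidence intervals based on normal quantiles would be misleading in this regime.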

    Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

    Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistics and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both an optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).
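    A sketch of one such weighted scheme, next to the uniform Polyak-Ruppert average, on a toy linear regression. The polynomial weights w_t = t², the step-size schedule, and the toy model are my own illustrative choices, not the paper's prescription; both averages are maintained fully online.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear model y = a* x + b* + noise (illustrative parameters).
a_star, b_star, n = 1.5, -0.5, 100_000

theta = np.zeros(2)   # current SGD iterate (a, b)
bar_u = np.zeros(2)   # uniform (Polyak-Ruppert) average
bar_w = np.zeros(2)   # polynomially weighted average, w_t = t^2
W = 0.0
for t in range(1, n + 1):
    x = rng.standard_normal()
    y = a_star * x + b_star + rng.standard_normal()
    phi = np.array([x, 1.0])            # feature vector
    grad = (phi @ theta - y) * phi      # gradient of the squared loss
    theta -= 0.2 * t ** -0.505 * grad   # SGD step
    bar_u += (theta - bar_u) / t        # uniform average, online
    w = float(t) ** 2                   # polynomial weight on iterate t
    W += w
    bar_w += (w / W) * (theta - bar_w)  # weighted average, online
```

    Putting more weight on later iterates dampens the influence of the initialization and of the noisy early iterates, which is one motivation for non-uniform averaging.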

    Fast, asymptotically efficient, recursive estimation in a Riemannian manifold

    Stochastic optimisation in Riemannian manifolds, especially the Riemannian stochastic gradient method, has attracted much recent attention. The present work applies stochastic optimisation to the task of recursive estimation of a statistical parameter which belongs to a Riemannian manifold. Roughly, this task amounts to stochastic minimisation of a statistical divergence function. The following problem is considered: how can one obtain fast, asymptotically efficient, recursive estimates, using a Riemannian stochastic optimisation algorithm with decreasing step sizes? In solving this problem, several original results are introduced. First, without any convexity assumptions on the divergence function, it is proved that, with an adequate choice of step sizes, the algorithm computes recursive estimates which achieve a fast non-asymptotic rate of convergence. Second, the asymptotic normality of these recursive estimates is proved, by employing a novel linearisation technique. Third, it is proved that, when the Fisher information metric is used to guide the algorithm, these recursive estimates achieve an optimal asymptotic rate of convergence, in the sense that they become asymptotically efficient. These results, while relatively familiar in the Euclidean context, are here formulated and proved for the first time in the Riemannian context. In addition, they are illustrated with a numerical application to the recursive estimation of elliptically contoured distributions.
    Comment: updated version of draft submitted for publication, currently under review
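    To make the recursion concrete, here is a minimal Riemannian SGD sketch on the unit sphere S² (a toy mean-direction problem of my own choosing, not the paper's elliptically-contoured application): each step projects the Euclidean gradient onto the tangent space and retracts along the exponential map, with decreasing step sizes 1/t.

```python
import numpy as np

rng = np.random.default_rng(3)

def exp_map(x, v):
    """Exponential map on the unit sphere at base point x, tangent v."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

mu = np.array([0.0, 0.0, 1.0])  # true mean direction (illustrative)
x = np.array([1.0, 0.0, 0.0])   # initial estimate, a point on the sphere

for t in range(1, 20_001):
    z = mu + 0.2 * rng.standard_normal(3)
    z /= np.linalg.norm(z)                 # noisy observation on the sphere
    egrad = -z                             # Euclidean gradient of 1 - <x, z>
    rgrad = egrad - np.dot(egrad, x) * x   # project onto tangent space at x
    x = exp_map(x, -(1.0 / t) * rgrad)     # Riemannian SGD step, size 1/t
```

    The projection-plus-exponential-map structure is what replaces the plain Euclidean update; the paper's efficiency result concerns guiding such steps with the Fisher information metric.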

    Statistical Learning Theory for Control: A Finite Sample Perspective

    This tutorial survey provides an overview of recent non-asymptotic advances in statistical learning theory as relevant to control and system identification. While there has been substantial progress across all areas of control, the theory is most well-developed for linear system identification and learning for the linear quadratic regulator, which are the focus of this manuscript. From a theoretical perspective, much of the labor underlying these advances has been in adapting tools from modern high-dimensional statistics and learning theory. While highly relevant to control theorists interested in integrating tools from machine learning, the foundational material has not always been easily accessible. To remedy this, we provide a self-contained presentation of the relevant material, outlining all the key ideas and the technical machinery that underpin recent results. We also present a number of open problems and future directions.
    Comment: Survey paper, submitted to Control Systems Magazine. Second version contains additional motivation for finite sample statistics and a more detailed comparison with the classical literature
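    The survey's central object, linear system identification, can be demonstrated in a few lines (a toy experiment of my own, not from the survey): simulate a stable linear system x_{t+1} = A x_t + w_t and recover A by ordinary least squares from a single trajectory; finite-sample theory quantifies how the estimation error shrinks with the trajectory length.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stable 2x2 system (hypothetical example) driven by Gaussian noise.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
n = 10_000
X = np.zeros((n + 1, 2))
for t in range(n):
    X[t + 1] = A @ X[t] + rng.standard_normal(2)

# Least squares: minimize sum_t ||x_{t+1} - A x_t||^2 over A.
# lstsq solves X[:-1] @ M ≈ X[1:] for M = A^T, so transpose back.
A_hat = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T
```

    Non-asymptotic results of the kind surveyed give high-probability bounds on ||A_hat − A|| as a function of the trajectory length n and the excitation of the state.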

    Causal Reinforcement Learning: An Instrumental Variable Approach

    In the standard data analysis framework, data is first collected (once and for all), and then data analysis is carried out. With the advancement of digital technology, decision-makers constantly analyze past data and generate new data through the decisions they make. In this paper, we model this as a Markov decision process and show that the dynamic interaction between data generation and data analysis leads to a new type of bias -- reinforcement bias -- that exacerbates the endogeneity problem in standard data analysis. We propose a class of instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for the bias and establish their asymptotic properties by incorporating them into a two-timescale stochastic approximation framework. A key contribution of the paper is the development of new techniques that allow for the analysis of the algorithms in general settings where the noise features time dependence. We use the techniques to derive sharper results on finite-time trajectory stability bounds: with a polynomial rate, the entire future trajectory of the iterates from the algorithm falls within a ball that is centered at the true parameter and is shrinking at a (different) polynomial rate. We also use the techniques to provide inference formulas, which are rarely available for RL algorithms. These formulas highlight how the strength of the IV and the degree of the noise's time dependence affect the inference.
    Comment: main body: 38 pages; supplemental material: 58 pages
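    A stripped-down sketch of the IV idea in a static linear model (the paper's algorithms are two-timescale and operate on an MDP; this single-timescale toy and its parameters are my own): the regressor x is correlated with the error, so a plain SGD step on the OLS moment would converge to a biased limit, while a Robbins-Monro recursion on the IV moment condition converges to the true parameter.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy endogenous linear model (illustrative): y = theta* x + e, where the
# regressor x is correlated with the error e, and z is a valid instrument.
theta_star, n = 1.0, 200_000

theta, theta_bar = 0.0, 0.0
for t in range(1, n + 1):
    z = rng.standard_normal()   # instrument: correlated with x, not with e
    e = rng.standard_normal()   # structural error
    x = z + 0.5 * e             # endogenous regressor
    y = theta_star * x + e
    # Robbins-Monro step on the IV moment condition E[z (y - theta x)] = 0;
    # the analogous step on E[x (y - theta x)] = 0 (OLS) would converge to
    # a biased limit (here roughly theta* + 0.4) due to the endogeneity.
    theta += 0.2 * t ** -0.7 * z * (y - theta * x)
    theta_bar += (theta - theta_bar) / t  # averaged iterate
```

    In the paper's setting the data are generated by the decisions themselves, which is what the two-timescale stochastic approximation framework and the time-dependent-noise analysis are needed to handle.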