Stochastic gradient methods for statistical inference
Statistical inference, such as hypothesis testing and calculating a confidence interval, is an important tool for assessing uncertainty in machine learning and statistical problems. Stochastic gradient methods, such as stochastic gradient descent (SGD), have recently been successfully applied to point estimation in large scale machine learning problems. In this work, we present novel stochastic gradient methods for statistical inference in large scale machine learning problems.

Unregularized M-estimation using SGD. Using SGD with a fixed step size, we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first-order method, and is well suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.

Approximate Newton-based statistical inference using only stochastic gradients for unregularized M-estimation. We present a novel inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and allows the approximation of a Hessian-vector product from first-order information. In theory, our method efficiently computes the statistical error covariance in M-estimation for unregularized convex learning problems, without using exact second-order information or resampling the entire dataset. In practice, we demonstrate the effectiveness of our framework on large-scale machine learning problems that go even beyond convexity: as a highlight, our work can be used to detect certain adversarial attacks on neural networks.

High dimensional linear regression statistical inference using only stochastic gradients. As an extension of the approximate Newton-based statistical inference algorithm for unregularized problems, we present a similar algorithm, using only stochastic gradients, for statistical inference in high dimensional linear regression, where the number of features is much larger than the number of samples.

Stochastic gradient methods for time series analysis. We present a novel stochastic gradient descent algorithm for time series analysis, which correctly captures correlation structures in a time series dataset during optimization. Instead of uniformly sampling indices as in vanilla SGD, we uniformly sample contiguous blocks of indices, where the block length depends on the dataset.
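As a rough illustration of the first contribution, the sketch below runs fixed-step-size SGD on a synthetic linear regression, averages the iterates, and forms normal confidence intervals from independent replications. This is a simplification of the thesis's single-run procedure, and the data, step size, and iteration counts are all illustrative assumptions, not the authors' code.

    import numpy as np

    # Sketch of SGD-based inference for linear regression (illustrative
    # assumptions throughout). Fixed-step-size SGD iterates are averaged;
    # the spread across independent replications yields approximate
    # normal confidence intervals around the true coefficients.

    rng = np.random.default_rng(0)
    n, d = 2000, 3
    X = rng.normal(size=(n, d))
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + rng.normal(size=n)

    def sgd_average(step=0.02, iters=20000):
        """Run fixed-step-size SGD on squared loss; return the iterate average."""
        theta = np.zeros(d)
        running = np.zeros(d)
        for _ in range(iters):
            i = rng.integers(n)                    # uniformly sampled index
            grad = (X[i] @ theta - y[i]) * X[i]    # stochastic gradient
            theta -= step * grad
            running += theta
        return running / iters

    reps = np.array([sgd_average() for _ in range(20)])
    mean = reps.mean(axis=0)
    se = reps.std(axis=0, ddof=1) / np.sqrt(len(reps))
    for j in range(d):
        print(f"theta[{j}]: {mean[j]:+.3f} +/- {1.96 * se[j]:.3f}")

The printed intervals should cover the entries of theta_true; the thesis's actual estimator works from a single SGD run with the proper asymptotic scaling.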
Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Natural gradient descent, which preconditions a gradient descent update with
the Fisher information matrix of the underlying statistical model, is a way to
capture partial second-order information. Several highly visible works have
advocated an approximation known as the empirical Fisher, drawing connections
between approximate second-order methods and heuristics like Adam. We dispute
this argument by showing that the empirical Fisher, unlike the Fisher, does
not generally capture second-order information. We further argue that the
conditions under which the empirical Fisher approaches the Fisher (and the
Hessian) are unlikely to be met in practice, and that, even on simple
optimization problems, the pathologies of the empirical Fisher can have
undesirable effects.
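The paper's central distinction can be seen in a few lines for linear regression with unit noise variance, where the Fisher coincides with the Hessian of the loss. The example below is an illustrative sketch, not taken from the paper:

    import numpy as np

    # Contrast the Fisher with the empirical Fisher for linear regression
    # with unit noise variance (illustrative example). Here the Fisher
    # equals the loss Hessian, X^T X / n; the empirical Fisher instead
    # averages outer products of per-example gradients, which weights
    # each example by its squared residual.

    rng = np.random.default_rng(1)
    n, d = 500, 2
    X = rng.normal(size=(n, d))
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
    theta = np.array([0.3, -0.7])             # an arbitrary parameter value

    residuals = X @ theta - y
    fisher = X.T @ X / n                      # Fisher = Hessian for this model
    grads = residuals[:, None] * X            # per-example gradients
    emp_fisher = grads.T @ grads / n          # empirical Fisher

    print("Fisher:\n", fisher)
    print("Empirical Fisher:\n", emp_fisher)
    # The two agree only when the residuals look like unit-variance model
    # noise, i.e., near the optimum of a well-specified model.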
Quasi-Newton particle Metropolis-Hastings
Particle Metropolis-Hastings enables Bayesian parameter inference in general
nonlinear state space models (SSMs). However, many implementations use a
random walk proposal, which can mix poorly unless it is carefully tuned
using tedious pilot runs. We therefore consider a new proposal inspired by
quasi-Newton algorithms that may achieve similar (or better) mixing with less
tuning. An advantage over other Hessian-based proposals is that it only
requires estimates of the gradient of the log-posterior. A possible application
is parameter inference in the challenging class of SSMs with intractable
likelihoods. We exemplify this application and the benefits of the new proposal
by modelling log-returns of futures contracts on coffee using a stochastic
volatility model with α-stable observations.
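For intuition, here is a runnable gradient-based Metropolis-Hastings sketch. It uses a plain Langevin (MALA) proposal rather than the paper's quasi-Newton construction, and a toy Gaussian posterior stands in for a particle-based log-posterior estimate; every name and constant below is an illustrative assumption.

    import numpy as np

    # Gradient-based Metropolis-Hastings with a Langevin (MALA) proposal.
    # This is a simplified stand-in for the paper's quasi-Newton proposal;
    # the toy Gaussian log-posterior replaces a particle-based estimate.

    def log_post(theta):
        return -0.5 * theta @ theta           # toy log-posterior: N(0, I)

    def grad_log_post(theta):
        return -theta

    def mala(theta0, iters=5000, step=0.5, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        chain = []
        for _ in range(iters):
            mean_fwd = theta + 0.5 * step**2 * grad_log_post(theta)
            prop = mean_fwd + step * rng.normal(size=theta.shape)
            mean_back = prop + 0.5 * step**2 * grad_log_post(prop)
            # Hastings ratio accounts for the asymmetric, gradient-shifted proposal.
            log_alpha = (log_post(prop) - log_post(theta)
                         - np.sum((theta - mean_back) ** 2) / (2 * step**2)
                         + np.sum((prop - mean_fwd) ** 2) / (2 * step**2))
            if np.log(rng.uniform()) < log_alpha:
                theta = prop
            chain.append(theta.copy())
        return np.array(chain)

    samples = mala(np.zeros(2))
    print(samples.mean(axis=0), samples.std(axis=0))   # approx. 0 and 1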
Newton-based maximum likelihood estimation in nonlinear state space models
Maximum likelihood (ML) estimation using Newton's method in nonlinear state
space models (SSMs) is a challenging problem due to the analytical
intractability of the log-likelihood and its gradient and Hessian. We estimate
the gradient and Hessian using Fisher's identity in combination with a
smoothing algorithm. We explore two approximations of the log-likelihood and of
the solution of the smoothing problem. The first is a linearization
approximation which is computationally cheap, but the accuracy typically varies
between models. The second is a sampling approximation which is asymptotically
valid for any SSM but is more computationally costly. We demonstrate our
approach for ML parameter estimation on simulated data from two different SSMs
with encouraging results.
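A minimal sketch of the outer Newton iteration, with a toy closed-form log-likelihood standing in for the smoothing-based gradient and Hessian estimates obtained via Fisher's identity; the model and all derivatives below are illustrative assumptions.

    import numpy as np

    # Undamped Newton ascent on a toy likelihood (illustrative stand-in
    # for the paper's smoothing-based gradient/Hessian estimates).
    # Model: y_t ~ N(theta, 1), so the ML estimate is the sample mean.

    def loglik(theta, y):
        return -0.5 * np.sum((y - theta) ** 2)

    def grad(theta, y):
        return np.sum(y - theta)

    def hess(theta, y):
        return -float(len(y))

    rng = np.random.default_rng(0)
    y = rng.normal(loc=1.5, size=200)

    theta = 0.0
    for it in range(5):
        step = -grad(theta, y) / hess(theta, y)   # Newton step (Hessian < 0)
        theta += step
        print(f"iter {it}: theta = {theta:.4f}, loglik = {loglik(theta, y):.2f}")
    # In an SSM, grad/hess would be noisy smoother-based estimates, and a
    # damped or regularized step would be needed for a stable iteration.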