Stochastic gradient methods for statistical inference
Statistical inference, such as hypothesis testing and calculating a confidence interval, is an important tool for assessing uncertainty in machine learning and statistical problems. Stochastic gradient methods, such as stochastic gradient descent (SGD), have recently been successfully applied to point estimation in large scale machine learning problems. In this work, we present novel stochastic gradient methods for statistical inference in large scale machine learning problems.

Unregularized M-estimation using SGD. Using SGD with a fixed step size, we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first-order method, and is well suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.

Approximate Newton-based statistical inference using only stochastic gradients for unregularized M-estimation. We present a novel inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and allows the approximation of a Hessian-vector product from first-order information. In theory, our method efficiently computes the statistical error covariance in M-estimation for unregularized convex learning problems, without using exact second-order information or resampling the entire dataset. In practice, we demonstrate the effectiveness of our framework on large-scale machine learning problems that go even beyond convexity: as a highlight, our work can be used to detect certain adversarial attacks on neural networks.

High dimensional linear regression statistical inference using only stochastic gradients. As an extension of the approximate Newton-based statistical inference algorithm for unregularized problems, we present a similar algorithm, using only stochastic gradients, for statistical inference in high dimensional linear regression, where the number of features is much larger than the number of samples.

Stochastic gradient methods for time series analysis. We present a novel stochastic gradient descent algorithm for time series analysis, which correctly captures correlation structures in a time series dataset during optimization. Instead of uniformly sampling indices as in vanilla SGD, we uniformly sample contiguous blocks of indices, where the block length depends on the dataset.
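As a rough illustration of the first contribution, the sketch below runs fixed-step-size SGD on a synthetic linear regression, averages the iterates, and forms normal confidence intervals from independent replications. This is a simplification of the thesis's single-run procedure, and the data, step size, and iteration counts are all illustrative assumptions, not the authors' code.

    import numpy as np

    # Sketch of SGD-based inference for linear regression (illustrative
    # assumptions throughout). Fixed-step-size SGD iterates are averaged;
    # the spread across independent replications yields approximate
    # normal confidence intervals around the true coefficients.

    rng = np.random.default_rng(0)
    n, d = 2000, 3
    X = rng.normal(size=(n, d))
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + rng.normal(size=n)

    def sgd_average(step=0.02, iters=20000):
        """Run fixed-step-size SGD on squared loss; return the iterate average."""
        theta = np.zeros(d)
        running = np.zeros(d)
        for _ in range(iters):
            i = rng.integers(n)                    # uniformly sampled index
            grad = (X[i] @ theta - y[i]) * X[i]    # stochastic gradient
            theta -= step * grad
            running += theta
        return running / iters

    reps = np.array([sgd_average() for _ in range(20)])
    mean = reps.mean(axis=0)
    se = reps.std(axis=0, ddof=1) / np.sqrt(len(reps))
    for j in range(d):
        print(f"theta[{j}]: {mean[j]:+.3f} +/- {1.96 * se[j]:.3f}")

The printed intervals should cover the entries of theta_true; the thesis's actual estimator works from a single SGD run with the proper asymptotic scaling.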
Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Natural gradient descent, which preconditions a gradient descent update with
the Fisher information matrix of the underlying statistical model, is a way to
capture partial second-order information. Several highly visible works have
advocated an approximation known as the empirical Fisher, drawing connections
between approximate second-order methods and heuristics like Adam. We dispute
this argument by showing that the empirical Fisher, unlike the Fisher, does
not generally capture second-order information. We further argue that the
conditions under which the empirical Fisher approaches the Fisher (and the
Hessian) are unlikely to be met in practice, and that, even on simple
optimization problems, the pathologies of the empirical Fisher can have
undesirable effects.
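The paper's central distinction can be seen in a few lines for linear regression with unit noise variance, where the Fisher coincides with the Hessian of the loss. The example below is an illustrative sketch, not taken from the paper:

    import numpy as np

    # Contrast the Fisher with the empirical Fisher for linear regression
    # with unit noise variance (illustrative example). Here the Fisher
    # equals the loss Hessian, X^T X / n; the empirical Fisher instead
    # averages outer products of per-example gradients, which weights
    # each example by its squared residual.

    rng = np.random.default_rng(1)
    n, d = 500, 2
    X = rng.normal(size=(n, d))
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
    theta = np.array([0.3, -0.7])             # an arbitrary parameter value

    residuals = X @ theta - y
    fisher = X.T @ X / n                      # Fisher = Hessian for this model
    grads = residuals[:, None] * X            # per-example gradients
    emp_fisher = grads.T @ grads / n          # empirical Fisher

    print("Fisher:\n", fisher)
    print("Empirical Fisher:\n", emp_fisher)
    # The two agree only when the residuals look like unit-variance model
    # noise, i.e., near the optimum of a well-specified model.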
Quasi-Newton particle Metropolis-Hastings
Particle Metropolis-Hastings enables Bayesian parameter inference in general
nonlinear state space models (SSMs). However, many implementations use a
random walk proposal, which can mix poorly unless it is carefully tuned
using tedious pilot runs. We therefore consider a new proposal inspired by
quasi-Newton algorithms that may achieve similar (or better) mixing with less
tuning. An advantage over other Hessian-based proposals is that it only
requires estimates of the gradient of the log-posterior. A possible application
is parameter inference in the challenging class of SSMs with intractable
likelihoods. We exemplify this application and the benefits of the new proposal
by modelling log-returns of futures contracts on coffee using a stochastic
volatility model with α-stable observations.
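For intuition, here is a runnable gradient-based Metropolis-Hastings sketch. It uses a plain Langevin (MALA) proposal rather than the paper's quasi-Newton construction, and a toy Gaussian posterior stands in for a particle-based log-posterior estimate; every name and constant below is an illustrative assumption.

    import numpy as np

    # Gradient-based Metropolis-Hastings with a Langevin (MALA) proposal.
    # This is a simplified stand-in for the paper's quasi-Newton proposal;
    # the toy Gaussian log-posterior replaces a particle-based estimate.

    def log_post(theta):
        return -0.5 * theta @ theta           # toy log-posterior: N(0, I)

    def grad_log_post(theta):
        return -theta

    def mala(theta0, iters=5000, step=0.5, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        chain = []
        for _ in range(iters):
            mean_fwd = theta + 0.5 * step**2 * grad_log_post(theta)
            prop = mean_fwd + step * rng.normal(size=theta.shape)
            mean_back = prop + 0.5 * step**2 * grad_log_post(prop)
            # Hastings ratio accounts for the asymmetric, gradient-shifted proposal.
            log_alpha = (log_post(prop) - log_post(theta)
                         - np.sum((theta - mean_back) ** 2) / (2 * step**2)
                         + np.sum((prop - mean_fwd) ** 2) / (2 * step**2))
            if np.log(rng.uniform()) < log_alpha:
                theta = prop
            chain.append(theta.copy())
        return np.array(chain)

    samples = mala(np.zeros(2))
    print(samples.mean(axis=0), samples.std(axis=0))   # approx. 0 and 1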
Newton-based maximum likelihood estimation in nonlinear state space models
Maximum likelihood (ML) estimation using Newton's method in nonlinear state
space models (SSMs) is a challenging problem due to the analytical
intractability of the log-likelihood and its gradient and Hessian. We estimate
the gradient and Hessian using Fisher's identity in combination with a
smoothing algorithm. We explore two approximations of the log-likelihood and of
the solution of the smoothing problem. The first is a linearization
approximation which is computationally cheap, but the accuracy typically varies
between models. The second is a sampling approximation which is asymptotically
valid for any SSM but is more computationally costly. We demonstrate our
approach for ML parameter estimation on simulated data from two different SSMs
with encouraging results.
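A minimal sketch of the outer Newton iteration, with a toy closed-form log-likelihood standing in for the smoothing-based gradient and Hessian estimates obtained via Fisher's identity; the model and all derivatives below are illustrative assumptions.

    import numpy as np

    # Undamped Newton ascent on a toy likelihood (illustrative stand-in
    # for the paper's smoothing-based gradient/Hessian estimates).
    # Model: y_t ~ N(theta, 1), so the ML estimate is the sample mean.

    def loglik(theta, y):
        return -0.5 * np.sum((y - theta) ** 2)

    def grad(theta, y):
        return np.sum(y - theta)

    def hess(theta, y):
        return -float(len(y))

    rng = np.random.default_rng(0)
    y = rng.normal(loc=1.5, size=200)

    theta = 0.0
    for it in range(5):
        step = -grad(theta, y) / hess(theta, y)   # Newton step (Hessian < 0)
        theta += step
        print(f"iter {it}: theta = {theta:.4f}, loglik = {loglik(theta, y):.2f}")
    # In an SSM, grad/hess would be noisy smoother-based estimates, and a
    # damped or regularized step would be needed for a stable iteration.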