Stochastic Gradient MCMC for Nonlinear State Space Models
State space models (SSMs) provide a flexible framework for modeling complex time series via a latent stochastic process. Inference for nonlinear, non-Gaussian SSMs is often tackled with particle methods that do not scale well to long time series. The challenge is two-fold: not only do computations scale linearly with time, as in the linear case, but particle filters additionally suffer from increasing particle degeneracy on longer series. Stochastic gradient MCMC methods have been developed to scale inference for hidden Markov models (HMMs) and linear SSMs using buffered stochastic gradient estimates that account for temporal dependencies. We extend these stochastic gradient estimators to nonlinear SSMs using particle methods. We present error bounds that account for both buffering error and particle error in the case of nonlinear SSMs that are log-concave in the latent process. We evaluate our proposed particle-buffered stochastic gradients within SGMCMC for inference on both long synthetic sequences and minute-resolution financial returns data, demonstrating the importance of this class of methods.
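The buffering idea can be sketched in a few lines. Below is a minimal sketch, not the authors' code: a random length-S subsequence is padded with B buffer observations on each side, the filter runs over the padded window, and only the core contributes to the gradient. The window-gradient routine is a hypothetical placeholder standing in for a particle-filter score estimate (e.g., via Fisher's identity).

```python
import numpy as np

def buffered_stochastic_grad(y, grad_window, theta, S=64, B=16, rng=None):
    """Buffered stochastic gradient estimate over a random subsequence.
    The length-S core is padded with B observations on each side, so edge
    effects from the unknown latent state decay inside the buffers; only
    the core contributes to the gradient, rescaled by T/S."""
    rng = np.random.default_rng(rng)
    T = len(y)
    s = int(rng.integers(0, T - S + 1))          # start of the core subsequence
    lo, hi = max(0, s - B), min(T, s + S + B)    # buffered window [lo, hi)
    core = slice(s - lo, s - lo + S)             # core's position within the window
    g = grad_window(theta, y[lo:hi], core)       # e.g. a particle-filter score estimate
    return (T / S) * g                           # rescale to the full series

# Placeholder, NOT a particle filter: gradient in theta of
# sum_t log N(y_t; theta, 1) over the core slice only.
def toy_grad_window(theta, y_win, core):
    return float(np.sum(y_win[core] - theta))

rng = np.random.default_rng(0)
y = rng.normal(0.3, 1.0, size=20_000)
print(buffered_stochastic_grad(y, toy_grad_window, theta=0.0, rng=rng))
```

The buffer length B trades bias for cost: larger buffers reduce the error from conditioning on an unknown latent state at the window edges, at the price of filtering more observations per gradient step.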
Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Effective training of deep neural networks suffers from two main issues. The
first is that the parameter spaces of these models exhibit pathological
curvature. Recent methods address this problem by using adaptive
preconditioning for Stochastic Gradient Descent (SGD). These methods improve
convergence by adapting to the local geometry of parameter space. A second
issue is overfitting, which is typically addressed by early stopping. However,
recent work has demonstrated that Bayesian model averaging mitigates this
problem. The posterior can be sampled by using Stochastic Gradient Langevin
Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD
methods inefficient. Here, we propose combining adaptive preconditioners with
SGLD. In support of this idea, we establish theoretical guarantees for
asymptotic convergence and predictive risk. We also provide empirical results
for Logistic
Regression, Feedforward Neural Nets, and Convolutional Neural Nets,
demonstrating that our preconditioned SGLD method gives state-of-the-art
performance on these models.
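The proposed combination can be written down compactly. Below is a minimal sketch of one preconditioned SGLD step with an RMSprop-style diagonal preconditioner, not the authors' code; the curvature-correction term Γ(θ) from the full update is dropped here, as it is often negligible in practice.

```python
import numpy as np

def psgld_step(theta, grad_logpost, V, eps=1e-3, alpha=0.99, lam=1e-5, rng=None):
    """One preconditioned SGLD step. `grad_logpost(theta)` returns a
    stochastic gradient of the log posterior (minibatch gradient already
    rescaled to the full dataset); V is the running second-moment estimate."""
    rng = np.random.default_rng(rng)
    g = grad_logpost(theta)
    V = alpha * V + (1 - alpha) * g * g           # RMSprop second-moment update
    G = 1.0 / (lam + np.sqrt(V))                  # diagonal preconditioner
    noise = rng.normal(size=theta.shape) * np.sqrt(eps * G)
    theta = theta + 0.5 * eps * G * g + noise     # preconditioned Langevin update
    return theta, V
```

Iterating this step with a decaying step size yields posterior samples in the same way as plain SGLD; the preconditioner equalizes effective step sizes across directions of very different curvature, which is what makes the method efficient where default SGLD is not.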
A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet flow inverse problems
We address the numerical solution of infinite-dimensional inverse problems in
the framework of Bayesian inference. In the Part I companion to this paper
(arXiv.org:1308.1313), we considered the linearized infinite-dimensional
inverse problem. Here in Part II, we relax the linearization assumption and
consider the fully nonlinear infinite-dimensional inverse problem using a
Markov chain Monte Carlo (MCMC) sampling method. To address the challenges of
sampling high-dimensional pdfs arising from Bayesian inverse problems governed
by PDEs, we build on the stochastic Newton MCMC method. This method exploits
problem structure by taking as its proposal density a local Gaussian
approximation of the posterior pdf, whose construction is made tractable by a
low-rank approximation of the data-misfit component of the Hessian.
Here we introduce an approximation of the stochastic Newton proposal in which
we compute the low-rank-based Hessian at just the MAP point, and then reuse
this Hessian at each MCMC step. We compare the performance of the proposed
method to the original stochastic Newton MCMC method and to an independence
sampler. The comparison of the three methods is conducted on a synthetic ice
sheet inverse problem. For this problem, the stochastic Newton MCMC method with
a MAP-based Hessian converges at least as rapidly as the original stochastic
Newton MCMC method, but is far cheaper since it avoids recomputing the Hessian
at each step. On the other hand, it is more expensive per sample than the
independence sampler; however, its convergence is significantly more rapid, and
thus overall it is much cheaper. Finally, we present extensive analysis and
interpretation of the posterior distribution, and classify directions in
parameter space based on the extent to which they are informed by the prior or
the observations.
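The MAP-based variant lends itself to a compact sketch: factor the Hessian of the negative log posterior once at the MAP point and reuse it for every Newton-like proposal. The code below illustrates the idea with dense linear algebra; it is not the authors' PDE-scale implementation, which relies on a low-rank approximation of the data-misfit Hessian.

```python
import numpy as np

def map_newton_mcmc(neg_logpost, grad, H_map, x0, n_steps=5000, rng=None):
    """Metropolis-Hastings with a stochastic-Newton proposal whose Hessian
    is frozen at the MAP point: y ~ N(x - H^{-1} grad(x), H^{-1}), where H
    is the Hessian of the negative log posterior at the MAP, factored once
    and reused at every step."""
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(H_map)                       # factor H once
    H_solve = lambda b: np.linalg.solve(L.T, np.linalg.solve(L, b))
    mean = lambda x: x - H_solve(grad(x))               # Newton step from x
    def logq(y, x):                                     # log N(y; mean(x), H^{-1}) up to a constant
        r = y - mean(x)
        return -0.5 * r @ (H_map @ r)
    x = np.asarray(x0, dtype=float)
    nlp = neg_logpost(x)
    chain = [x.copy()]
    for _ in range(n_steps):
        y = mean(x) + np.linalg.solve(L.T, rng.normal(size=x.shape))  # draw from N(mean(x), H^{-1})
        nly = neg_logpost(y)
        log_alpha = (nlp - nly) + logq(x, y) - logq(y, x)             # MH acceptance ratio
        if np.log(rng.uniform()) < log_alpha:
            x, nlp = y, nly
        chain.append(x.copy())
    return np.array(chain)
```

Because the proposal covariance H^{-1} is the same in both directions, its normalizing constant cancels in the Metropolis-Hastings ratio and only the quadratic forms are needed; each step costs a gradient evaluation plus solves with the fixed, pre-factored Hessian, which is what makes this variant far cheaper than recomputing the Hessian at every step.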