Bayesian Optimization for Adaptive MCMC
This paper proposes a new randomized strategy for adaptive MCMC using
Bayesian optimization. This approach applies to non-differentiable objective
functions and trades off exploration and exploitation to reduce the number of
potentially costly objective function evaluations. We demonstrate the strategy
in the complex setting of sampling from constrained, discrete and densely
connected probabilistic graphical models where, for each variation of the
problem, one needs to adjust the parameters of the proposal mechanism
automatically to ensure efficient mixing of the Markov chains.
Comment: This paper contains 12 pages and 6 figures. A similar version of this
paper has been submitted to AISTATS 2012 and is currently under review.
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward.
Adaptive Multiple Importance Sampling for Gaussian Processes
In applications of Gaussian processes where quantification of uncertainty is
a strict requirement, it is necessary to accurately characterize the posterior
distribution over Gaussian process covariance parameters. Normally, this is
done by means of standard Markov chain Monte Carlo (MCMC) algorithms. Motivated
by the issues related to the complexity of calculating the marginal likelihood
that can make MCMC algorithms inefficient, this paper develops an alternative
inference framework based on Adaptive Multiple Importance Sampling (AMIS). This
paper studies the application of AMIS in the case of a Gaussian likelihood, and
proposes the Pseudo-Marginal AMIS for non-Gaussian likelihoods, where the
marginal likelihood is unbiasedly estimated. The results suggest that the
proposed framework outperforms MCMC-based inference of covariance parameters in
a wide range of scenarios and remains competitive for moderately
high-dimensional parameter spaces.
Comment: 27 pages
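The adaptive multiple importance sampling idea named in this abstract can be illustrated with a minimal one-dimensional sketch. It is not the paper's Gaussian process implementation: the function name `amis_1d`, the Gaussian proposal family, the moment-matching update, and the toy target are all assumptions made for illustration. What it shows is the core AMIS step of re-weighting every past sample against the mixture of all proposals used so far, then adapting the proposal from those weights.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def amis_1d(log_target, n_iters=10, n_per_iter=500, mean0=0.0, std0=3.0, seed=0):
    """Illustrative AMIS with a Gaussian proposal (hypothetical helper).

    log_target is an unnormalized log density; this sketch assumes its
    values are small enough that math.exp does not overflow.
    """
    rng = random.Random(seed)
    mean, std = mean0, std0
    proposals = []  # (mean, std) of every proposal used so far
    samples = []    # all samples drawn so far, across iterations
    weights = []
    for _ in range(n_iters):
        proposals.append((mean, std))
        samples.extend(rng.gauss(mean, std) for _ in range(n_per_iter))
        # Deterministic-mixture weights: each sample, old or new, is weighted
        # against the equal-weight mixture of all proposals used so far.
        raw = []
        for x in samples:
            mixture = sum(normal_pdf(x, m, s) for m, s in proposals) / len(proposals)
            raw.append(math.exp(log_target(x)) / mixture)
        total = sum(raw)
        weights = [w / total for w in raw]
        # Adapt the proposal by matching the weighted mean and standard deviation.
        mean = sum(w * x for w, x in zip(weights, samples))
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, samples))
        std = max(math.sqrt(var), 1e-3)  # floor keeps the proposal from collapsing
    return samples, weights, mean, std
```

Calling `amis_1d(lambda x: -0.5 * ((x - 2.0) / 0.5) ** 2)` targets an unnormalized normal density with mean 2 and standard deviation 0.5; the returned `mean` and `std` are the self-normalized importance sampling estimates of those two quantities.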
Metropolis Sampling
Monte Carlo (MC) sampling methods are widely applied in Bayesian inference,
system simulation and optimization problems. The Markov Chain Monte Carlo
(MCMC) algorithms are a well-known class of MC methods which generate a Markov
chain with the desired invariant distribution. In this document, we focus on
the Metropolis-Hastings (MH) sampler, which can be considered as the atom of
the MCMC techniques, introducing the basic notions and different properties. We
describe in detail all the elements involved in the MH algorithm and the most
relevant variants. Several improvements and recent extensions proposed in the
literature are also briefly discussed, providing a quick but exhaustive
overview of the current landscape of Metropolis-based sampling.
Comment: Wiley StatsRef-Statistics Reference Online, 201
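For readers unfamiliar with the MH sampler this abstract refers to, the following is a minimal sketch of its simplest variant, random-walk Metropolis with a symmetric Gaussian proposal. The function name `metropolis_hastings`, its parameters, and the one-dimensional setting are assumptions chosen for illustration and are not taken from the article itself.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis in one dimension (illustrative sketch).

    log_target: unnormalized log density of the desired invariant distribution.
    step: standard deviation of the symmetric Gaussian proposal.
    Returns the chain and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # The proposal is symmetric, so the Hastings ratio reduces to the
        # ratio of target densities. 1 - random() lies in (0, 1], so the
        # logarithm is always defined.
        log_u = math.log(1.0 - rng.random())
        if log_u < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples, accepted / n_samples
```

For example, `metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, step=2.4)` produces a chain whose invariant distribution is the standard normal. Only an unnormalized log density is needed, because the normalizing constant cancels in the acceptance ratio.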
Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Effective training of deep neural networks suffers from two main issues. The
first is that the parameter spaces of these models exhibit pathological
curvature. Recent methods address this problem by using adaptive
preconditioning for Stochastic Gradient Descent (SGD). These methods improve
convergence by adapting to the local geometry of parameter space. A second
issue is overfitting, which is typically addressed by early stopping. However,
recent work has demonstrated that Bayesian model averaging mitigates this
problem. The posterior can be sampled by using Stochastic Gradient Langevin
Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD
methods inefficient. Here, we propose combining adaptive preconditioners with
SGLD. In support of this idea, we establish theoretical results on asymptotic
convergence and predictive risk. We also provide empirical results for Logistic
Regression, Feedforward Neural Nets, and Convolutional Neural Nets,
demonstrating that our preconditioned SGLD method gives state-of-the-art
performance on these models.
Comment: AAAI 201
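The combination described in this abstract, an adaptive preconditioner applied to SGLD, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it assumes an RMSprop-style diagonal preconditioner (the abstract does not specify the form), uses a constant step size, takes an exact gradient where a minibatch estimate would normally be used, and omits any correction term for the position-dependent preconditioner. The name `psgld` and all parameter names are hypothetical.

```python
import math
import random


def psgld(grad_log_post, theta0, n_steps, step_size=0.05, alpha=0.99, lam=1e-5, seed=0):
    """Langevin updates with a diagonal RMSprop-style preconditioner (sketch).

    grad_log_post: maps a parameter list to the gradient of the log posterior
    (in practice a minibatch estimate; exact here for simplicity).
    """
    rng = random.Random(seed)
    theta = list(theta0)
    v = [0.0] * len(theta)  # running average of squared gradients
    trace = []
    for _ in range(n_steps):
        grad = grad_log_post(theta)
        for i, g in enumerate(grad):
            v[i] = alpha * v[i] + (1.0 - alpha) * g * g
            # Large recent gradients shrink the step in that coordinate.
            precond = 1.0 / (lam + math.sqrt(v[i]))
            # The injected noise is scaled by the same preconditioner as the drift.
            noise = rng.gauss(0.0, math.sqrt(step_size * precond))
            theta[i] += 0.5 * step_size * precond * g + noise
        trace.append(list(theta))
    return trace
```

On a toy target with very different scales per coordinate, for example independent normals with standard deviations 1.0 and 0.1, the preconditioner lets a single `step_size` serve both coordinates, which is the motivation the abstract gives for adapting to local geometry.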