11,443 research outputs found

    Bayesian Optimization for Adaptive MCMC

    Full text link
    This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to non-differentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.Comment: This paper contains 12 pages and 6 figures. A similar version of this paper has been submitted to AISTATS 2012 and is currently under revie

    Patterns of Scalable Bayesian Inference

    Full text link
    Datasets are growing not just in size but in complexity, creating a demand for rich models and quantification of uncertainty. Bayesian methods are an excellent fit for this demand, but scaling Bayesian inference is a challenge. In response to this challenge, there has been considerable recent work based on varying assumptions about model structure, underlying computational resources, and the importance of asymptotic correctness. As a result, there is a zoo of ideas with few clear overarching principles. In this paper, we seek to identify unifying principles, patterns, and intuitions for scaling Bayesian inference. We review existing work on utilizing modern computing resources with both MCMC and variational approximation techniques. From this taxonomy of ideas, we characterize the general principles that have proven successful for designing scalable inference procedures and comment on the path forward

    Adaptive Multiple Importance Sampling for Gaussian Processes

    Get PDF
    In applications of Gaussian processes where quantification of uncertainty is a strict requirement, it is necessary to accurately characterize the posterior distribution over Gaussian process covariance parameters. Normally, this is done by means of standard Markov chain Monte Carlo (MCMC) algorithms. Motivated by the issues related to the complexity of calculating the marginal likelihood that can make MCMC algorithms inefficient, this paper develops an alternative inference framework based on Adaptive Multiple Importance Sampling (AMIS). This paper studies the application of AMIS in the case of a Gaussian likelihood, and proposes the Pseudo-Marginal AMIS for non-Gaussian likelihoods, where the marginal likelihood is unbiasedly estimated. The results suggest that the proposed framework outperforms MCMC-based inference of covariance parameters in a wide range of scenarios and remains competitive for moderately large dimensional parameter spaces.Comment: 27 page

    Metropolis Sampling

    Full text link
    Monte Carlo (MC) sampling methods are widely applied in Bayesian inference, system simulation and optimization problems. The Markov Chain Monte Carlo (MCMC) algorithms are a well-known class of MC methods which generate a Markov chain with the desired invariant distribution. In this document, we focus on the Metropolis-Hastings (MH) sampler, which can be considered as the atom of the MCMC techniques, introducing the basic notions and different properties. We describe in details all the elements involved in the MH algorithm and the most relevant variants. Several improvements and recent extensions proposed in the literature are also briefly discussed, providing a quick but exhaustive overview of the current Metropolis-based sampling's world.Comment: Wiley StatsRef-Statistics Reference Online, 201

    Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

    Full text link
    Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is typically addressed by early stopping. However, recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled by using Stochastic Gradient Langevin Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we give theoretical properties on asymptotic convergence and predictive risk. We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models.Comment: AAAI 201
    • …
    corecore