Orthogonal parallel MCMC methods for sampling and optimization
Monte Carlo (MC) methods are widely used for Bayesian inference and
optimization in statistics, signal processing and machine learning. A
well-known class of MC methods is Markov chain Monte Carlo (MCMC) algorithms.
In order to foster better exploration of the state space, especially in
high-dimensional applications, several schemes employing multiple parallel MCMC
chains have been recently introduced. In this work, we describe a novel
parallel interacting MCMC scheme, called {\it orthogonal MCMC} (O-MCMC), where
a set of "vertical" parallel MCMC chains share information using some
"horizontal" MCMC techniques working on the entire population of current
states. More specifically, the vertical chains are led by random-walk
proposals, whereas the horizontal MCMC techniques employ independent proposals,
thus allowing an efficient combination of global exploration and local
approximation. The interaction is contained in these horizontal iterations.
In analyzing different implementations of O-MCMC, we also present novel
schemes that reduce the overall computational cost of parallel multiple-try
Metropolis (MTM) chains. Furthermore, a modified version of
O-MCMC for optimization is provided by considering parallel simulated annealing
(SA) algorithms. Numerical results show the advantages of the proposed sampling
scheme in terms of estimation efficiency, as well as its robustness with
respect to the initial values and the choice of the parameters.
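The vertical/horizontal split described above can be illustrated with a minimal Python sketch. It is not the authors' O-MCMC implementation: the bimodal target `log_target`, the function names, and all tuning values (`t_vertical`, `rw_scale`, `bw`) are hypothetical. Vertical steps are random-walk Metropolis updates run on each chain separately; every `t_vertical` iterations a horizontal step updates each chain with an independent proposal. Here that proposal is a Gaussian mixture centred on the other chains' current states, a simple interacting stand-in chosen for illustration rather than the paper's own horizontal schemes (such as the MTM-based ones).

```python
import numpy as np

def log_target(x):
    # Hypothetical unnormalized bimodal target: equal mixture of N(-3, 1) and N(3, 1)
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

def log_mixture_proposal(x, centers, bw):
    # Log density of an equal-weight Gaussian mixture with means `centers` and std `bw`
    z = (x - centers) / bw
    return (np.logaddexp.reduce(-0.5 * z ** 2)
            - np.log(len(centers)) - np.log(bw * np.sqrt(2.0 * np.pi)))

def o_mcmc(n_chains=8, n_iters=2000, t_vertical=5, rw_scale=0.5, bw=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 5.0, size=n_chains)  # dispersed initial states
    samples = []
    for it in range(n_iters):
        # Vertical step: one random-walk Metropolis update per chain, no interaction
        prop = x + rw_scale * rng.standard_normal(n_chains)
        accept = np.log(rng.uniform(size=n_chains)) < log_target(prop) - log_target(x)
        x = np.where(accept, prop, x)
        # Horizontal step: independent proposals built from the rest of the population
        if (it + 1) % t_vertical == 0:
            for n in range(n_chains):
                others = np.delete(x, n)
                xp = others[rng.integers(len(others))] + bw * rng.standard_normal()
                log_a = (log_target(xp) + log_mixture_proposal(x[n], others, bw)
                         - log_target(x[n]) - log_mixture_proposal(xp, others, bw))
                if np.log(rng.uniform()) < log_a:
                    x[n] = xp
        samples.append(x.copy())
    return np.array(samples)
```

Because the proposal for chain n depends only on the other chains, each horizontal update is a valid Metropolis-within-Gibbs step on the product of N copies of the target, so every chain still targets the same distribution while information flows between chains. For example, `o_mcmc()` returns a `(2000, 8)` array of states.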
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward.
A Shuffled Complex Evolution Metropolis algorithm for optimization and uncertainty assessment of hydrologic model parameters
Markov Chain Monte Carlo (MCMC) methods have become increasingly popular for estimating the posterior probability distribution of parameters in hydrologic models. However, MCMC methods require the a priori definition of a proposal or sampling distribution, which determines the explorative capabilities and efficiency of the sampler and therefore the statistical properties of the Markov chain and its rate of convergence. In this paper we present an MCMC sampler called the Shuffled Complex Evolution Metropolis algorithm (SCEM-UA), which is well suited to inferring the posterior distribution of hydrologic model parameters. The SCEM-UA algorithm is a modified version of the original SCE-UA global optimization algorithm developed by Duan et al. [1992]. The SCEM-UA algorithm merges the strengths of the Metropolis algorithm, controlled random search, competitive evolution, and complex shuffling in order to continuously update the proposal distribution and evolve the sampler to the posterior target distribution. Three case studies demonstrate that the adaptive capability of the SCEM-UA algorithm significantly reduces the number of model simulations needed to infer the posterior distribution of the parameters when compared with traditional Metropolis-Hastings samplers.
Practical Bayesian Optimization of Machine Learning Algorithms
Machine learning algorithms frequently require careful tuning of model
hyperparameters, regularization terms, and optimization parameters.
Unfortunately, this tuning is often a "black art" that requires expert
experience, unwritten rules of thumb, or sometimes brute-force search. Much
more appealing is the idea of developing automatic approaches which can
optimize the performance of a given learning algorithm to the task at hand. In
this work, we consider the automatic tuning problem within the framework of
Bayesian optimization, in which a learning algorithm's generalization
performance is modeled as a sample from a Gaussian process (GP). The tractable
posterior distribution induced by the GP leads to efficient use of the
information gathered by previous experiments, enabling optimal choices about
what parameters to try next. Here we show how the effects of the Gaussian
process prior and the associated inference procedure can have a large impact on
the success or failure of Bayesian optimization. We show that thoughtful
choices can lead to results that exceed expert-level performance in tuning
machine learning algorithms. We also describe new algorithms that take into
account the variable cost (duration) of learning experiments and that can
leverage the presence of multiple cores for parallel experimentation. We show
that these proposed algorithms improve on previous automatic procedures and can
reach or surpass human expert-level optimization on a diverse set of
contemporary algorithms including latent Dirichlet allocation, structured SVMs
and convolutional neural networks.
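The loop described above (fit a GP to the experiments run so far, then choose the next setting by an acquisition function) can be sketched in Python on a one-dimensional toy problem. This is a simplified illustration, not the paper's system: it uses a Matérn 5/2 kernel with fixed, hypothetical hyperparameters (`length`, `amp`), expected improvement maximized over a grid, and a hypothetical objective. It omits the paper's treatment of GP hyperparameters, experiment cost, and parallel experiments.

```python
import math
import numpy as np

def matern52(a, b, length=0.3, amp=1.0):
    # Matérn 5/2 covariance between 1-D input arrays a and b (fixed hyperparameters)
    r = np.abs(a[:, None] - b[None, :]) / length
    s5r = np.sqrt(5.0) * r
    return amp * (1.0 + s5r + 5.0 * r ** 2 / 3.0) * np.exp(-s5r)

def gp_posterior(x_train, y_train, x_test, jitter=1e-6):
    # Zero-mean GP posterior mean and std at x_test via a Cholesky factorization
    L = np.linalg.cholesky(matern52(x_train, x_train) + jitter * np.eye(len(x_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    ks = matern52(x_train, x_test)
    mu = ks.T @ alpha
    v = np.linalg.solve(L, ks)
    var = matern52(x_test, x_test).diagonal() - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sd, best):
    # EI for minimization: sd * (gamma * Phi(gamma) + phi(gamma))
    gamma = (best - mu) / sd
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(gamma / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * gamma ** 2) / math.sqrt(2.0 * math.pi)
    return sd * (gamma * cdf + pdf)

def bayes_opt(f, n_init=3, n_iters=15, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)        # candidate settings in [0, 1]
    x = rng.uniform(0.0, 1.0, size=n_init)   # random initial experiments
    y = np.array([f(v) for v in x])
    for _ in range(n_iters):
        y_mean = y.mean()                    # center observations for the zero-mean GP
        mu, sd = gp_posterior(x, y - y_mean, grid)
        ei = expected_improvement(mu + y_mean, sd, y.min())
        x_next = grid[np.argmax(ei)]         # next experiment to run
        x = np.append(x, x_next)
        y = np.append(y, f(x_next))
    i = np.argmin(y)
    return x[i], y[i]
```

For instance, `bayes_opt(lambda v: (v - 0.3) ** 2)` returns the best setting found and its objective value; in a real tuning task `f` would train a model and report its validation loss.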
Generalized Direct Sampling for Hierarchical Bayesian Models
We develop a new method to sample from posterior distributions in
hierarchical models without using Markov chain Monte Carlo. This method, which
is a variant of importance sampling ideas, is generally applicable to
high-dimensional models involving large data sets. Samples are independent, so
they can be collected in parallel, and we do not need to be concerned with
issues like chain convergence and autocorrelation. Additionally, the method can
be used to compute marginal likelihoods.
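Since the samples are independent, a marginal-likelihood estimate of the kind mentioned above can be sketched with plain importance sampling. This Python example is not the paper's generalized direct sampling algorithm: the conjugate model (y_i ~ N(θ, 1), θ ~ N(0, 1)), the widened Gaussian proposal, and the function names are assumptions chosen so that the estimate can be checked against a closed form.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))

def is_log_marginal_likelihood(y, n_draws=100_000, seed=0):
    # Importance-sampling estimate of log p(y) for y_i ~ N(theta, 1), theta ~ N(0, 1)
    rng = np.random.default_rng(seed)
    n = len(y)
    post_mean = y.sum() / (n + 1.0)                  # conjugate posterior mean
    q_sd = 2.0 / np.sqrt(n + 1.0)                    # proposal: twice the posterior std
    theta = rng.normal(post_mean, q_sd, size=n_draws)  # independent draws (parallelizable)
    log_lik = log_normal_pdf(y[None, :], theta[:, None], 1.0).sum(axis=1)
    log_w = log_lik + log_normal_pdf(theta, 0.0, 1.0) - log_normal_pdf(theta, post_mean, q_sd)
    m = log_w.max()                                  # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_w - m)))

def exact_log_marginal_likelihood(y):
    # Closed form: y ~ N(0, I + 1 1^T)
    n = len(y)
    quad = y @ y - y.sum() ** 2 / (n + 1.0)
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * np.log(n + 1.0) - 0.5 * quad
```

Because each weight depends on a single independent draw, the draws can be split across workers and the weights combined at the end, with no chain convergence or autocorrelation to monitor.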