Orthogonal MCMC algorithms
Monte Carlo (MC) methods are widely used in signal processing, machine learning and stochastic optimization. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In this work, we introduce a novel parallel interacting MCMC scheme, where the parallel chains share information using another MCMC technique working on the entire population of current states. These parallel "vertical" chains are led by random-walk proposals, whereas the "horizontal" MCMC uses an independent proposal, which can be easily adapted by making use of all the generated samples. Numerical results show the advantages of the proposed sampling scheme in terms of mean absolute error, as well as robustness with respect to initial values and parameter choice.
Orthogonal parallel MCMC methods for sampling and optimization
Monte Carlo (MC) methods are widely used for Bayesian inference and
optimization in statistics, signal processing and machine learning. A
well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms.
In order to foster better exploration of the state space, especially in
high-dimensional applications, several schemes employing multiple parallel MCMC
chains have been recently introduced. In this work, we describe a novel
parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where
a set of "vertical" parallel MCMC chains share information using some
"horizontal" MCMC techniques working on the entire population of current
states. More specifically, the vertical chains are led by random-walk
proposals, whereas the horizontal MCMC techniques employ independent proposals,
thus allowing an efficient combination of global exploration and local
approximation. The interaction is contained in these horizontal iterations.
Within the analysis of different implementations of O-MCMC, novel schemes
for reducing the overall computational cost of parallel multiple try
Metropolis (MTM) chains are also presented. Furthermore, a modified version of
O-MCMC for optimization is provided by considering parallel simulated annealing
(SA) algorithms. Numerical results show the advantages of the proposed sampling
scheme in terms of efficiency in the estimation, as well as robustness
with respect to initial values and the choice of the parameters.
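The vertical/horizontal interplay described above can be sketched in a few lines. This is a minimal toy illustration under assumed settings, not the authors' implementation: several random-walk Metropolis chains run in parallel, and every few iterations an independence-sampler step, whose Gaussian proposal is fitted to the current population of states, lets the chains exchange information. The target, chain count, step size and interaction period are all invented for the example.

```python
import numpy as np

def log_target(x):
    # toy target: standard bivariate Gaussian (stand-in for a real posterior)
    return -0.5 * np.sum(x**2)

rng = np.random.default_rng(0)
N, d, T, period = 8, 2, 2000, 10       # chains, dim, iterations, interaction period
X = 5.0 * rng.normal(size=(N, d))      # deliberately dispersed initial states
samples = []

for t in range(T):
    if t % period:                     # "vertical": random-walk Metropolis per chain
        for i in range(N):
            y = X[i] + 0.5 * rng.normal(size=d)
            if np.log(rng.uniform()) < log_target(y) - log_target(X[i]):
                X[i] = y
    else:                              # "horizontal": independence sampler whose
        mu = X.mean(axis=0)            # proposal is fitted to the whole population
        cov = np.cov(X.T) + 1e-6 * np.eye(d)
        cov_inv = np.linalg.inv(cov)
        log_q = lambda y: -0.5 * (y - mu) @ cov_inv @ (y - mu)
        for i in range(N):
            y = rng.multivariate_normal(mu, cov)
            # independence-sampler MH ratio includes the proposal density
            log_a = (log_target(y) - log_target(X[i])) + (log_q(X[i]) - log_q(y))
            if np.log(rng.uniform()) < log_a:
                X[i] = y
    samples.append(X.copy())
```

The horizontal step gives the global exploration (a chain stuck far from the mass can jump to wherever the population concentrates), while the vertical random walks provide the local approximation.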
Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC
Despite having various attractive qualities such as high prediction accuracy
and the ability to quantify uncertainty and avoid over-fitting, Bayesian Matrix
Factorization has not been widely adopted because of the prohibitive cost of
inference. In this paper, we propose a scalable distributed Bayesian matrix
factorization algorithm using stochastic gradient MCMC. Our algorithm, based on
Distributed Stochastic Gradient Langevin Dynamics, can not only match the
prediction accuracy of standard MCMC methods like Gibbs sampling, but at the
same time is as fast and simple as stochastic gradient descent. In our
experiments, we show that our algorithm can achieve the same level of
prediction accuracy as Gibbs sampling an order of magnitude faster. We also
show that our method reduces the prediction error as fast as distributed
stochastic gradient descent, achieving a 4.1% improvement in RMSE for the
Netflix dataset and a 1.8% improvement for the Yahoo music dataset.
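The core update the abstract refers to can be sketched on a single machine (the distributed version partitions the rating matrix across workers, which is omitted here). The sketch below applies stochastic gradient Langevin dynamics to a toy Bayesian matrix factorization model; the dimensions, precisions, step size and minibatch size are all assumed values, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k = 50, 40, 3            # toy problem sizes (assumed)
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_items, k))
R = U_true @ V_true.T + 0.1 * rng.normal(size=(n_users, n_items))

U = 0.1 * rng.normal(size=(n_users, k))    # factor matrices to be sampled
V = 0.1 * rng.normal(size=(n_items, k))
tau, lam = 10.0, 1.0                       # noise / prior precisions (assumed fixed)
eps, batch = 1e-3, 256                     # step size and minibatch size
n_obs = n_users * n_items                  # here every rating is observed

for step in range(2000):
    idx = rng.choice(n_obs, size=batch, replace=False)
    ii, jj = idx // n_items, idx % n_items
    err = R[ii, jj] - np.sum(U[ii] * V[jj], axis=1)
    scale = n_obs / batch                  # rescale the minibatch likelihood
    gU, gV = -lam * U, -lam * V            # gradients of the Gaussian priors
    np.add.at(gU, ii, scale * tau * err[:, None] * V[jj])
    np.add.at(gV, jj, scale * tau * err[:, None] * U[ii])
    # Langevin update: half a gradient step plus matched Gaussian noise
    U += 0.5 * eps * gU + np.sqrt(eps) * rng.normal(size=U.shape)
    V += 0.5 * eps * gV + np.sqrt(eps) * rng.normal(size=V.shape)

rmse = np.sqrt(np.mean((R - U @ V.T) ** 2))
```

Without the injected noise this is plain stochastic gradient descent on the log posterior; the noise term is what turns the trajectory into (approximate) posterior samples.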
Bayesian orthogonal component analysis for sparse representation
This paper addresses the problem of identifying a lower dimensional space
where observed data can be sparsely represented. This under-complete dictionary
learning task can be formulated as a blind separation problem of sparse sources
linearly mixed with an unknown orthogonal mixing matrix. This issue is
formulated in a Bayesian framework. First, the unknown sparse sources are
modeled as Bernoulli-Gaussian processes. To promote sparsity, a weighted
mixture of an atom at zero and a Gaussian distribution is proposed as prior
distribution for the unobserved sources. A non-informative prior distribution
defined on an appropriate Stiefel manifold is selected for the mixing matrix.
The Bayesian inference on the unknown parameters is conducted using a Markov
chain Monte Carlo (MCMC) method. A partially collapsed Gibbs sampler is
designed to generate samples asymptotically distributed according to the joint
posterior distribution of the unknown model parameters and hyperparameters.
These samples are then used to approximate the joint maximum a posteriori
estimator of the sources and mixing matrix. Simulations conducted on synthetic
data are reported to illustrate the performance of the method for recovering
sparse representations. An application to sparse coding on under-complete
dictionary is finally investigated.
Comment: Revised version. Accepted to IEEE Trans. Signal Processing
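The generative model the abstract describes can be sketched as follows; this is an illustrative forward simulation with assumed sizes and hyperparameters, not the paper's partially collapsed Gibbs sampler.

```python
import numpy as np

rng = np.random.default_rng(2)
n_src, n_obs, n_samp = 4, 8, 500       # sources, observations, samples (assumed)
w, sigma = 0.2, 1.0                    # sparsity weight and slab std (assumed)

# Bernoulli-Gaussian sources: a weighted mixture of an atom at zero
# and a Gaussian "slab", which promotes sparsity in the sources
active = rng.uniform(size=(n_src, n_samp)) < w
S = np.where(active, rng.normal(scale=sigma, size=(n_src, n_samp)), 0.0)

# orthogonal mixing matrix: the Q factor of a Gaussian matrix has
# orthonormal columns, i.e. it is a point on a Stiefel manifold
H, _ = np.linalg.qr(rng.normal(size=(n_obs, n_src)))
X = H @ S                              # noiseless mixed observations
```

The inference task is then the reverse direction: given X, recover the sparse S and the orthonormal H, which the paper does via MCMC over this joint model.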
Data-Driven Model Reduction for the Bayesian Solution of Inverse Problems
One of the major challenges in the Bayesian solution of inverse problems
governed by partial differential equations (PDEs) is the computational cost of
repeatedly evaluating numerical PDE models, as required by Markov chain Monte
Carlo (MCMC) methods for posterior sampling. This paper proposes a data-driven
projection-based model reduction technique to reduce this computational cost.
The proposed technique has two distinctive features. First, the model reduction
strategy is tailored to inverse problems: the snapshots used to construct the
reduced-order model are computed adaptively from the posterior distribution.
Posterior exploration and model reduction are thus pursued simultaneously.
Second, to avoid repeated evaluations of the full-scale numerical model as in a
standard MCMC method, we couple the full-scale model and the reduced-order
model together in the MCMC algorithm. This maintains accurate inference while
reducing its overall computational cost. In numerical experiments considering
steady-state flow in a porous medium, the data-driven reduced-order model
achieves better accuracy than a reduced-order model constructed using the
classical approach. It also improves posterior sampling efficiency by several
orders of magnitude compared to a standard MCMC method.
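The projection-based reduction step can be sketched generically. The example below uses a plain (non-adaptive) POD basis built from snapshots of an invented linear full model, whereas the paper's contribution is to choose the snapshots adaptively from the posterior; all dimensions and the model itself are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, r = 200, 30, 5               # full dim, snapshots, reduced dim (assumed)

# stand-in full model: SPD system A x = b(mu); the parameters enter through
# a low-dimensional right-hand side, so the solution manifold is low-rank
A = np.diag(np.linspace(1.0, 10.0, n))
B = rng.normal(size=(n, 3))
snapshots = np.column_stack(
    [np.linalg.solve(A, B @ rng.normal(size=3)) for _ in range(m)])

# POD: reduced basis from the leading left singular vectors of the snapshots
V, _, _ = np.linalg.svd(snapshots, full_matrices=False)
V = V[:, :r]

# Galerkin projection: solve an r x r system instead of the n x n one
b = B @ np.array([0.3, -1.2, 0.7])   # a "new" parameter value
x_red = V @ np.linalg.solve(V.T @ A @ V, V.T @ b)
x_full = np.linalg.solve(A, b)
```

Inside an MCMC loop, `x_red` would replace the expensive full solve at most iterations, which is where the orders-of-magnitude savings come from.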
Distributed Bayesian Matrix Factorization with Limited Communication
Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank
representations of matrices and for predicting missing values and providing
confidence intervals. Scaling up the posterior inference for massive-scale
matrices is challenging and requires distributing both data and computation
over many workers, making communication the main computational bottleneck.
Embarrassingly parallel inference would remove the communication needed, by
using completely independent computations on different data subsets, but it
suffers from the inherent unidentifiability of BMF solutions. We introduce a
hierarchical decomposition of the joint posterior distribution, which couples
the subset inferences, allowing for embarrassingly parallel computations in a
sequence of at most three stages. Using an efficient approximate
implementation, we show improvements empirically on both real and simulated
data. Our distributed approach is able to achieve a speed-up of almost an order
of magnitude over the full posterior, with a negligible effect on predictive
accuracy. Our method outperforms state-of-the-art embarrassingly parallel MCMC
methods in accuracy, and achieves results competitive to other available
distributed and parallel implementations of BMF.Comment: 28 pages, 8 figures. The paper is published in Machine Learning
journal. An implementation of the method is is available in SMURFF software
on github (bmfpp branch): https://github.com/ExaScience/smurf
Robust adaptive Metropolis algorithm with coerced acceptance rate
The adaptive Metropolis (AM) algorithm of Haario, Saksman and Tamminen
[Bernoulli 7 (2001) 223-242] uses the estimated covariance of the target
distribution in the proposal distribution. This paper introduces a new robust
adaptive Metropolis algorithm estimating the shape of the target distribution
and simultaneously coercing the acceptance rate. The adaptation rule is
computationally simple, adding no extra cost compared with the AM algorithm. The
adaptation strategy can be seen as a multidimensional extension of the
previously proposed method adapting the scale of the proposal distribution in
order to attain a given acceptance rate. The empirical results show promising
behaviour of the new algorithm in an example with a Student target distribution
having no finite second moment, where the AM covariance estimate is unstable.
In the examples with finite second moments, the performance of the new approach
seems to be competitive with the AM algorithm combined with scale adaptation.
Comment: 21 pages, 3 figures
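The coerced-acceptance adaptation can be sketched as follows; this is a minimal single-chain version with an assumed heavy-tailed target (independent Cauchy components, which like the paper's Student example have no finite second moment). The target rate, decay exponent and dimensions are assumed values.

```python
import numpy as np

def log_target(x):
    # product of independent Cauchy densities: no finite second moment
    return -np.sum(np.log1p(x**2))

rng = np.random.default_rng(4)
d, T = 2, 20000
alpha_star, gamma = 0.234, 0.66        # coerced rate and adaptation decay (assumed)
x = np.zeros(d)
S = np.eye(d)                          # Cholesky-type factor of the proposal shape
accepted = 0

for n in range(1, T + 1):
    u = rng.normal(size=d)
    y = x + S @ u                      # propose with the current shape matrix
    alpha = np.exp(min(0.0, log_target(y) - log_target(x)))
    if rng.uniform() < alpha:
        x = y
        accepted += 1
    # rank-one update of S along the proposal direction: expand when the
    # acceptance probability exceeds the target rate, shrink otherwise
    eta = n ** -gamma
    M = S @ (np.eye(d) + eta * (alpha - alpha_star)
             * np.outer(u, u) / (u @ u)) @ S.T
    S = np.linalg.cholesky(M)

acc_rate = accepted / T
```

Because each update only rescales along the proposed direction, the matrix S simultaneously learns the shape of the target and drives the empirical acceptance rate toward alpha_star, without ever forming a covariance estimate that could diverge under heavy tails.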