4,457 research outputs found
Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach
Bayesian inference typically requires the computation of an approximation to
the posterior distribution. An important requirement for an approximate
Bayesian inference algorithm is to output high-accuracy posterior mean and
uncertainty estimates. Classical Monte Carlo methods, particularly Markov Chain
Monte Carlo, remain the gold standard for approximate Bayesian inference
because they have a robust finite-sample theory and reliable convergence
diagnostics. However, alternative methods, which are more scalable or apply to
problems where Markov Chain Monte Carlo cannot be used, lack the same
finite-data approximation theory and tools for evaluating their accuracy. In
this work, we develop a flexible new approach to bounding the error of mean and
uncertainty estimates of scalable inference algorithms. Our strategy is to
control the estimation errors in terms of Wasserstein distance, then bound the
Wasserstein distance via a generalized notion of Fisher distance. Unlike
computing the Wasserstein distance, which requires access to the normalized
posterior distribution, the Fisher distance is tractable to compute because it
requires access only to the gradient of the log posterior density. We
demonstrate the usefulness of our Fisher distance approach by deriving bounds
on the Wasserstein error of the Laplace approximation and Hilbert coresets. We
anticipate that our approach will be applicable to many other approximate
inference methods such as the integrated Laplace approximation, variational
inference, and approximate Bayesian computationComment: 22 pages, 2 figure
A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks
An explosion of high-throughput DNA sequencing in the past decade has led to
a surge of interest in population-scale inference with whole-genome data.
Recent work in population genetics has centered on designing inference methods
for relatively simple model classes, and few scalable general-purpose inference
techniques exist for more realistic, complex models. To achieve this, two
inferential challenges need to be addressed: (1) population data are
exchangeable, calling for methods that efficiently exploit the symmetries of
the data, and (2) computing likelihoods is intractable as it requires
integrating over a set of correlated, extremely high-dimensional latent
variables. These challenges are traditionally tackled by likelihood-free
methods that use scientific simulators to generate datasets and reduce them to
hand-designed, permutation-invariant summary statistics, often leading to
inaccurate inference. In this work, we develop an exchangeable neural network
that performs summary statistic-free, likelihood-free inference. Our framework
can be applied in a black-box fashion across a variety of simulation-based
tasks, both within and outside biology. We demonstrate the power of our
approach on the recombination hotspot testing problem, outperforming the
state-of-the-art.Comment: 9 pages, 8 figure
Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
We present a novel Bayesian nonparametric regression model for covariates X
and continuous, real response variable Y. The model is parametrized in terms of
marginal distributions for Y and X and a regression function which tunes the
stochastic ordering of the conditional distributions F(y|x). By adopting an
approximate composite likelihood approach, we show that the resulting posterior
inference can be decoupled for the separate components of the model. This
procedure can scale to very large datasets and allows for the use of standard,
existing, software from Bayesian nonparametric density estimation and
Plackett-Luce ranking estimation to be applied. As an illustration, we show an
application of our approach to a US Census dataset, with over 1,300,000 data
points and more than 100 covariates
- …