Some models are useful, but how do we know which ones? Towards a unified Bayesian model taxonomy
Probabilistic (Bayesian) modeling has experienced a surge of applications in
almost all quantitative sciences and industrial areas. This development is
driven by a combination of several factors, including better probabilistic
estimation algorithms, flexible software, increased computing power, and a
growing awareness of the benefits of probabilistic learning. However, a
principled Bayesian model building workflow is far from complete and many
challenges remain. To aid future research and applications of a principled
Bayesian workflow, we ask and provide answers for what we perceive as two
fundamental questions of Bayesian modeling, namely (a) "What actually is a
Bayesian model?" and (b) "What makes a good Bayesian model?". As an answer to
the first question, we propose the PAD model taxonomy that defines four basic
kinds of Bayesian models, each representing some combination of the assumed
joint distribution of all (known or unknown) variables (P), a posterior
approximator (A), and training data (D). As an answer to the second question,
we propose ten utility dimensions according to which we can evaluate Bayesian
models holistically, namely, (1) causal consistency, (2) parameter
recoverability, (3) predictive performance, (4) fairness, (5) structural
faithfulness, (6) parsimony, (7) interpretability, (8) convergence, (9)
estimation speed, and (10) robustness. Further, we propose two example utility
decision trees that describe hierarchies and trade-offs between utilities
depending on the inferential goals that drive model building and testing.
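The P/A/D decomposition can be made concrete with a toy example. The sketch below is my own illustration, not the paper's code (all names are hypothetical); it uses a conjugate normal-normal model so that the posterior approximator A happens to be exact:

```python
import numpy as np

# Illustrative sketch of the PAD components: a joint distribution P over
# parameters and data, a posterior approximator A, and training data D.

rng = np.random.default_rng(0)

# P: joint distribution p(theta, y) = N(theta | 0, 1) * prod_i N(y_i | theta, 1)
def sample_joint(n_obs):
    theta = rng.normal(0.0, 1.0)
    y = rng.normal(theta, 1.0, size=n_obs)
    return theta, y

# D: observed training data
_, y_obs = sample_joint(n_obs=20)

# A: a posterior approximator; for this conjugate model the exact posterior
# N(mean, var) is available in closed form.
def approximate_posterior(y):
    n = len(y)
    var = 1.0 / (1.0 + n)      # prior precision 1 plus n unit-precision terms
    mean = var * y.sum()
    return mean, var

mean, var = approximate_posterior(y_obs)
print(f"posterior mean={mean:.3f}, var={var:.4f}")
```

Different PAD model kinds then correspond to which of these three components are actually fixed: a pure P model, a P+A model before seeing data, or a fully specified P+A+D model.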
Group equivariant neural posterior estimation
Simulation-based inference with conditional neural density estimators is a powerful approach to solving inverse problems in science. However, these methods typically treat the underlying forward model as a black box, with no way to exploit geometric properties such as equivariances. Equivariances are common in scientific models; however, integrating them directly into expressive inference networks (such as normalizing flows) is not straightforward. We here describe an alternative method to incorporate equivariances under joint transformations of parameters and data. Our method -- called group equivariant neural posterior estimation (GNPE) -- is based on self-consistently standardizing the "pose" of the data while estimating the posterior over parameters. It is architecture-independent, and applies both to exact and approximate equivariances. As a real-world application, we use GNPE for amortized inference of astrophysical binary black hole systems from gravitational-wave observations. We show that GNPE achieves state-of-the-art accuracy while reducing inference times by three orders of magnitude.
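The "self-consistent pose standardization" idea can be illustrated on a toy translation equivariance. The sketch below is my own illustration, not the paper's code; the bump signal and the argmax-based estimator are hypothetical stand-ins for the gravitational-wave data and the neural posterior estimator:

```python
import numpy as np

# Toy GNPE-style iteration: the data contain a bump at an unknown offset t
# (the "pose"), and the model is equivariant under joint shifts of t and the
# data. We alternate between (i) shifting the data into a standard pose using
# the current pose estimate and (ii) re-estimating the pose from the
# standardized data.

rng = np.random.default_rng(1)
grid = np.arange(64)

def simulate(t):
    # Bump centred at 32 + t: t is the offset from the standard pose.
    return np.exp(-0.5 * ((grid - (32 + t)) / 3.0) ** 2) + 0.01 * rng.normal(size=64)

def estimate_offset(x):
    # Stand-in for a neural estimator trained on pose-standardized data:
    # it only needs to resolve residual offsets around the centre.
    return float(np.argmax(x)) - 32.0

t_true = 13.0
x_obs = simulate(t_true)

t_hat = 0.0                                      # crude initial pose estimate
for _ in range(5):                               # self-consistent iteration
    x_std = np.roll(x_obs, -int(round(t_hat)))   # standardize the pose
    t_hat += estimate_offset(x_std)              # update the pose estimate

print(f"true offset {t_true}, estimated offset {t_hat}")
```

Because the estimator only ever sees near-standardized data, it solves a much easier problem than estimating the pose over its full range, which is the intuition behind GNPE's architecture independence.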
Simulation-based Inference: From Approximate Bayesian Computation and Particle Methods to Neural Density Estimation
This doctoral thesis in computational statistics utilizes both Monte Carlo methods (approximate Bayesian computation and sequential Monte Carlo) and machine-learning methods (deep learning and normalizing flows) to develop novel algorithms for inference in implicit Bayesian models. Implicit models are those for which calculating the likelihood function is very challenging (and often impossible), but model simulation is feasible. The inference methods developed in the thesis are simulation-based inference methods since they leverage the possibility to simulate data from the implicit models. Several approaches are considered in the thesis: Papers II and IV focus on classical methods (sequential Monte Carlo-based methods), while Papers I and III focus on more recent machine-learning methods (deep learning and normalizing flows, respectively).
Paper I constructs novel deep learning methods for learning summary statistics for approximate Bayesian computation (ABC). To achieve this, Paper I introduces the partially exchangeable network (PEN), a deep learning architecture specifically designed for Markovian data (i.e., partially exchangeable data).
Paper II considers Bayesian inference in stochastic differential equation mixed-effects models (SDEMEMs). Bayesian inference for SDEMEMs is challenging due to their intractable likelihood function. Paper II addresses this problem by designing a novel Gibbs-blocking strategy in combination with correlated pseudo-marginal methods. The paper also discusses how custom particle filters can be adapted to the inference procedure.
Paper III introduces the novel inference method sequential neural posterior and likelihood approximation (SNPLA). SNPLA is a simulation-based inference algorithm that utilizes normalizing flows for learning both the posterior distribution and the likelihood function of an implicit model via a sequential scheme. By learning both the likelihood and the posterior, and by leveraging the reverse Kullback-Leibler (KL) divergence, SNPLA avoids ad hoc correction steps and Markov chain Monte Carlo (MCMC) sampling.
Paper IV introduces the accelerated-delayed acceptance (ADA) algorithm. ADA can be viewed as an extension of the delayed-acceptance (DA) MCMC algorithm that leverages connections between the two likelihood ratios of DA to further accelerate MCMC sampling from the posterior distribution of interest, although our approach introduces an approximation. The main case study of Paper IV is a double-well potential stochastic differential equation (DWPSDE) model for protein-folding data (reaction coordinate data).
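The ABC setting that Papers I-II build on can be summarized in a few lines. The sketch below is a textbook rejection-ABC scheme (my illustration, not the thesis code): draw parameters from the prior, simulate, and accept draws whose summary statistic falls within a tolerance of the observed one.

```python
import numpy as np

# Minimal rejection ABC: infer the mean of a Gaussian "implicit" model by
# accepting prior draws whose simulated summary lies close to the observed one.

rng = np.random.default_rng(2)

def simulate(theta, n=50):
    return rng.normal(theta, 1.0, size=n)

def summary(x):
    return x.mean()                   # a simple (here sufficient) summary

theta_true = 1.5
s_obs = summary(simulate(theta_true))

accepted = []
for _ in range(20000):
    theta = rng.uniform(-5.0, 5.0)    # draw from a uniform prior
    if abs(summary(simulate(theta)) - s_obs) < 0.05:   # tolerance epsilon
        accepted.append(theta)

post = np.array(accepted)
print(f"ABC posterior mean ~ {post.mean():.2f} from {len(post)} accepted draws")
```

The quality of such approximations hinges on the choice of summary statistic, which is exactly what PEN (Paper I) learns from data instead of specifying by hand.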
Improving the Accuracy of Marginal Approximations in Likelihood-Free Inference via Localisation
Likelihood-free methods are an essential tool for performing inference for
implicit models which can be simulated from, but for which the corresponding
likelihood is intractable. However, common likelihood-free methods do not scale
well to a large number of model parameters. A promising approach to
high-dimensional likelihood-free inference involves estimating low-dimensional
marginal posteriors by conditioning only on summary statistics believed to be
informative for the low-dimensional component, and then combining the
low-dimensional approximations in some way. In this paper, we demonstrate that
such low-dimensional approximations can be surprisingly poor in practice for
seemingly intuitive summary statistic choices. We describe an idealized
low-dimensional summary statistic that is, in principle, suitable for marginal
estimation. However, a direct approximation of the idealized choice is
difficult in practice. We thus suggest an alternative approach to marginal
estimation which is easier to implement and automate. Given an initial choice
of low-dimensional summary statistic that might only be informative about a
marginal posterior location, the new method improves performance by first
crudely localising the posterior approximation using all the summary statistics
to ensure global identifiability, followed by a second step that hones in on an
accurate low-dimensional approximation using the low-dimensional summary
statistic. We show that the posterior this approach targets can be represented
as a logarithmic pool of posterior distributions based on the low-dimensional
and full summary statistics, respectively. The good performance of our method
is illustrated in several examples.
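The logarithmic-pool representation mentioned in the abstract can be written schematically as follows; note the weight $w$ here is a generic placeholder, not a value taken from the paper:

```latex
% Logarithmic pool of the posterior given the low-dimensional summary S and
% the posterior given the full summary S_full (schematic; w in (0,1) is a
% placeholder weight, not a value from the paper):
\pi_{\text{pool}}(\theta) \;\propto\;
    \pi(\theta \mid S)^{\,w} \;\, \pi(\theta \mid S_{\text{full}})^{\,1-w}
```

Intuitively, the full-summary factor supplies global identifiability (the localisation step), while the low-dimensional factor sharpens the marginal of interest.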
Advances in Simulation-Based Inference: Towards the automation of the Scientific Method through Learning Algorithms
This dissertation presents several novel techniques and guidelines to advance the field of simulation-based inference. Simulation-based inference, or likelihood-free inference, refers to the process of statistical inference whenever simulating synthetic realizations x through detailed descriptions of their generating processes is possible, but evaluating the likelihood p(x | y) of parameters y tied to realizations x is intractable. What this effectively means is that while it is relatively simple to execute a computer simulation and collect samples from its generative process for various inputs y, it is rather difficult to invert the process, where one poses the question: "what set of parameters y could have been responsible for producing x, and what is their probability of doing so?"
The likelihood p(x | y) plays a central role in answering this question. However, for most scientific simulators, the direct evaluation of the (true and unknown) likelihood involves solving an inverse problem that rests on the integration of all possible forward realizations implicitly defined by the computer code of the simulator. This issue is the core reason why it is typically impossible to evaluate the likelihood model of a computer simulator: it requires us to integrate across all possible code paths for all inputs y that could have potentially led to the realization x.
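The "integration over all code paths" point is easy to see in a toy simulator. The example below is my own illustration, not the dissertation's model; simulating x given y is one line, yet the likelihood marginalises over the latent draw z consumed along the way:

```python
import numpy as np

# Toy implicit model: p(x | y) = integral of N(x | y * z^2, 1) Exp(z | 1) dz,
# which has no convenient closed form, yet sampling from it is trivial.

rng = np.random.default_rng(3)

def simulator(y, n=1):
    z = rng.exponential(1.0, size=n)   # latent code-path randomness
    return rng.normal(y * z**2, 1.0)   # observed realization x

x = simulator(y=2.0, n=5)              # sampling from p(x | y): one call
print(x)
# Evaluating p(x | y) would require integrating out z for every x, which is
# exactly the intractability that simulation-based inference works around.
```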
Classical statistical inference based on the likelihood is for this reason impractical. Nevertheless, approximate inference remains possible by relying on surrogates that produce estimates of key quantities necessary for statistical inference. This thesis introduces various techniques and guidelines to effectively construct such surrogates and demonstrates how these approximations should be applied reliably. We explicitly make the point that the dogma of data efficiency should not be central to the field. Rather, reliable approximations should be central if we are ever to deduce scientific results with the techniques developed over the years. This point is strengthened by demonstrating that all techniques can produce approximations that are not reliable from a scientific point of view, that is, when one is interested in constraining parameters or models. We argue for novel protocols that provide theoretically backed reliability properties. To that end, this thesis introduces a novel algorithm that provides such guarantees in terms of a binary classifier. In fact, the theoretical result is applicable to any binary classification problem.
Finally, these contributions are framed within the context of the automation of science. This thesis concerned itself with the automation of the last step of the scientific method, which is described as a recurrence over the sequence of hypothesis, experiment, and conclusion. For the most part, however, the steps of hypothesis formation and experiment design remain solely for the scientists to decide. Only occasionally are they explored, designed, and automated through computer-assisted means. For these two steps, we provide research avenues and proofs of concept that could unlock their automation.
Stein’s Method Meets Computational Statistics: A Review of Some Recent Developments
Stein’s method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein’s method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments and, in doing so, to stimulate further research into the successful field of Stein’s method and statistics. The topics we discuss include tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, parameter estimation, and goodness-of-fit testing.
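One widely used instance of a Stein operator in computational statistics is the Langevin-Stein operator, which underlies the kernel Stein discrepancy used to benchmark samplers. The definitions below are standard ones, stated here for orientation rather than drawn from this particular survey:

```latex
% Langevin--Stein operator for a smooth (possibly unnormalised) density p on
% R^d, acting on a smooth test function f : R^d -> R^d:
(\mathcal{A}_p f)(x) \;=\; f(x)^{\top} \nabla_x \log p(x) \;+\; \nabla_x \cdot f(x)
% Stein identity: E_{X \sim p}[(\mathcal{A}_p f)(X)] = 0 under mild conditions.
% Taking the supremum over the unit ball of an RKHS \mathcal{H} yields the
% kernel Stein discrepancy,
\operatorname{KSD}(q \,\|\, p) \;=\; \sup_{\|f\|_{\mathcal{H}} \le 1}
    \bigl| \mathbb{E}_{X \sim q}[(\mathcal{A}_p f)(X)] \bigr|,
% which is computable from \nabla \log p alone -- no normalising constant --
% the property exploited when benchmarking approximate samplers.
```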
Advances in scalable learning and sampling of unnormalised models
We study probabilistic models that are known incompletely, up to an intractable normalising constant. To reap the full benefit of such models, two
tasks must be solved: learning and sampling. These two tasks have been
subject to decades of research, and yet significant challenges still persist.
Traditional approaches often suffer from poor scalability with respect to
dimensionality and model complexity, generally rendering them inapplicable to models parameterised by deep neural networks. In this thesis, we
contribute a new set of methods for addressing this scalability problem.
We first explore the problem of learning unnormalised models. Our investigation begins with a well-known learning principle, Noise-contrastive
Estimation, whose underlying mechanism is that of density-ratio estimation.
By examining why existing density-ratio estimators scale poorly, we identify a new framework, telescoping density-ratio estimation (TRE), that can
learn ratios between highly dissimilar densities in high-dimensional spaces.
Our experiments demonstrate that TRE not only yields substantial improvements for the learning of deep unnormalised models, but can do the
same for a broader set of tasks including mutual information estimation and
representation learning.
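The core identity behind TRE can be stated compactly. Schematically (my summary of the telescoping idea, not an excerpt from the thesis):

```latex
% Telescoping density-ratio estimation (TRE): a hard ratio between dissimilar
% densities p_0 and p_m is decomposed through bridge densities p_1, ..., p_{m-1}
% that interpolate between them:
\frac{p_0(x)}{p_m(x)} \;=\; \frac{p_0(x)}{p_1(x)} \cdot \frac{p_1(x)}{p_2(x)}
    \cdots \frac{p_{m-1}(x)}{p_m(x)}
% Each factor compares two nearby densities, so each classifier-based ratio
% estimate is far better conditioned than a single direct estimate of p_0/p_m.
```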
Subsequently, we explore the problem of sampling unnormalised models.
A large literature on Markov chain Monte Carlo (MCMC) can be leveraged here, and in continuous domains, gradient-based samplers such as
Metropolis-adjusted Langevin algorithm (MALA) and Hamiltonian Monte
Carlo are excellent options. However, there has been substantially less
progress in MCMC for discrete domains. To advance this subfield, we introduce several discrete Metropolis-Hastings samplers that are conceptually
inspired by MALA, and demonstrate their strong empirical performance
across a range of challenging sampling tasks.
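To fix the setting these samplers improve on, here is a baseline single-flip Metropolis-Hastings sampler on a binary model with an unnormalised log-density. This is a generic textbook sketch (my illustration, not one of the thesis's gradient-informed samplers), using a simple one-dimensional Ising chain:

```python
import numpy as np

# Single-flip Metropolis-Hastings on a binary state; only the unnormalised
# log-density is needed, matching the unnormalised-model setting.

rng = np.random.default_rng(4)
D = 16

def log_prob(s):
    # Unnormalised 1-d Ising chain: ferromagnetic coupling between neighbours.
    return 0.5 * np.sum(s[:-1] * s[1:])

s = rng.choice([-1.0, 1.0], size=D)
for _ in range(2000):
    i = rng.integers(D)                # propose flipping one spin (symmetric)
    s_new = s.copy()
    s_new[i] *= -1
    if np.log(rng.uniform()) < log_prob(s_new) - log_prob(s):
        s = s_new                      # accept with the MH probability

print("final state:", s.astype(int))
```

Proposing one flip at a time mixes slowly in high dimensions; the MALA-inspired discrete samplers described above bias proposals toward flips that the (discrete analogue of the) gradient suggests are promising.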