42 research outputs found

    Some models are useful, but how do we know which ones? Towards a unified Bayesian model taxonomy

    Probabilistic (Bayesian) modeling has experienced a surge of applications in almost all quantitative sciences and industrial areas. This development is driven by a combination of several factors, including better probabilistic estimation algorithms, flexible software, increased computing power, and a growing awareness of the benefits of probabilistic learning. However, a principled Bayesian model building workflow is far from complete and many challenges remain. To aid future research and applications of a principled Bayesian workflow, we ask and provide answers for what we perceive as two fundamental questions of Bayesian modeling, namely (a) "What actually is a Bayesian model?" and (b) "What makes a good Bayesian model?". As an answer to the first question, we propose the PAD model taxonomy that defines four basic kinds of Bayesian models, each representing some combination of the assumed joint distribution of all (known or unknown) variables (P), a posterior approximator (A), and training data (D). As an answer to the second question, we propose ten utility dimensions according to which we can evaluate Bayesian models holistically, namely, (1) causal consistency, (2) parameter recoverability, (3) predictive performance, (4) fairness, (5) structural faithfulness, (6) parsimony, (7) interpretability, (8) convergence, (9) estimation speed, and (10) robustness. Further, we propose two example utility decision trees that describe hierarchies and trade-offs between utilities depending on the inferential goals that drive model building and testing.

    Group equivariant neural posterior estimation

    Simulation-based inference with conditional neural density estimators is a powerful approach to solving inverse problems in science. However, these methods typically treat the underlying forward model as a black box, with no way to exploit geometric properties such as equivariances. Equivariances are common in scientific models; however, integrating them directly into expressive inference networks (such as normalizing flows) is not straightforward. We here describe an alternative method to incorporate equivariances under joint transformations of parameters and data. Our method -- called group equivariant neural posterior estimation (GNPE) -- is based on self-consistently standardizing the "pose" of the data while estimating the posterior over parameters. It is architecture-independent, and applies both to exact and approximate equivariances. As a real-world application, we use GNPE for amortized inference of astrophysical binary black hole systems from gravitational-wave observations. We show that GNPE achieves state-of-the-art accuracy while reducing inference times by three orders of magnitude. Comment: 13+11 pages, 5+8 figures

    Simulation-based Inference : From Approximate Bayesian Computation and Particle Methods to Neural Density Estimation

    This doctoral thesis in computational statistics utilizes both Monte Carlo methods (approximate Bayesian computation and sequential Monte Carlo) and machine-learning methods (deep learning and normalizing flows) to develop novel algorithms for inference in implicit Bayesian models. Implicit models are those for which calculating the likelihood function is very challenging (and often impossible), but model simulation is feasible. The inference methods developed in the thesis are simulation-based inference methods since they leverage the possibility to simulate data from the implicit models. Several approaches are considered in the thesis: Papers II and IV focus on classical methods (sequential Monte Carlo-based methods), while Papers I and III focus on more recent machine-learning methods (deep learning and normalizing flows, respectively). Paper I constructs novel deep learning methods for learning summary statistics for approximate Bayesian computation (ABC). To achieve this, Paper I introduces the partially exchangeable network (PEN), a deep learning architecture specifically designed for Markovian data (i.e., partially exchangeable data). Paper II considers Bayesian inference in stochastic differential equation mixed-effects models (SDEMEMs). Bayesian inference for SDEMEMs is challenging due to their intractable likelihood function. Paper II addresses this problem by designing a novel Gibbs-blocking strategy in combination with correlated pseudo-marginal methods. The paper also discusses how custom particle filters can be adapted to the inference procedure. Paper III introduces the novel inference method sequential neural posterior and likelihood approximation (SNPLA). SNPLA is a simulation-based inference algorithm that utilizes normalizing flows for learning both the posterior distribution and the likelihood function of an implicit model via a sequential scheme. By learning both the likelihood and the posterior, and by leveraging the reverse Kullback-Leibler (KL) divergence, SNPLA avoids ad hoc correction steps and Markov chain Monte Carlo (MCMC) sampling. Paper IV introduces the accelerated-delayed acceptance (ADA) algorithm. ADA can be viewed as an extension of the delayed-acceptance (DA) MCMC algorithm that leverages connections between the two likelihood ratios of DA to further accelerate MCMC sampling from the posterior distribution of interest, although our approach introduces an approximation. The main case study of Paper IV is a double-well potential stochastic differential equation (DWP-SDE) model for protein-folding data (reaction coordinate data).
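
As a rough illustration of inference in an implicit model (likelihood unavailable, simulation feasible), the following is a minimal rejection-ABC sketch. It is not taken from the thesis: the Gaussian simulator, the uniform prior, the sample-mean summary, and the tolerance are all assumptions chosen to keep the example small.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical simulator: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def rejection_abc(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lands near the observed one."""
    obs_summary = statistics.fmean(observed)  # assumed summary statistic
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # assumed Uniform(-5, 5) prior
        simulated = simulate(theta, len(observed), rng)
        if abs(statistics.fmean(simulated) - obs_summary) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # pretend the true parameter is unknown
posterior_draws = rejection_abc(observed, 20000, 0.1, rng)
abc_mean = statistics.fmean(posterior_draws)
```

The accepted draws approximate the posterior only up to the error introduced by the tolerance and the choice of summary statistic, which is the kind of limitation that learned summaries and neural density estimators aim to reduce.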

    Improving the Accuracy of Marginal Approximations in Likelihood-Free Inference via Localisation

    Likelihood-free methods are an essential tool for performing inference for implicit models which can be simulated from, but for which the corresponding likelihood is intractable. However, common likelihood-free methods do not scale well to a large number of model parameters. A promising approach to high-dimensional likelihood-free inference involves estimating low-dimensional marginal posteriors by conditioning only on summary statistics believed to be informative for the low-dimensional component, and then combining the low-dimensional approximations in some way. In this paper, we demonstrate that such low-dimensional approximations can be surprisingly poor in practice for seemingly intuitive summary statistic choices. We describe an idealized low-dimensional summary statistic that is, in principle, suitable for marginal estimation. However, a direct approximation of the idealized choice is difficult in practice. We thus suggest an alternative approach to marginal estimation which is easier to implement and automate. Given an initial choice of low-dimensional summary statistic that might only be informative about a marginal posterior location, the new method improves performance by first crudely localising the posterior approximation using all the summary statistics to ensure global identifiability, followed by a second step that homes in on an accurate low-dimensional approximation using the low-dimensional summary statistic. We show that the posterior this approach targets can be represented as a logarithmic pool of posterior distributions based on the low-dimensional and full summary statistics, respectively. The good performance of our method is illustrated in several examples. Comment: 30 pages, 9 figures
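
To make the notion of a logarithmic pool concrete, here is a small sketch for the special case of two univariate Gaussian densities, where the pooled density proportional to p_a^w * p_b^(1 - w) is again Gaussian. The Gaussian form, the function name, and the weight are illustrative assumptions; the abstract does not state which weights or distributions the paper uses.

```python
import math


def log_pool_gaussians(mean_a, sd_a, mean_b, sd_b, weight):
    """Mean and sd of the density proportional to N_a**weight * N_b**(1 - weight)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Precisions (inverse variances) combine linearly under log pooling.
    precision = weight / sd_a**2 + (1.0 - weight) / sd_b**2
    mean = (weight * mean_a / sd_a**2 + (1.0 - weight) * mean_b / sd_b**2) / precision
    return mean, math.sqrt(1.0 / precision)


# Equal-weight pool of N(0, 1) and N(2, 1).
pooled_mean, pooled_sd = log_pool_gaussians(0.0, 1.0, 2.0, 1.0, 0.5)
```

With equal weights and equal variances the pooled mean sits halfway between the two component means, and setting the weight to 1 recovers the first component unchanged.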

    Advances in Simulation-Based Inference: Towards the automation of the Scientific Method through Learning Algorithms

    This dissertation presents several novel techniques and guidelines to advance the field of simulation-based inference. Simulation-based inference, or likelihood-free inference, refers to the process of statistical inference whenever simulating synthetic realizations x through detailed descriptions of their generating processes is possible, but evaluating the likelihood p(x | y) of parameters y tied to realizations x is intractable. What this effectively means is that while it is relatively simple to execute a computer simulation and collect samples from its generative process for various inputs y, it is rather difficult to invert the process, where one poses the question: "What set of parameters y could have been responsible for producing x, and what is their probability of doing so?" The likelihood p(x | y) plays a central role in answering this question. However, for most scientific simulators, the direct evaluation of the (true and unknown) likelihood involves solving an inverse problem that rests on the integration of all possible forward realizations implicitly defined by the computer code of the simulator. This issue is the core reason why it is typically impossible to evaluate the likelihood model of a computer simulator: it requires us to integrate across all possible code paths for all inputs y that could have potentially led to the realization x. Classical statistical inference based on the likelihood is for this reason impractical. Nevertheless, approximate inference remains possible by relying on surrogates that produce estimates of key quantities necessary for statistical inference. This thesis introduces various techniques and guidelines to effectively construct such surrogates and demonstrates how these approximations should be applied reliably. We explicitly make the point that the dogma of data efficiency should not be central to the field. Rather, reliable approximations should be, if we are ever to deduce scientific results with the techniques we developed over the years. This point is strengthened by demonstrating that all techniques can produce approximations that are not reliable from a scientific point of view, that is, when one is interested in constraining parameters or models. We argue for novel protocols that provide theoretically backed reliability properties. To that end, this thesis introduces a novel algorithm that provides such guarantees in terms of the binary classifier. In fact, the theoretical result is applicable to any binary classification problem. Finally, these contributions are framed within the context of the automation of science. This thesis concerned itself with the automation of the last step of the scientific method, which is described as a recurrence over the sequence hypothesis, experiment, and conclusion. For the most part, however, the steps of hypothesis formation and experiment design remain solely for the scientists to decide. Only occasionally are they explored, designed and automated through computer-assisted means. For these two steps, we provide research avenues and proofs of concept that could unlock their automation.

    Stein’s Method Meets Computational Statistics: A Review of Some Recent Developments

    Stein’s method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein’s method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments, and in doing so, to stimulate further research into the successful field of Stein’s method and statistics. The topics we discuss include tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, parameter estimation and goodness-of-fit testing.
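
As a small illustration of what a Stein operator does, the sketch below uses the classical operator for the standard normal, (A f)(x) = f'(x) - x f(x), whose expectation is zero when X ~ N(0, 1). The test function f = sin, the sample sizes, and the function name are assumptions made for this example and are not drawn from the survey.

```python
import math
import random


def stein_expectation(samples, f, f_prime):
    """Monte Carlo estimate of E[f'(X) - X f(X)] for the standard-normal Stein operator."""
    return sum(f_prime(x) - x * f(x) for x in samples) / len(samples)


rng = random.Random(1)
matched = [rng.gauss(0.0, 1.0) for _ in range(200000)]  # samples from the target N(0, 1)
shifted = [rng.gauss(1.0, 1.0) for _ in range(200000)]  # samples from N(1, 1) instead

gap_matched = stein_expectation(matched, math.sin, math.cos)
gap_shifted = stein_expectation(shifted, math.sin, math.cos)
```

The estimate stays near zero for samples from the target and moves clearly away from zero for the shifted samples, which is the basic mechanism behind Stein-based tools for comparing samplers and testing goodness of fit.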

    Advances in scalable learning and sampling of unnormalised models

    We study probabilistic models that are known incompletely, up to an intractable normalising constant. To reap the full benefit of such models, two tasks must be solved: learning and sampling. These two tasks have been subject to decades of research, and yet significant challenges still persist. Traditional approaches often suffer from poor scalability with respect to dimensionality and model-complexity, generally rendering them inapplicable to models parameterised by deep neural networks. In this thesis, we contribute a new set of methods for addressing this scalability problem. We first explore the problem of learning unnormalised models. Our investigation begins with a well-known learning principle, Noise-contrastive Estimation, whose underlying mechanism is that of density-ratio estimation. By examining why existing density-ratio estimators scale poorly, we identify a new framework, telescoping density-ratio estimation (TRE), that can learn ratios between highly dissimilar densities in high-dimensional spaces. Our experiments demonstrate that TRE not only yields substantial improvements for the learning of deep unnormalised models, but can do the same for a broader set of tasks including mutual information estimation and representation learning. Subsequently, we explore the problem of sampling unnormalised models. A large literature on Markov chain Monte Carlo (MCMC) can be leveraged here, and in continuous domains, gradient-based samplers such as Metropolis-adjusted Langevin algorithm (MALA) and Hamiltonian Monte Carlo are excellent options. However, there has been substantially less progress in MCMC for discrete domains. To advance this subfield, we introduce several discrete Metropolis-Hastings samplers that are conceptually inspired by MALA, and demonstrate their strong empirical performance across a range of challenging sampling tasks.
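
For readers unfamiliar with MALA, the following is a minimal sketch of the standard continuous-domain algorithm on a one-dimensional unnormalised target. It is not one of the discrete samplers proposed in the thesis; the target (a standard normal up to its constant), the step size, and the function names are assumptions for illustration.

```python
import math
import random


def log_density(x):
    """Assumed unnormalised log-target: standard normal without its constant."""
    return -0.5 * x * x


def grad_log_density(x):
    return -x


def mala(n_steps, step_size, rng, x0=0.0):
    """Langevin proposal followed by a Metropolis-Hastings accept/reject step."""
    x = x0
    samples = []
    for _ in range(n_steps):
        mean_fwd = x + 0.5 * step_size * grad_log_density(x)
        proposal = mean_fwd + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        mean_bwd = proposal + 0.5 * step_size * grad_log_density(proposal)
        # Log proposal densities; shared normalising constants cancel in the ratio.
        log_q_fwd = -((proposal - mean_fwd) ** 2) / (2.0 * step_size)
        log_q_bwd = -((x - mean_bwd) ** 2) / (2.0 * step_size)
        log_accept = log_density(proposal) - log_density(x) + log_q_bwd - log_q_fwd
        if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples


rng = random.Random(2)
draws = mala(50000, 1.0, rng)
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

Only the gradient and the unnormalised log-density are needed, which is why samplers of this kind suit models known up to an intractable normalising constant.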