
    Approximate Bayesian computation via the energy statistic

    Approximate Bayesian computation (ABC) has become an essential part of the Bayesian toolbox for addressing problems in which the likelihood is prohibitively expensive or entirely unknown, making it intractable. ABC defines a pseudo-posterior by comparing observed data with simulated data, traditionally through summary statistics, the elicitation of which is regarded as a key difficulty. Recently, the use of data discrepancy measures has been proposed as a way to bypass the construction of summary statistics. Here we propose an importance-sampling ABC (IS-ABC) algorithm relying on the so-called two-sample energy statistic. We establish a new asymptotic result for the case where both the observed and simulated sample sizes increase to infinity, which highlights to what extent the data discrepancy measure shapes the asymptotic pseudo-posterior. The result holds in the broad setting of IS-ABC methodologies, thus generalizing previous results that have been established only for rejection ABC algorithms. Furthermore, we propose a consistent V-statistic estimator of the energy statistic, under which we show that the large-sample result holds, and we prove that, in the finite-sample setting, the rejection ABC algorithm based on the energy statistic generates pseudo-posterior distributions that converge to the correct limits when implemented with rejection thresholds shrinking to zero. Our proposed energy statistic based ABC algorithm is demonstrated on a variety of models, including a Gaussian mixture, a moving-average model of order two, a bivariate beta and a multivariate g-and-k distribution. We find that our proposed method compares well with alternative discrepancy measures. (Comment: 25 pages, 6 figures, 5 tables)
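    To make the quantities above concrete, here is a minimal numpy sketch (not the authors' code) of a rejection ABC sampler driven by a V-statistic estimate of the two-sample energy statistic; the prior_sampler and simulator callables, the threshold eps, and the toy Gaussian location example are illustrative placeholders, not choices taken from the paper.

        import numpy as np

        def energy_distance(x, y):
            """V-statistic estimate of the two-sample energy statistic:
            2/(nm) sum ||x_i - y_j|| - 1/n^2 sum ||x_i - x_k|| - 1/m^2 sum ||y_j - y_l||.
            Diagonal terms are kept, which makes this a V- rather than a U-statistic."""
            x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
            if x.ndim == 1:
                x = x[:, None]
            if y.ndim == 1:
                y = y[:, None]
            dxy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
            dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
            dyy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
            return 2.0 * dxy.mean() - dxx.mean() - dyy.mean()

        def rejection_abc(y_obs, prior_sampler, simulator, eps, n_draws=10_000, rng=None):
            """Rejection ABC: keep the parameter draws whose simulated data fall
            within energy distance eps of the observed data."""
            rng = np.random.default_rng(rng)
            accepted = []
            for _ in range(n_draws):
                theta = prior_sampler(rng)
                z = simulator(theta, rng)          # simulated dataset, same shape as y_obs
                if energy_distance(y_obs, z) <= eps:
                    accepted.append(theta)
            return np.asarray(accepted)

        # Toy usage: inferring the location of a Gaussian under a uniform prior.
        y_obs = np.random.default_rng(0).normal(loc=1.5, size=(200, 1))
        posterior_draws = rejection_abc(
            y_obs,
            prior_sampler=lambda rng: rng.uniform(-5.0, 5.0),
            simulator=lambda theta, rng: rng.normal(loc=theta, size=(200, 1)),
            eps=0.05,
        )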

    Discrepancy-based Inference for Intractable Generative Models using Quasi-Monte Carlo

    Intractable generative models are models for which the likelihood is unavailable but sampling is possible. Most approaches to parameter inference in this setting require computing some discrepancy between the data and the generative model; this is the case, for example, for minimum distance estimation and approximate Bayesian computation. These approaches require sampling a large number of realisations from the model for different parameter values, which can be a significant challenge when simulation is expensive. In this paper, we propose to enhance this approach by enforcing "sample diversity" in simulations of our models, implemented through the use of quasi-Monte Carlo (QMC) point sets. Our key results are sample complexity bounds which demonstrate that, under smoothness conditions on the generator, QMC can significantly reduce the number of samples required to reach a given level of accuracy when using three of the most common discrepancies: the maximum mean discrepancy, the Wasserstein distance, and the Sinkhorn divergence. This is complemented by a simulation study which highlights that improved accuracy is also possible in some settings not covered by the theory. (Comment: minor presentation changes and updated reference)
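    A hedged sketch of the core idea, assuming scipy is available: replace i.i.d. noise with a scrambled Sobol' point set before pushing it through a smooth generator, then compare observed and simulated samples with a kernel discrepancy. The simulate_qmc and mmd2 helpers, the Gaussian-kernel bandwidth, and the toy location-scale generator are illustrative assumptions, not the paper's implementation.

        import numpy as np
        from scipy.stats import norm, qmc

        def mmd2(x, y, bandwidth=1.0):
            """Biased (V-statistic) estimate of the squared maximum mean discrepancy
            between samples x and y, using a Gaussian kernel."""
            def gram(a, b):
                d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
                return np.exp(-d2 / (2.0 * bandwidth ** 2))
            return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

        def simulate_qmc(generator, theta, m_pow2, dim_noise, seed=0):
            """Push a scrambled Sobol' point set (2**m_pow2 points) through the
            generator instead of i.i.d. uniform noise, enforcing sample diversity."""
            u = qmc.Sobol(d=dim_noise, scramble=True, seed=seed).random_base2(m_pow2)
            return generator(u, theta)

        # Toy smooth generator: a Gaussian location-scale model via the inverse CDF.
        generator = lambda u, theta: theta[0] + theta[1] * norm.ppf(u)

        y_obs = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=(256, 1))
        z_qmc = simulate_qmc(generator, theta=(2.0, 1.0), m_pow2=8, dim_noise=1)
        print(mmd2(y_obs, z_qmc))    # discrepancy to be minimised over theta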

    Statistical inference in generative models using scoring rules

    Statistical models which allow generating simulations without providing access to the density of the distribution are called simulator models. They are commonly developed by scientists to represent natural phenomena and depend on physically meaningful parameters. Analogously, generative networks produce samples from a probability distribution by transforming draws from a noise (or latent) distribution via a neural network; as for simulator models, the density is unavailable. These two frameworks, developed independently by different communities, can be grouped into the class of generative models; compared to statistical models that explicitly specify the density, they are more powerful and flexible. For generative networks, typically, a single point estimate for the parameters (or weights) is obtained by minimizing an objective function through gradient descent enabled by automatic differentiation. In contrast, for simulator models, samples from a probability distribution over the parameters are usually obtained via some statistical algorithm. Nevertheless, in both cases the inference methods rely on common principles that exploit simulations. In this thesis, I follow the principle of assessing how well a probabilistic model matches an observation by means of Scoring Rules. This generalises common statistical practices based on the density function and, with specific Scoring Rules, allows tackling generative models. After a detailed introduction and literature review in Chapter 1, the first part of this thesis (Chapters 2 and 3) is concerned with methods to infer probability distributions for the parameters of simulator models. Specifically, Chapter 2 contributes to the traditional Bayesian Likelihood-Free Inference literature with a new way to learn summary statistics, defined as the sufficient statistics of the best exponential-family approximation to the simulator model. In contrast, Chapter 3 departs from tradition by defining a new posterior distribution based on the generalised Bayesian inference framework, rather than motivating it as an approximation to the standard posterior. The posterior is defined through Scoring Rules computable for simulator models and is robust to outliers. In the second part of the thesis (Chapters 4 and 5), I study Scoring Rule Minimization to determine the weights of generative networks; for specific choices of Scoring Rules, this approach captures the variability of the data better than popular alternatives. I apply generative networks trained in this way to uncertainty-sensitive tasks: in Chapter 4 I use them to provide a probability distribution over the parameters of simulator models, thus returning to the theme of Chapters 2 and 3; in Chapter 5, I consider probabilistic forecasting, also establishing consistency of the training objective with dependent training data. Finally, I conclude in Chapter 6 with some final thoughts and directions for future work.
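    As a concrete illustration of the unifying principle, the sketch below gives a minimal numpy estimate of the energy score, one example of a Scoring Rule that can be computed for a generative model from simulations alone; the helper name and the assumption that simulations are stacked as an (m, d) array are mine, not the thesis's.

        import numpy as np

        def energy_score(simulations, observation):
            """Monte Carlo estimate of the energy score of a generative model at a
            single observation y, using m > 1 simulations x_1, ..., x_m:

                ES(P, y) ~= mean_j ||x_j - y|| - (1/2) * mean_{j != k} ||x_j - x_k||

            The score is proper, so it can be minimised to train a generative network
            or used to weight parameters in a generalised (Scoring Rule) posterior."""
            x = np.asarray(simulations, dtype=float)
            if x.ndim == 1:                        # scalar draws -> column vector
                x = x[:, None]
            y = np.asarray(observation, dtype=float).ravel()
            m = x.shape[0]
            term1 = np.linalg.norm(x - y, axis=1).mean()
            dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
            term2 = dxx.sum() / (2.0 * m * (m - 1))
            return term1 - term2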

    Approximate Bayesian computation via the energy statistic

    Approximate Bayesian computation (ABC) has become an essential part of the Bayesian toolbox for addressing problems in which the likelihood is prohibitively expensive or entirely unknown, making it intractable. ABC defines a pseudo-posterior by comparing observed data with simulated data, traditionally based on some summary statistics, the elicitation of which is regarded as a key difficulty. In recent years, a number of data discrepancy measures bypassing the construction of summary statistics have been proposed, including the Kullback-Leibler divergence, the Wasserstein distance and maximum mean discrepancies. Here we propose a novel importance-sampling (IS) ABC algorithm relying on the so-called two-sample energy statistic. We establish a new asymptotic result for the case where both the observed sample size and the simulated data sample size increase to infinity, which highlights to what extent the data discrepancy measure impacts the asymptotic pseudo-posterior. The result holds in the broad setting of IS-ABC methodologies, thus generalizing previous results that have been established only for rejection ABC algorithms. Furthermore, we propose a consistent V-statistic estimator of the energy statistic, under which we show that the large-sample result holds. Our proposed energy statistic based ABC algorithm is demonstrated on a variety of models, including a Gaussian mixture, a moving-average model of order two, a bivariate beta and a multivariate g-and-k distribution. We find that our proposed method compares well with alternative discrepancy measures.
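    To illustrate how the importance-sampling variant differs from plain rejection, here is a hedged numpy sketch in which every prior draw is retained and weighted by a kernel applied to its discrepancy (for instance, the energy statistic estimate sketched earlier); the kernel choice and helper names are illustrative assumptions, and a hard-threshold kernel recovers rejection ABC as a special case.

        import numpy as np

        def is_abc(y_obs, prior_sampler, simulator, discrepancy, kernel,
                   n_draws=5_000, rng=None):
            """Importance-sampling ABC: every prior draw is kept and weighted by a
            kernel of its discrepancy, rather than hard-thresholded."""
            rng = np.random.default_rng(rng)
            thetas, weights = [], []
            for _ in range(n_draws):
                theta = prior_sampler(rng)
                z = simulator(theta, rng)
                thetas.append(theta)
                weights.append(kernel(discrepancy(y_obs, z)))
            weights = np.asarray(weights, dtype=float)
            return np.asarray(thetas), weights / weights.sum()

        # A Gaussian kernel in the discrepancy; eps plays the role of the ABC tolerance.
        eps = 0.1
        gaussian_kernel = lambda d: np.exp(-0.5 * (d / eps) ** 2)
        # kernel = lambda d: float(d <= eps) would recover rejection ABC instead.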
