Amortised likelihood-free inference for expensive time-series simulators with signatured ratio estimation
Simulation models of complex dynamics in the natural and social sciences commonly lack a tractable likelihood function, rendering traditional likelihood-based statistical inference impossible. Recent advances in machine learning have introduced novel algorithms for estimating otherwise intractable likelihood functions using a likelihood ratio trick based on binary classifiers. Consequently, efficient likelihood approximations can be obtained whenever good probabilistic classifiers can be constructed. We propose a kernel classifier for sequential data using path signatures based on the recently introduced signature kernel. We demonstrate that the representative power of signatures yields a highly performant classifier, even in the crucially important case where sample numbers are low. In such scenarios, our approach can outperform sophisticated neural networks for common posterior inference tasks.
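The likelihood ratio trick mentioned above can be sketched with a plain logistic-regression classifier: train it to distinguish dependent (parameter, data) pairs from shuffled ones, and its logit approximates the log likelihood-to-marginal ratio. This is a minimal NumPy stand-in, not the signature-kernel classifier of the paper; the Gaussian toy simulator and quadratic features are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulator (an illustrative assumption): x | theta ~ N(theta, 1).
n = 4000
thetas = rng.uniform(-3, 3, n)
xs = rng.normal(thetas, 1.0)

def feats(theta, x):
    # quadratic features so a linear logit can capture the dependence
    return np.column_stack([theta, x, theta * x, theta**2, x**2,
                            np.ones(np.shape(theta))])

# label 1: dependent (theta, x) pairs; label 0: pairs with x shuffled
X = np.vstack([feats(thetas, xs), feats(thetas, rng.permutation(xs))])
y = np.concatenate([np.ones(n), np.zeros(n)])

# logistic regression fitted by plain gradient descent
w = np.zeros(X.shape[1])
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.05 * X.T @ (p - y) / len(y)

def log_ratio(theta, x):
    # log p(x|theta)/p(x), approximated by the classifier's logit
    return (feats(np.atleast_1d(theta), np.atleast_1d(x)) @ w).item()
```

Evaluating `log_ratio` over a grid of parameter values at the observed data then gives an (unnormalised) approximate log-posterior under a flat prior.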
Bernoulli Race Particle Filters
When the weights in a particle filter are not available analytically,
standard resampling methods cannot be employed. To circumvent this problem
state-of-the-art algorithms replace the true weights with non-negative unbiased
estimates. The resulting algorithm remains valid, but at the cost of higher
variance in the filtering estimates compared with a particle filter that uses
the true weights. We propose here a novel algorithm that allows for resampling
according to the true intractable weights when only an unbiased estimator of
the weights is available. We demonstrate our algorithm on several examples.
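The core idea, resampling according to the true weights while touching only unbiased estimates, can be sketched as follows under the simplifying assumption that each estimate lies in [0, 1], so it can serve directly as the success probability of a Bernoulli coin. The toy weights and noise model below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def bernoulli_race(estimate, n):
    """Return index i with probability proportional to w_i = E[estimate(i)],
    using only unbiased estimates assumed to lie in [0, 1]."""
    while True:
        i = rng.integers(n)              # propose an index uniformly
        if rng.random() < estimate(i):   # coin whose mean success prob is w_i
            return i

# toy weights, observed only through noisy unbiased estimates in [0, 1]
w = np.array([0.1, 0.3, 0.6])
estimate = lambda i: w[i] + rng.uniform(-0.1, 0.1)

draws = np.array([bernoulli_race(estimate, len(w)) for _ in range(20000)])
freqs = np.bincount(draws, minlength=len(w)) / len(draws)
```

Per round, index i is returned with probability E[estimate(i)]/n = w_i/n, so conditional on acceptance the sampled index is distributed exactly proportionally to the true weights.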
Black-box Bayesian inference for agent-based models
Simulation models, in particular agent-based models, are gaining popularity in economics and the social sciences. The considerable flexibility they offer, as well as their capacity to reproduce a variety of empirically observed behaviours of complex systems, gives them broad appeal, and the increasing availability of cheap computing power has made their use feasible. Yet widespread adoption in real-world modelling and decision-making scenarios has been hindered by the difficulty of performing parameter estimation for such models. In general, simulation models lack a tractable likelihood function, which precludes a straightforward application of standard statistical inference techniques. A number of recent works have sought to address this problem through the application of likelihood-free inference techniques, in which parameter estimates are determined by performing some form of comparison between the observed data and simulation output. However, these approaches are (a) founded on restrictive assumptions, and/or (b) typically require many hundreds of thousands of simulations. These qualities make them unsuitable for large-scale simulations in economics and the social sciences, and can cast doubt on the validity of these inference methods in such scenarios. In this paper, we investigate the efficacy of two classes of simulation-efficient black-box approximate Bayesian inference methods that have recently drawn significant attention within the probabilistic machine learning community: neural posterior estimation and neural density ratio estimation. We present a number of benchmarking experiments in which we demonstrate that neural network-based black-box methods provide state-of-the-art parameter inference for economic simulation models, and crucially are compatible with generic multivariate or even non-Euclidean time-series data.
In addition, we suggest appropriate assessment criteria for use in future benchmarking of approximate Bayesian inference procedures for simulation models in economics and the social sciences.
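As a rough illustration of the amortisation idea behind neural posterior estimation, the sketch below fits a deliberately simple linear-Gaussian posterior approximation to simulated (parameter, data) pairs; real neural posterior estimation uses conditional density estimators such as normalising flows. The conjugate toy model is an assumption chosen so the exact posterior (mean x/2, variance 1/2) is known.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy simulator: theta ~ N(0, 1) prior, x | theta ~ N(theta, 1).
n = 20000
theta = rng.normal(0.0, 1.0, n)
x = theta + rng.normal(0.0, 1.0, n)

# Amortised posterior q(theta | x) = N(a*x + b, s2): fit (a, b) by least
# squares over the simulated pairs, and s2 from the residual variance.
A = np.column_stack([x, np.ones(n)])
(a, b), *_ = np.linalg.lstsq(A, theta, rcond=None)
s2 = (theta - (a * x + b)).var()
```

Once fitted, the approximate posterior for any observed dataset is obtained by a single forward evaluation, with no further simulation, which is what makes the amortised approach attractive for expensive simulators.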
Multivariate kernel density estimation applied to sensitive geo-referenced administrative data protected via measurement error
Modern systems of official statistics require the timely estimation of area-
specific densities of sub-populations. Ideally, estimates would be based on
precise geo-coded information, which is not available due to confidentiality
constraints. One approach to ensuring confidentiality is to round the
geo-coordinates. We propose multivariate non-parametric kernel density estimation
that reverses the rounding process by using a Bayesian measurement error
model. The methodology is applied to the Berlin register of residents for
deriving density estimates for ethnic minorities and older people. The
estimates are used to identify areas in need of new advisory centres for
migrants and infrastructure for older people.
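A crude stand-in for the measurement-error correction, not the paper's Bayesian model, is to impute the unknown rounding error with uniform draws and average the resulting kernel density estimates. The synthetic locations, bandwidth, and query points below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# True 2-D locations, released only after rounding to 1 unit for anonymity.
true_xy = rng.normal(0.0, 2.0, size=(500, 2))
rounded = np.round(true_xy)

def kde(points, grid, h=0.6):
    # product-Gaussian kernel density estimate evaluated at the grid points
    d = grid[:, None, :] - points[None, :, :]
    k = np.exp(-0.5 * (d / h) ** 2).prod(axis=2)
    return k.mean(axis=1) / (2 * np.pi * h * h)

grid = np.array([[0.0, 0.0], [5.0, 5.0]])

# Average the KDE over several imputations of the rounding error U(-0.5, 0.5).
ests = [kde(rounded + rng.uniform(-0.5, 0.5, rounded.shape), grid)
        for _ in range(20)]
density = np.mean(ests, axis=0)
```

Averaging over imputed errors smooths out the artificial clustering that rounding induces at grid points; the Bayesian measurement-error model of the paper replaces the uniform imputation with draws from the posterior of the true coordinates.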
Large Sample Asymptotics of the Pseudo-Marginal Method
The pseudo-marginal algorithm is a variant of the Metropolis--Hastings
algorithm which samples asymptotically from a probability distribution when it
is only possible to estimate unbiasedly an unnormalized version of its density.
Practically, one has to trade off the computational resources used to obtain
this estimator against the asymptotic variances of the ergodic averages
obtained by the pseudo-marginal algorithm. Recent works optimizing this
trade-off rely on some strong assumptions which can cast doubt on their
practical relevance. In particular, they all assume that the distribution of
the difference between the log-density and its estimate is independent of the
parameter value at which it is evaluated. Under regularity conditions we show
here that, as the number of data points tends to infinity, a space-rescaled
version of the pseudo-marginal chain converges weakly towards another
pseudo-marginal chain for which this assumption indeed holds. A study of this
limiting chain allows us to provide parameter dimension-dependent guidelines on
how to optimally scale a normal random walk proposal and the number of Monte
Carlo samples for the pseudo-marginal method in the large-sample regime. This
complements and validates currently available results.
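The pseudo-marginal algorithm itself is easy to sketch: run Metropolis-Hastings, but replace the intractable likelihood with a non-negative unbiased estimate, drawing a fresh estimate only at the proposal and carrying the current one along with the state. The noisy estimator below (log-normal noise, debiased on the likelihood scale) and the flat-prior Gaussian-mean model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(1.0, 1.0, 100)   # toy data; flat prior on the mean theta
sigma = 0.5                        # noise std of the log-likelihood estimator

def loglik(theta):
    return -0.5 * np.sum((data - theta) ** 2)

def loglik_hat(theta):
    # noisy but unbiased on the *likelihood* scale: E[exp(N(-s^2/2, s^2))] = 1
    return loglik(theta) + sigma * rng.normal() - 0.5 * sigma**2

def pseudo_marginal_mh(n_iter=8000, step=0.3):
    theta, lh = 0.0, loglik_hat(0.0)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()
        lh_prop = loglik_hat(prop)       # fresh estimate at the proposal only
        if np.log(rng.random()) < lh_prop - lh:
            theta, lh = prop, lh_prop    # the estimate travels with the state
        chain[t] = theta
    return chain

chain = pseudo_marginal_mh()
```

Because the retained estimate is reused until the next acceptance, the chain targets the exact posterior despite the noise, but larger sigma makes it stickier, which is precisely the variance-versus-compute trade-off the abstract refers to.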