Approximating Likelihood Ratios with Calibrated Discriminative Classifiers
In many fields of science, generalized likelihood ratio tests are established
tools for statistical inference. At the same time, it has become increasingly
common that a simulator (or generative model) is used to describe complex
processes that tie parameters of an underlying theory and measurement
apparatus to high-dimensional observations. However, simulators often do not
provide a way to evaluate the likelihood function for a given observation,
which motivates a new class of likelihood-free inference algorithms. In this
paper, we show that likelihood ratios are invariant under a specific class of
dimensionality reduction maps. As a direct consequence, we show that
discriminative classifiers can be used to approximate the generalized
likelihood ratio statistic when only a generative model for the data is
available. This leads to a new machine learning-based approach to
likelihood-free inference that is complementary to Approximate Bayesian
Computation, and which does not require a prior on the model parameters.
Experimental results on artificial problems with known exact likelihoods
illustrate the potential of the proposed method.
Comment: 35 pages, 5 figures
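The classifier-to-likelihood-ratio connection underlying this paper can be sketched in a toy setting where the exact likelihoods are known. The following is a minimal illustration, not the paper's calibrated procedure: the two Gaussian "simulators", the logistic-regression classifier, and the ratio identity r(x) ≈ s(x)/(1 − s(x)) applied to its output are illustrative assumptions for a well-specified 1-D problem.

```python
# Sketch: approximate the likelihood ratio p(x|theta0)/p(x|theta1) with a
# classifier trained to separate samples drawn from the two "simulators".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two toy "simulators": unit-variance Gaussians with different means,
# so the exact likelihood ratio is available for comparison.
x0 = rng.normal(0.0, 1.0, size=(5000, 1))   # samples from p(x | theta0)
x1 = rng.normal(1.0, 1.0, size=(5000, 1))   # samples from p(x | theta1)

X = np.vstack([x0, x1])
y = np.concatenate([np.ones(5000), np.zeros(5000)])  # label 1 = theta0

clf = LogisticRegression().fit(X, y)

def approx_log_ratio(x):
    """log p(x|theta0)/p(x|theta1) from the classifier score s(x)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    s = clf.predict_proba(x)[:, 1]           # estimated P(theta0 | x)
    return np.log(s / (1.0 - s))

def exact_log_ratio(x):
    # log N(x; 0, 1) - log N(x; 1, 1) = -x^2/2 + (x - 1)^2/2 = 1/2 - x
    return 0.5 - np.asarray(x, dtype=float).ravel()

xs = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
err = np.max(np.abs(approx_log_ratio(xs) - exact_log_ratio(xs)))
```

Because the true log-ratio is linear in x here, the logistic model is well specified and the approximation error stays small; in realistic high-dimensional settings the paper's calibration step matters.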
Robust mixtures in the presence of measurement errors
We develop a mixture-based approach to robust density modeling and outlier
detection for experimental multivariate data that includes measurement error
information. Our model is designed to infer atypical measurements that are not
due to errors, aiming to retrieve potentially interesting peculiar objects.
Since exact inference is not possible in this model, we develop a
tree-structured variational EM solution. This compares favorably against a
fully factorial approximation scheme, approaching the accuracy of a
Markov-Chain-EM, while maintaining computational simplicity. We demonstrate the
benefits of including measurement errors in the model, in terms of improved
outlier detection rates in varying measurement uncertainty conditions. We then
use this approach in detecting peculiar quasars from an astrophysical survey,
given photometric measurements with errors.
Comment: (Refereed) Proceedings of the 24th Annual International Conference
on Machine Learning 2007 (ICML07), (Ed.) Z. Ghahramani. June 20-24, 2007,
Oregon State University, Corvallis, OR, USA, pp. 847-854; Omnipress. ISBN
978-1-59593-793-3; 8 pages, 6 figures
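The core modeling idea, components convolved with per-datum measurement error, can be shown in a much-reduced form. This is a simplified 1-D E-step sketch, not the paper's tree-structured variational EM: each observation x_n carries its own error variance S_n, and component k is evaluated with inflated variance sigma_k^2 + S_n.

```python
# Sketch: responsibilities in an error-aware 1-D Gaussian mixture where each
# observation n has its own measurement-error variance S_n.
import numpy as np
from scipy.stats import norm

def responsibilities(x, S, weights, means, variances):
    """E-step: p(component k | x_n) with p(x_n | k) = N(x_n; mu_k, var_k + S_n)."""
    x, S = np.asarray(x, float), np.asarray(S, float)
    logp = np.stack([
        np.log(w) + norm.logpdf(x, m, np.sqrt(v + S))
        for w, m, v in zip(weights, means, variances)
    ], axis=1)
    logp -= logp.max(axis=1, keepdims=True)   # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Two well-separated components; the third datum sits between them and has a
# large measurement error, so its assignment is (correctly) ambiguous.
r = responsibilities(
    x=[0.1, 4.9, 2.5], S=[0.05, 0.05, 4.0],
    weights=[0.5, 0.5], means=[0.0, 5.0], variances=[1.0, 1.0],
)
```

Points with small errors are assigned confidently, while the noisy midpoint datum splits its responsibility evenly; the paper's full model additionally infers which atypical points are genuine outliers rather than error artifacts.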
Robust M-Estimation Based Bayesian Cluster Enumeration for Real Elliptically Symmetric Distributions
Robustly determining the optimal number of clusters in a data set is an
essential factor in a wide range of applications. Cluster enumeration becomes
challenging when the true underlying structure in the observed data is
corrupted by heavy-tailed noise and outliers. Recently, Bayesian cluster
enumeration criteria have been derived by formulating cluster enumeration as
maximization of the posterior probability of candidate models. This article
generalizes robust Bayesian cluster enumeration so that it can be used with an
arbitrary Real Elliptically Symmetric (RES) distributed mixture model. Our
framework also covers the case of M-estimators that allow for mixture models,
which are decoupled from a specific probability distribution. Examples of
Huber's and Tukey's M-estimators are discussed. We derive a robust criterion
for data sets with finite sample size, and also provide an asymptotic
approximation to reduce the computational cost at large sample sizes. The
algorithms are applied to simulated and real-world data sets, including
radar-based person identification, and show a significant robustness
improvement in comparison to existing methods.
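The enumeration scheme, fit each candidate model and select the one maximizing the approximate posterior model probability, can be sketched with a non-robust Gaussian stand-in. This is an assumption-laden illustration using sklearn's BIC rather than the paper's robust RES/M-estimator criterion: under a uniform model prior and the usual Laplace approximation, maximizing the model posterior corresponds to minimizing BIC.

```python
# Sketch: Bayesian cluster enumeration as model selection over candidate K,
# here scored with BIC on Gaussian mixtures (a non-robust stand-in).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Three well-separated 2-D clusters.
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(200, 2)),
    rng.normal([4.0, 4.0], 0.5, size=(200, 2)),
    rng.normal([0.0, 4.0], 0.5, size=(200, 2)),
])

def enumerate_clusters(X, k_max=6):
    """Return the candidate K minimizing BIC over K = 1..k_max."""
    bics = [GaussianMixture(k, random_state=0).fit(X).bic(X)
            for k in range(1, k_max + 1)]
    return int(np.argmin(bics)) + 1

best_k = enumerate_clusters(X)
```

On clean Gaussian data this already recovers the correct count; the paper's contribution is keeping the selection reliable when the data are heavy-tailed or outlier-contaminated, where a Gaussian criterion tends to over-estimate K.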
Approximating multivariate posterior distribution functions from Monte Carlo samples for sequential Bayesian inference
An important feature of Bayesian statistics is the opportunity to do
sequential inference: the posterior distribution obtained after seeing a
dataset can be used as prior for a second inference. However, when Monte Carlo
sampling methods are used for inference, we only have a set of samples from the
posterior distribution. To do sequential inference, we then either have to
evaluate the second posterior at only these locations and reweight the samples
accordingly, or we can estimate a functional description of the posterior
probability distribution from the samples and use that as prior for the second
inference. Here, we investigated to what extent we can obtain an accurate joint
posterior from two datasets if the inference is done sequentially rather than
jointly, under the condition that each inference step is done using Monte Carlo
sampling. To test this, we evaluated the accuracy of kernel density estimates,
Gaussian mixtures, vine copulas and Gaussian processes in approximating
posterior distributions, and then tested whether these approximations can be
used in sequential inference. In low dimensionality, Gaussian processes are
more accurate, whereas in higher dimensionality Gaussian mixtures or vine
copulas perform better. In our test cases, posterior approximations are
preferable over direct sample reweighting, although joint inference is still
preferable over sequential inference. Since the performance is case-specific,
we provide an R package mvdens with a unified interface for the density
approximation methods.
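The sequential-versus-joint comparison can be reproduced in miniature. This is a toy 1-D Gaussian example with an assumed flat prior, using a SciPy kernel density estimate as the posterior approximation rather than the mvdens R package: samples from the first posterior are turned into a functional density, which then serves as the prior for the second inference on a grid.

```python
# Sketch: approximate a Monte Carlo posterior with a KDE and reuse it as the
# prior for a second inference step, then compare against joint inference.
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(2)
true_mu, sigma = 1.0, 1.0
d1 = rng.normal(true_mu, sigma, 50)   # first dataset
d2 = rng.normal(true_mu, sigma, 50)   # second dataset

# Under a flat prior, the posterior for mu after d1 is N(mean(d1), sigma^2/n);
# we pretend we only have Monte Carlo samples from it.
post1_samples = rng.normal(d1.mean(), sigma / np.sqrt(len(d1)), 5000)
prior2 = gaussian_kde(post1_samples)  # functional description of posterior 1

grid = np.linspace(0.0, 2.0, 2001)

def loglik(data, mu):
    return norm.logpdf(data[:, None], mu, sigma).sum(axis=0)

seq_logpost = prior2.logpdf(grid) + loglik(d2, grid)      # sequential
joint_logpost = loglik(np.concatenate([d1, d2]), grid)    # joint, flat prior

def grid_mean(logp):
    p = np.exp(logp - logp.max())
    p /= p.sum()
    return float((grid * p).sum())

diff = abs(grid_mean(seq_logpost) - grid_mean(joint_logpost))
```

In this conjugate case the sequential and joint posterior means agree closely; the KDE bandwidth slightly inflates the intermediate posterior's variance, which is one source of the sequential-inference gap the abstract describes.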