ABC-CDE: Towards Approximate Bayesian Computation with Complex High-Dimensional Data and Limited Simulations
Approximate Bayesian Computation (ABC) is typically used when the likelihood
is either unavailable or intractable but where data can be simulated under
different parameter settings using a forward model. Despite the recent interest
in ABC, high-dimensional data and costly simulations remain a bottleneck
in some applications. There is also no consensus on how best to assess the
performance of such methods without knowing the true posterior. We show how a
nonparametric conditional density estimation (CDE) framework, which we refer to
as ABC-CDE, helps address three nontrivial challenges in ABC: (i) how to
efficiently estimate the posterior distribution with limited simulations and
different types of data, (ii) how to tune and compare the performance of ABC
and related methods in estimating the posterior itself, rather than just
certain properties of the density, and (iii) how to efficiently choose among a
large set of summary statistics based on a CDE surrogate loss. We provide
theoretical and empirical evidence that justifies ABC-CDE procedures that
directly estimate and assess the posterior based on an initial ABC sample, and
we describe settings where standard ABC and regression-based approaches are
inadequate.
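As a rough illustration of the tuning step in (ii), the sketch below draws a rejection ABC sample for a toy model (θ ~ U(-5, 5), x | θ ~ N(θ, 1), both assumptions), fits a simple kernel conditional density estimator of θ given x on half of the sample, and scores candidate bandwidths on the other half with the empirical CDE loss: the average over held-out x_i of the integral of f_hat(θ | x_i)^2, minus twice the average of f_hat(θ_i | x_i). Up to an additive constant, this equals the squared error between f_hat and the true conditional density, integrated over θ and averaged over x. The kernel estimator, the toy model, and all function names are hypothetical stand-ins, not the estimators used in ABC-CDE.

```python
import math
import random


def gauss(u, s):
    """Gaussian kernel with bandwidth s, evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def rejection_abc(x_obs, n_sim, eps, rng):
    """Toy rejection ABC (hypothetical model): theta ~ U(-5, 5), x | theta ~ N(theta, 1).
    Keeps the simulated (theta, x) pairs with |x - x_obs| < eps."""
    sample = []
    for _ in range(n_sim):
        theta = rng.uniform(-5.0, 5.0)
        x = rng.gauss(theta, 1.0)
        if abs(x - x_obs) < eps:
            sample.append((theta, x))
    return sample


def kernel_cde(train, h, b):
    """Kernel conditional density estimator fitted on (theta, x) pairs.
    Returns cde(x), a function theta -> f_hat(theta | x)."""
    def cde(x):
        w = [gauss(x - xj, h) for _, xj in train]
        total = sum(w)
        # normalised weights in x; fall back to equal weights if all underflow
        w = [wj / total for wj in w] if total > 0 else [1.0 / len(train)] * len(train)
        return lambda theta: sum(wj * gauss(theta - tj, b) for wj, (tj, _) in zip(w, train))
    return cde


def cde_loss(cde, valid, grid):
    """Empirical CDE loss, up to a constant that does not depend on the estimator:
    mean_i integral f_hat(theta | x_i)^2 dtheta  -  2 * mean_i f_hat(theta_i | x_i).
    The integral uses the trapezoid rule on an evenly spaced grid."""
    step = grid[1] - grid[0]
    sq_int, at_truth = 0.0, 0.0
    for theta_i, x_i in valid:
        f = cde(x_i)
        vals = [f(t) ** 2 for t in grid]
        sq_int += step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
        at_truth += f(theta_i)
    return sq_int / len(valid) - 2.0 * at_truth / len(valid)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = rejection_abc(x_obs=1.0, n_sim=2000, eps=0.5, rng=rng)
    train, valid = sample[::2], sample[1::2]
    grid = [-3.0 + 0.1 * k for k in range(81)]
    # a lower loss suggests a better estimate of the posterior
    for b in (0.2, 0.5, 1.0):
        print(b, cde_loss(kernel_cde(train, h=0.3, b=b), valid, grid))
```

Because the loss only needs the estimator's values at held-out pairs and an integral over θ, the same scoring can in principle compare any two conditional density estimators, or the same estimator built from different candidate summary statistics.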
High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation
The ratio between two probability density functions is an important component
of various tasks, including selection bias correction, novelty detection and
classification. Recently, several estimators of this ratio have been proposed.
Most of these methods fail if the sample space is high-dimensional, and hence
require a dimension reduction step, the result of which can be a significant
loss of information. Here we propose a simple-to-implement, fully nonparametric
density ratio estimator that expands the ratio in terms of the eigenfunctions
of a kernel-based operator; these functions reflect the underlying geometry of
the data (e.g., submanifold structure), often leading to better estimates
without an explicit dimension reduction step. We show how our general framework
can be extended to address another important problem, the estimation of a
likelihood function in situations where that function cannot be
well-approximated by an analytical form. One is often faced with this situation
when performing statistical inference with data from the sciences, due to the
complexity of the data and of the processes that generated those data. We
emphasize applications where using existing likelihood-free methods of
inference would be challenging due to the high dimensionality of the sample
space, but where our spectral series method yields a reasonable estimate of the
likelihood function. We provide theoretical guarantees and illustrate the
effectiveness of our proposed method with numerical experiments.
Comment: With supplementary material
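The expansion of the ratio in eigenfunctions of a kernel-based operator can be sketched in one dimension as follows. This is a minimal illustration under our own assumptions: a Gaussian kernel with an untuned bandwidth s, a fixed number of terms J, eigenvectors of the Gram matrix on the denominator sample computed by plain subspace iteration, and the Nyström extension to evaluate the eigenfunctions at new points. Because the basis functions psi_j are orthonormal with respect to the denominator density g, the coefficient of psi_j in r = f/g is the integral of r psi_j g, that is E_f[psi_j(X)], which is estimated by a sample mean over the numerator sample. All names are hypothetical.

```python
import math
import random


def gauss_kernel(a, b, s):
    """Gaussian kernel with bandwidth s (an assumed choice of kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * s * s))


def top_eigenvectors(K, J, iters=100, seed=0):
    """J leading eigenpairs of a symmetric PSD matrix K via subspace (orthogonal) iteration."""
    n = len(K)
    rng = random.Random(seed)
    V = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(J)]
    for _ in range(iters):
        W = [[sum(kij * vj for kij, vj in zip(row, v)) for row in K] for v in V]
        V = []
        for w in W:
            # Gram-Schmidt against the vectors already kept, then normalise
            for u in V:
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            V.append([a / norm for a in w])
    # Rayleigh quotients v^T K v recover the eigenvalues for the converged unit vectors
    vals = [sum(vi * sum(kij * vj for kij, vj in zip(row, v)) for vi, row in zip(v, K)) for v in V]
    return vals, V


def spectral_ratio(num, den, J=5, s=1.0):
    """Spectral-series estimate of r(x) = f(x) / g(x) from num ~ f and den ~ g (1-D sketch)."""
    n = len(den)
    K = [[gauss_kernel(a, b, s) for b in den] for a in den]
    lams, V = top_eigenvectors(K, J)

    def psi(x):
        # Nystrom extension: psi_j(x) = sqrt(n) / l_j * sum_k K(x, z_k) v_j[k];
        # at the denominator points these are orthonormal under the empirical g
        kx = [gauss_kernel(x, z, s) for z in den]
        return [math.sqrt(n) / lam * sum(a * b for a, b in zip(kx, v)) for lam, v in zip(lams, V)]

    # beta_j = E_f[psi_j(X)], estimated by averaging over the numerator sample
    psi_num = [psi(x) for x in num]
    beta = [sum(p[j] for p in psi_num) / len(num) for j in range(J)]
    return lambda x: sum(b * p for b, p in zip(beta, psi(x)))
```

For example, with `num` drawn from N(1, 1) and `den` from N(0, 1), `spectral_ratio(num, den)` returns a function approximating the true ratio exp(x - 0.5). No dimension reduction step appears: the data enter only through kernel evaluations, which is what lets the eigenfunctions adapt to the geometry of the sample. In a fuller treatment, s and J would be tuned rather than fixed.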
Global and Local Two-Sample Tests via Regression
Two-sample testing is a fundamental problem in statistics. Despite its long
history, there has been renewed interest in this problem with the advent of
high-dimensional and complex data. Specifically, in the machine learning
literature, there have been recent methodological developments such as
classification accuracy tests. The goal of this work is to present a regression
approach to comparing multivariate distributions of complex data. Depending on
the chosen regression model, our framework can efficiently handle different
types of variables and various structures in the data, with competitive power
under many practical scenarios. Whereas previous work has been largely limited
to global tests which conceal much of the local information, our approach
naturally leads to a local two-sample testing framework in which we identify
local differences between multivariate distributions with statistical
confidence. We demonstrate the efficacy of our approach both theoretically and
empirically, under some well-known parametric and nonparametric regression
methods. Our proposed methods are applied to simulated data as well as a
challenging astronomy data set to assess their practical usefulness.
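The regression idea can be illustrated with a minimal sketch. Pool the two samples, label them Z = 1 and Z = 0, and estimate P(Z = 1 | x). Under the null hypothesis of equal distributions, this regression function is the constant proportion of Z = 1 labels. Here, k-nearest-neighbour regression in one dimension is an assumed, swapped-in choice of regression method. The global statistic is the mean squared deviation of the fitted values from that constant, and the local statistic is the deviation at a single query point x0. Both are calibrated by permuting the labels. Names and settings are hypothetical.

```python
import random


def knn_neighbours(xs, queries, k):
    """Indices of the k pooled points closest to each query (1-D, absolute distance)."""
    return [sorted(range(len(xs)), key=lambda i: abs(xs[i] - q))[:k] for q in queries]


def fitted_probs(zs, neighbours):
    """k-NN regression estimate of P(Z = 1 | x) at each query, given its neighbour indices."""
    return [sum(zs[i] for i in nb) / len(nb) for nb in neighbours]


def global_stat(zs, neighbours):
    """Mean squared deviation of the fitted P(Z = 1 | x) from the pooled proportion of Z = 1."""
    pi = sum(zs) / len(zs)
    fitted = fitted_probs(zs, neighbours)
    return sum((p - pi) ** 2 for p in fitted) / len(fitted)


def local_stat(zs, neighbours):
    """Absolute deviation at a single query point (neighbours holds one index list)."""
    pi = sum(zs) / len(zs)
    return abs(fitted_probs(zs, neighbours)[0] - pi)


def permutation_pvalue(zs, neighbours, stat, n_perm, rng):
    """Permutation p-value: under H0 the labels are exchangeable, so shuffling them
    simulates the null distribution of the statistic."""
    observed = stat(zs, neighbours)
    perm = list(zs)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if stat(perm, neighbours) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    sample1 = [rng.gauss(0.0, 1.0) for _ in range(40)]
    sample2 = [rng.gauss(3.0, 1.0) for _ in range(40)]
    xs = sample1 + sample2
    zs = [1] * len(sample1) + [0] * len(sample2)
    k = 10
    p_global = permutation_pvalue(zs, knn_neighbours(xs, xs, k), global_stat, 99, rng)
    p_local = permutation_pvalue(zs, knn_neighbours(xs, [3.0], k), local_stat, 99, rng)
    print("global p =", p_global, " local p at x0 = 3:", p_local)
```

The covariates do not change under label permutation, so the neighbour sets are computed once and only the labels are reshuffled, which keeps each permutation cheap. Evaluating the local statistic over a grid of query points would give a rough map of where the two distributions differ.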
