K2-ABC: Approximate Bayesian Computation with Kernel Embeddings
Complicated generative models often result in a situation where computing the
likelihood of observed data is intractable, while simulating from the
conditional density given a parameter value is relatively easy. Approximate
Bayesian Computation (ABC) is a paradigm that enables simulation-based
posterior inference in such cases by measuring the similarity between simulated
and observed data in terms of a chosen set of summary statistics. However,
there is no general rule to construct sufficient summary statistics for complex
models. Insufficient summary statistics will "leak" information, which leads to
ABC algorithms yielding samples from an incorrect (partial) posterior. In this
paper, we propose a fully nonparametric ABC paradigm which circumvents the need
for manually selecting summary statistics. Our approach, K2-ABC, uses maximum
mean discrepancy (MMD) as a dissimilarity measure between the distributions
over observed and simulated data. MMD is easily estimated as the squared
difference between their empirical kernel embeddings. Experiments on a
simulated scenario and a real-world biological problem illustrate the
effectiveness of the proposed algorithm.
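To make the discrepancy concrete, the following is a minimal sketch, assuming a Gaussian kernel, of how the squared MMD between observed and simulated data can be estimated and turned into a soft ABC weight of the form exp(-MMD^2 / epsilon); the bandwidth sigma, tolerance epsilon, and helper names are illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    """Gram matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared MMD between samples X and Y (n x d arrays)."""
    Kxx = gaussian_gram(X, X, sigma)
    Kyy = gaussian_gram(Y, Y, sigma)
    Kxy = gaussian_gram(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def k2_abc_weight(observed, simulated, epsilon=0.1, sigma=1.0):
    """Soft acceptance weight proportional to exp(-MMD^2 / epsilon)."""
    return np.exp(-mmd2(observed, simulated, sigma) / epsilon)
```

In an importance-sampling loop, each parameter drawn from the prior would be weighted by k2_abc_weight applied to the observed data and the data simulated under that parameter.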
Kernel-based distribution features for statistical tests and Bayesian inference
The kernel mean embedding is known to provide a data representation which preserves full information of the data distribution. While typically computationally costly, its nonparametric nature has the advantage of requiring no explicit model specification of the data. At the other extreme are approaches which summarize data distributions into a finite-dimensional vector of hand-picked summary statistics. This explicit finite-dimensional representation offers a computationally cheaper alternative. Clearly, there is a trade-off between cost and sufficiency of the representation, and it is of interest to have a computationally efficient technique which can produce a data-driven representation, thus combining the advantages of both extremes. The main focus of this thesis is the development of linear-time, mean-embedding-based methods that automatically extract informative features of data distributions, for statistical tests and Bayesian inference.

In the first part, on statistical tests, several new linear-time techniques are developed. These include a new kernel-based distance measure for distributions, a new linear-time nonparametric dependence measure, and a linear-time discrepancy measure between a probabilistic model and a sample, based on a Stein operator. These new measures give rise to linear-time, consistent tests of homogeneity, independence, and goodness of fit, respectively. The key idea behind these new tests is to explicitly learn distribution-characterizing feature vectors by maximizing a proxy for the probability of correctly rejecting the null hypothesis. We theoretically show that these new tests are consistent for any finite number of features.

In the second part, we explore the use of random Fourier features to construct approximate kernel mean embeddings for representing messages in the expectation propagation (EP) algorithm. The goal is to learn a message operator which predicts EP outgoing messages from incoming messages. We derive a novel two-layer random feature representation of the input messages, allowing online learning of the operator during EP inference.
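As a hedged illustration of the random-feature ingredient mentioned in the second part, the sketch below approximates a Gaussian-kernel mean embedding of a sample with random Fourier features, so that the embedding becomes a finite vector computable in time linear in the sample size; the feature count D, bandwidth sigma, and function name are assumptions for illustration only.

```python
import numpy as np

def rff_mean_embedding(X, D=100, sigma=1.0, seed=0):
    """Approximate the Gaussian-kernel mean embedding of the sample X (n x d)
    with D random Fourier features; the inner product of two such vectors
    approximates the inner product of the exact mean embeddings."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(d, D))   # spectral frequencies of the kernel
    b = rng.uniform(0, 2 * np.pi, size=D)            # random phases
    phi = np.sqrt(2.0 / D) * np.cos(X @ W + b)       # n x D random feature matrix
    return phi.mean(axis=0)                          # linear-time approximate embedding
```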
Interpretable Distribution Features with Maximum Testing Power
Two semimetrics on probability distributions are proposed, given as the sum
of differences of expectations of analytic functions evaluated at spatial or
frequency locations (i.e., features). The features are chosen so as to maximize
the distinguishability of the distributions, by optimizing a lower bound on
test power for a statistical test using these features. The result is a
parsimonious and interpretable indication of how and where two distributions
differ locally. An empirical estimate of the test power criterion converges
with increasing sample size, ensuring the quality of the returned features. In
real-world benchmarks on high-dimensional text and image data, linear-time
tests using the proposed semimetrics achieve comparable performance to the
state-of-the-art quadratic-time maximum mean discrepancy test, while returning
human-interpretable features that explain the test results.
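A rough sketch of the kind of statistic involved, assuming a Gaussian kernel evaluated at a small set of spatial test locations V: the two samples are compared through their mean feature vectors at those locations, normalized by the feature covariance. The locations, bandwidth, and regularizer gamma below are placeholders; in the paper they are chosen by maximizing a lower bound on test power.

```python
import numpy as np

def me_statistic(X, Y, V, sigma=1.0, gamma=1e-5):
    """Normalized mean-embedding statistic at J test locations V (J x d).
    Under the null it is approximately chi-square with J degrees of freedom."""
    def feats(A):
        sq = np.sum(A**2, 1)[:, None] + np.sum(V**2, 1)[None, :] - 2 * A @ V.T
        return np.exp(-sq / (2 * sigma**2))          # n x J kernel features
    Zx, Zy = feats(X), feats(Y)
    n = min(len(Zx), len(Zy))
    diff = Zx[:n] - Zy[:n]                           # paired feature differences
    mean = diff.mean(axis=0)
    cov = np.cov(diff, rowvar=False) + gamma * np.eye(V.shape[0])
    return n * mean @ np.linalg.solve(cov, mean)
```

Because only J kernel evaluations per sample point are needed, the cost is linear in the sample size, and the learned locations V indicate where the two distributions differ.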
Testing Goodness of Fit of Conditional Density Models with Kernels
We propose two nonparametric statistical tests of goodness of fit for
conditional distributions: given a conditional probability density function p(y|x)
and a joint sample, decide whether the sample is drawn from p(y|x) r_x(x)
for some density r_x. Our tests, formulated with a Stein
operator, can be applied to any differentiable conditional density model, and
require no knowledge of the normalizing constant. We show that 1) our tests are
consistent against any fixed alternative conditional model; 2) the statistics
can be estimated easily, requiring no density estimation as an intermediate
step; and 3) our second test offers an interpretable test result providing
insight on where the conditional model does not fit well in the domain of the
covariate. We demonstrate the interpretability of our test on a task of
modeling the distribution of New York City's taxi drop-off location given a
pick-up point. To our knowledge, our work is the first to propose such
conditional goodness-of-fit tests that simultaneously have all these desirable
properties.
Comment: In UAI 2020. http://auai.org/uai2020/accepted.ph
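The normalizing-constant claim can be made concrete with a short calculation; the operator below is a hedged sketch for a scalar response y, not necessarily the exact form used in the paper.

```latex
% Suppose the conditional model is known only up to a normalizer Z(x):
%   p(y \mid x) = \tilde{p}(y \mid x) / Z(x).
% Its conditional score does not involve Z(x),
\[
  \partial_y \log p(y \mid x) \;=\; \partial_y \log \tilde{p}(y \mid x),
\]
% and a Langevin-type Stein operator built from this score,
\[
  (T_p f)(x, y) \;=\; \big(\partial_y \log p(y \mid x)\big)\, f(x, y) \;+\; \partial_y f(x, y),
\]
% satisfies E_{Y \sim p(\cdot \mid x)}[(T_p f)(x, Y)] = 0 for every x under mild
% boundary conditions, which is the identity a goodness-of-fit statistic can build on.
```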
Kernel Conditional Moment Test via Maximum Moment Restriction
We propose a new family of specification tests called kernel conditional
moment (KCM) tests. Our tests are built on a novel representation of
conditional moment restrictions in a reproducing kernel Hilbert space (RKHS)
called conditional moment embedding (CMME). After transforming the conditional
moment restrictions into a continuum of unconditional counterparts, the test
statistic is defined as the maximum moment restriction (MMR) within the unit
ball of the RKHS. We show that the MMR not only fully characterizes the
original conditional moment restrictions, leading to consistency in both
hypothesis testing and parameter estimation, but also has an analytic
expression that is easy to compute as well as closed-form asymptotic
distributions. Our empirical studies show that the KCM test has a promising
finite-sample performance compared to existing tests.
Comment: In Proceedings of the 36th Conference on Uncertainty in Artificial
Intelligence (UAI2020).
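As a hedged sketch of how such a statistic can be computed for a single conditional moment restriction E[g(X, Y) | X] = 0 (for example, regression residuals g(x, y) = y - f_theta(x)), the code below forms a U-statistic with a Gaussian kernel on the conditioning variable; the kernel choice and function names are illustrative assumptions rather than the paper's exact estimator.

```python
import numpy as np

def mmr_statistic(X, g_vals, sigma=1.0):
    """U-statistic estimate of a maximum-moment-restriction-style quantity,
    given moment values g_vals[i] = g(x_i, y_i) (shape n x m) and a Gaussian
    kernel on the conditioning variable X (shape n x d)."""
    n = len(X)
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    K = np.exp(-sq / (2 * sigma**2))                 # kernel on the conditioning variable
    H = (g_vals @ g_vals.T) * K                      # h_ij = g_i^T g_j k(x_i, x_j)
    np.fill_diagonal(H, 0.0)                         # drop i = j terms (U-statistic)
    return H.sum() / (n * (n - 1))
```

Values near zero are consistent with the conditional moment restriction holding; large positive values indicate model misspecification.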
A Linear-Time Kernel Goodness-of-Fit Test
We propose a novel adaptive test of goodness-of-fit, with computational cost
linear in the number of samples. We learn the test features that best indicate
the differences between observed samples and a reference model, by minimizing
the false negative rate. These features are constructed via Stein's method,
meaning that it is not necessary to compute the normalising constant of the
model. We analyse the asymptotic Bahadur efficiency of the new test, and prove
that under a mean-shift alternative, our test always has greater relative
efficiency than a previous linear-time kernel test, regardless of the choice of
parameters for that test. In experiments, the performance of our method exceeds
that of the earlier linear-time test, and matches or exceeds the power of a
quadratic-time kernel test. In high dimensions and where model structure may be
exploited, our goodness of fit test performs far better than a quadratic-time
two-sample test based on the Maximum Mean Discrepancy, with samples drawn from
the model.
Comment: Accepted to NIPS 201
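The following is a minimal sketch of a finite-set Stein statistic of this flavour, assuming a Gaussian kernel and a model available only through its score function grad_x log p(x); the test locations V and bandwidth sigma are placeholders (in the paper such quantities are chosen by maximizing test power), and the simple plug-in averaging below stands in for the unbiased estimator analysed there.

```python
import numpy as np

def fssd2(X, score_fn, V, sigma=1.0):
    """Plug-in estimate of a finite-set Stein discrepancy (squared) between a
    sample X (n x d) and a model given only through score_fn(X) = grad_x log p(x),
    evaluated at J test locations V (J x d); cost is linear in n."""
    n, d = X.shape
    J = V.shape[0]
    S = score_fn(X)                                            # n x d score values
    total = 0.0
    for v in V:
        diff = X - v                                           # n x d
        k = np.exp(-np.sum(diff**2, axis=1) / (2 * sigma**2))  # kernel values k(x_i, v)
        grad_k = -(diff / sigma**2) * k[:, None]               # gradient of k(x, v) in x
        xi = S * k[:, None] + grad_k                           # n x d Stein features
        total += np.sum(xi.mean(axis=0)**2)                    # squared mean feature
    return total / (d * J)
```

For a standard normal model, for instance, one would pass score_fn = lambda X: -X; larger values of the statistic indicate a worse fit of the sample to the model.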