Equivalence of distance-based and RKHS-based statistics in hypothesis testing
We provide a unifying framework linking two classes of statistics used in
two-sample and independence testing: on the one hand, the energy distances and
distance covariances from the statistics literature; on the other, maximum mean
discrepancies (MMD), that is, distances between embeddings of distributions into
reproducing kernel Hilbert spaces (RKHS), as established in machine learning.
In the case where the energy distance is computed with a semimetric of negative
type, a positive definite kernel, termed distance kernel, may be defined such
that the MMD corresponds exactly to the energy distance. Conversely, for any
positive definite kernel, we can interpret the MMD as energy distance with
respect to some negative-type semimetric. This equivalence readily extends to
distance covariance using kernels on the product space. We determine the class
of probability distributions for which the test statistics are consistent
against all alternatives. Finally, we investigate the performance of the family
of distance kernels in two-sample and independence tests: we show in particular
that the energy distance most commonly employed in statistics is just one
member of a parametric family of kernels, and that other choices from this
family can yield more powerful tests.

Comment: Published at http://dx.doi.org/10.1214/13-AOS1140 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
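To make the equivalence concrete, the following minimal sketch (ours, not the paper's code) checks it numerically in one dimension: with the Euclidean semimetric rho(x, y) = |x - y| and base point z0 = 0, the induced distance kernel is k(x, y) = (rho(x, z0) + rho(y, z0) - rho(x, y))/2, and the empirical energy distance equals twice the squared MMD under this kernel (both in V-statistic form). The semimetric, base point, and sample sizes are illustrative choices.

```python
import numpy as np

def energy_distance(X, Y):
    # Empirical energy distance 2 E|X-Y| - E|X-X'| - E|Y-Y'|
    # (V-statistic form, Euclidean semimetric on R).
    dxy = np.abs(X[:, None] - Y[None, :]).mean()
    dxx = np.abs(X[:, None] - X[None, :]).mean()
    dyy = np.abs(Y[:, None] - Y[None, :]).mean()
    return 2 * dxy - dxx - dyy

def distance_kernel(a, b, z0=0.0):
    # Distance-induced kernel k(x, y) = (rho(x,z0) + rho(y,z0) - rho(x,y)) / 2;
    # the base point z0 = 0 is an arbitrary illustrative choice.
    return 0.5 * (np.abs(a - z0) + np.abs(b - z0) - np.abs(a - b))

def mmd2(X, Y, k):
    # Squared MMD in V-statistic form: E k(X,X') + E k(Y,Y') - 2 E k(X,Y).
    kxx = k(X[:, None], X[None, :]).mean()
    kyy = k(Y[:, None], Y[None, :]).mean()
    kxy = k(X[:, None], Y[None, :]).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, 200)
Y = rng.normal(0.5, 1.5, 200)
print(energy_distance(X, Y))            # these two numbers agree
print(2 * mmd2(X, Y, distance_kernel))  # up to floating-point error
```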
K2-ABC: Approximate Bayesian Computation with Kernel Embeddings
Complicated generative models often result in a situation where computing the
likelihood of observed data is intractable, while simulating from the
conditional density given a parameter value is relatively easy. Approximate
Bayesian Computation (ABC) is a paradigm that enables simulation-based
posterior inference in such cases by measuring the similarity between simulated
and observed data in terms of a chosen set of summary statistics. However,
there is no general rule to construct sufficient summary statistics for complex
models. Insufficient summary statistics will "leak" information, which leads to
ABC algorithms yielding samples from an incorrect (partial) posterior. In this
paper, we propose a fully nonparametric ABC paradigm which circumvents the need
for manually selecting summary statistics. Our approach, K2-ABC, uses maximum
mean discrepancy (MMD) as a dissimilarity measure between the distributions
over observed and simulated data. MMD is easily estimated as the squared
difference between their empirical kernel embeddings. Experiments on a
simulated scenario and a real-world biological problem illustrate the
effectiveness of the proposed algorithm.
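As a sketch of how MMD replaces hand-picked summary statistics in this scheme (our reading of the abstract, not the authors' code): each parameter drawn from the prior is weighted by exp(-MMD^2(y_obs, y_sim)/epsilon), with MMD^2 estimated from empirical kernel embeddings. The Gaussian RBF kernel, its bandwidth, epsilon, and the toy Gaussian-mean model below are all illustrative assumptions.

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    # Squared MMD (V-statistic) with a Gaussian RBF kernel;
    # the bandwidth sigma = 1.0 is an illustrative choice.
    def k(A, B):
        return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def k2_abc(y_obs, prior_sample, simulate, n_samples=1000, epsilon=0.1):
    # Soft-threshold ABC: weight each prior draw by exp(-MMD^2 / epsilon)
    # instead of accepting/rejecting on hand-picked summary statistics.
    thetas = np.array([prior_sample() for _ in range(n_samples)])
    weights = np.empty(n_samples)
    for i, theta in enumerate(thetas):
        weights[i] = np.exp(-rbf_mmd2(y_obs, simulate(theta)) / epsilon)
    weights /= weights.sum()
    return thetas, weights  # weighted posterior sample

# Toy model (hypothetical): infer the mean of a Gaussian with known scale.
rng = np.random.default_rng(0)
y_obs = rng.normal(2.0, 1.0, 100)
thetas, w = k2_abc(y_obs,
                   prior_sample=lambda: rng.uniform(-5.0, 5.0),
                   simulate=lambda th: rng.normal(th, 1.0, 100))
print("posterior mean estimate:", np.sum(w * thetas))  # close to 2
```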
A low variance consistent test of relative dependency
We describe a novel non-parametric statistical hypothesis test of relative
dependence between a source variable and two candidate target variables. Such a
test enables us to determine whether one source variable is significantly more
dependent on a first target variable or a second. Dependence is measured via
the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of
empirical dependence measures (source-target 1, source-target 2). We test
whether the first dependence measure is significantly larger than the second.
Modeling the covariance between these HSIC statistics leads to a provably more
powerful test than the construction of independent HSIC statistics by
sub-sampling. The resulting test is consistent and unbiased, and (being based
on U-statistics) has favorable convergence properties. The test can be computed
in quadratic time, matching the computational complexity of standard empirical
HSIC estimators. The effectiveness of the test is demonstrated on several
real-world problems: we identify language groups from a multilingual corpus,
and we show that tumor location is more dependent on gene expression than
chromosomal imbalances. Source code is available for download at
https://github.com/wbounliphone/reldep.

Comment: International Conference on Machine Learning, Jul 2015, Lille, France.
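To illustrate the statistic being compared (a sketch under our assumptions, not the released implementation at the URL above): compute two empirical HSIC values against a shared source variable and take their difference. For brevity this uses the biased V-statistic estimator and omits the covariance modeling and null calibration that give the actual test its power guarantees.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    # Gaussian RBF Gram matrix for 1-D samples; bandwidth is illustrative.
    return np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * sigma ** 2))

def hsic(K, L):
    # Biased empirical HSIC: trace(K H L H) / n^2, H the centering matrix.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
n = 200
source = rng.normal(size=n)
target1 = source + 0.3 * rng.normal(size=n)  # strongly dependent on source
target2 = 0.2 * source + rng.normal(size=n)  # weakly dependent on source

Ks = rbf_gram(source)
stat = hsic(Ks, rbf_gram(target1)) - hsic(Ks, rbf_gram(target2))
print("relative dependence statistic:", stat)  # positive favors target 1
```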