Kernel Hypothesis Testing with Set-valued Data
We present a general framework for hypothesis testing on distributions of
sets of individual examples. Sets may represent many common data sources such
as groups of observations in time series, collections of words in text or a
batch of images of a given phenomenon. This observation pattern, however,
differs from the common assumptions required for hypothesis testing: each set
differs in size, may have differing levels of noise, and also may incorporate
nuisance variability, irrelevant for the analysis of the phenomenon of
interest; all features that bias test decisions if not accounted for. In this
paper, we propose to interpret sets as independent samples from a collection of
latent probability distributions, and introduce kernel two-sample and
independence tests in this latent space of distributions. We prove the
consistency of the tests and observe that they outperform alternatives in a
wide range of synthetic experiments. Finally, we showcase their use in practice
with experiments on healthcare and climate data, where heuristics were
previously needed for feature extraction and testing.
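
To make the idea of testing on sets concrete, the following is a minimal sketch of a two-sample test between two collections of sets. It is an illustration under stated assumptions and not necessarily the estimator used in the paper: each set is treated as a sample from its own latent distribution, two sets are compared by averaging a Gaussian base kernel over all pairs of their elements (the inner product of their empirical mean embeddings), and the test statistic is a biased squared MMD calibrated by permutation. The function names, the bandwidth `gamma`, and the toy data are all hypothetical.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # Gaussian base kernel between the rows of X and the rows of Y.
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def set_kernel(sets, gamma=1.0):
    # K[i, j] = average base kernel over all element pairs of set i and set j.
    # Sets may have different sizes; each contributes one row/column.
    n = len(sets)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = rbf(sets[i], sets[j], gamma).mean()
    return K

def mmd2(K, n_a):
    # Biased squared MMD between the first n_a sets and the remaining ones.
    return K[:n_a, :n_a].mean() + K[n_a:, n_a:].mean() - 2 * K[:n_a, n_a:].mean()

def permutation_test(sets_a, sets_b, gamma=1.0, n_perm=200, seed=0):
    # Permute group labels of whole sets to approximate the null distribution.
    rng = np.random.default_rng(seed)
    K = set_kernel(list(sets_a) + list(sets_b), gamma)
    n_a = len(sets_a)
    observed = mmd2(K, n_a)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(K.shape[0])
        if mmd2(K[np.ix_(p, p)], n_a) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Toy data (assumed for illustration): sets of varying size, two groups
# whose latent distributions differ in mean.
rng = np.random.default_rng(1)

def make_sets(center, n_sets=15):
    return [rng.normal(center, 1.0, size=(rng.integers(5, 21), 2))
            for _ in range(n_sets)]

sets_a = make_sets(0.0)
sets_b = make_sets(3.0)
stat, p_value = permutation_test(sets_a, sets_b)
print(f"MMD^2 = {stat:.3f}, permutation p-value = {p_value:.3f}")
```

Averaging over element pairs is one simple way to handle sets of different sizes. It does not by itself correct for differing noise levels or nuisance variability across sets, which the abstract names as issues the proposed framework accounts for.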