244 research outputs found
On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions
This paper is about two related decision theoretic problems, nonparametric
two-sample testing and independence testing. There is a belief that two
recently proposed solutions, based on kernels and distances between pairs of
points, behave well in high-dimensional settings. We identify different sources
of misconception that give rise to the above belief. Specifically, we
differentiate the hardness of estimation of test statistics from the hardness
of testing whether these statistics are zero or not, and explicitly discuss a
notion of "fair" alternative hypotheses for these problems as dimension
increases. We then demonstrate that the power of these tests actually drops
polynomially with increasing dimension against fair alternatives. We end with
some theoretical insights and shed light on the \textit{median heuristic} for
kernel bandwidth selection. Our work advances the current understanding of the
power of modern nonparametric hypothesis tests in high dimensions.
Comment: 19 pages, 9 figures, published in AAAI-15: The 29th AAAI Conference
on Artificial Intelligence (with author order reversed from ArXiv)
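The abstract above names the median heuristic without defining it. As a rough illustration only, the sketch below uses one common convention: set the Gaussian kernel bandwidth to the median pairwise Euclidean distance among the sample points. The function names are hypothetical, and other variants (squared distances, extra scaling factors) exist; which one the paper analyses is not stated in the abstract.

```python
import math
from statistics import median


def median_heuristic_bandwidth(points):
    """Median pairwise Euclidean distance (one common convention, assumed here)."""
    dists = [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]
    if not dists:
        raise ValueError("need at least two points")
    return median(dists)


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * bandwidth ** 2))


if __name__ == "__main__":
    sample = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    sigma = median_heuristic_bandwidth(sample)
    print(sigma, gaussian_kernel(sample[0], sample[1], sigma))
```

In two-sample testing the heuristic is typically applied to the pooled sample, so the bandwidth does not depend on which sample a point came from.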
On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives
Nonparametric two sample testing deals with the question of consistently
deciding if two distributions are different, given samples from both, without
making any parametric assumptions about the form of the distributions. The
current literature is split into two kinds of tests: those which are
consistent without any assumptions about how the distributions may differ
(\textit{general} alternatives), and those which are designed to specifically
test easier alternatives, like a difference in means (\textit{mean-shift}
alternatives).
The main contribution of this paper is to explicitly characterize the power
of a popular nonparametric two sample test, designed for general alternatives,
under a mean-shift alternative in the high-dimensional setting. Specifically,
we explicitly derive the power of the linear-time Maximum Mean Discrepancy
statistic using the Gaussian kernel, where the dimension and sample size can
both tend to infinity at any rate, and the two distributions differ in their
means. As a corollary, we find that if the signal-to-noise ratio is held
constant, then the test's power goes to one if the number of samples increases
faster than the dimension increases. This is the first explicit power
derivation for a general nonparametric test in the high-dimensional setting,
and also the first analysis of how tests designed for general alternatives
perform when faced with easier ones.
Comment: 25 pages, 5 figures
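For readers unfamiliar with the linear-time Maximum Mean Discrepancy statistic, the following is a minimal sketch of its usual form: samples are consumed in disjoint pairs, and each pair contributes k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1). The function names and the fixed bandwidth are illustrative assumptions, not taken from the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * bandwidth ** 2))


def linear_time_mmd(xs, ys, bandwidth):
    """Average of the pairwise statistic h over disjoint consecutive pairs.

    Cost is linear in the number of samples, unlike the quadratic-time
    estimator that sums over all pairs.
    """
    n_pairs = min(len(xs), len(ys)) // 2
    if n_pairs == 0:
        raise ValueError("need at least two points in each sample")
    total = 0.0
    for i in range(n_pairs):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        total += (
            gaussian_kernel(x1, x2, bandwidth)
            + gaussian_kernel(y1, y2, bandwidth)
            - gaussian_kernel(x1, y2, bandwidth)
            - gaussian_kernel(x2, y1, bandwidth)
        )
    return total / n_pairs
```

Because the summands come from disjoint pairs they are independent, which is what makes the null distribution of this statistic simple to approximate with a normal law.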
Large-Scale Kernel Methods for Independence Testing
Representations of probability measures in reproducing kernel Hilbert spaces
provide a flexible framework for fully nonparametric hypothesis tests of
independence, which can capture any type of departure from independence,
including nonlinear associations and multivariate interactions. However, these
approaches come with an at least quadratic computational cost in the number of
observations, which can be prohibitive in many applications. Arguably, it is
exactly in such large-scale datasets that capturing any type of dependence is
of interest, so striking a favourable tradeoff between computational efficiency
and test performance for kernel independence tests would have a direct impact
on their applicability in practice. In this contribution, we provide an
extensive study of the use of large-scale kernel approximations in the context
of independence testing, contrasting block-based, Nyström and random Fourier
feature approaches. Through a variety of synthetic data experiments, it is
demonstrated that our novel large-scale methods give comparable performance
with existing methods whilst using significantly less computation time and
memory.
Comment: 29 pages, 6 figures
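Of the approximations contrasted above, random Fourier features are the easiest to show in a few lines. The sketch below covers only the feature map that approximates a Gaussian kernel by an inner product of finite-dimensional features; it is not the independence test itself, and the helper names, seed, and feature count are illustrative assumptions.

```python
import math
import random


def make_rff_map(dim, n_features, bandwidth, rng):
    """Build a random feature map z with z(x) . z(y) ~ exp(-||x-y||^2 / (2 bw^2))."""
    # Frequencies are drawn from the Gaussian kernel's spectral density.
    weights = [
        [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
        for _ in range(n_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def approx_kernel(feature_map, x, y):
    """Inner product of the random features of x and y."""
    return sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
```

Once every observation is mapped to its feature vector, kernel-based statistics can be computed from feature means and covariances, at a cost linear rather than quadratic in the number of observations.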
Implicit Langevin Algorithms for Sampling From Log-concave Densities
For sampling from a log-concave density, we study implicit integrators
resulting from $\theta$-method discretization of the overdamped Langevin
diffusion stochastic differential equation. Theoretical and algorithmic
properties of the resulting sampling methods for $\theta \in [0,1]$ and a
range of step sizes are established. Our results generalize and extend prior
works in several directions. In particular, for $\theta \ge 1/2$, we prove
geometric ergodicity and stability of the resulting methods for all step sizes.
We show that obtaining subsequent samples amounts to solving a strongly-convex
optimization problem, which is readily achievable using one of numerous
existing methods. Numerical examples supporting our theoretical analysis are
also presented.
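To make the implicit step concrete, here is a one-dimensional sketch under stated assumptions. It applies the standard theta-method to the overdamped Langevin equation dX = -f'(X) dt + sqrt(2) dW, giving x_new = x - h * (theta * f'(x_new) + (1 - theta) * f'(x)) + sqrt(2h) * noise. The abstract does not spell out the scheme, so this form, the Newton solver, and all names are assumptions rather than the authors' implementation.

```python
import math
import random


def theta_step(x, grad, hess, h, theta, noise, tol=1e-12, max_iter=50):
    """One theta-method step in 1D, solving the implicit equation by Newton.

    The new point is the root of g(z) = z - c + h * theta * grad(z), i.e. the
    minimiser of 0.5 * (z - c)^2 + h * theta * f(z), which is strongly convex
    whenever f is convex.
    """
    c = x - h * (1.0 - theta) * grad(x) + math.sqrt(2.0 * h) * noise
    z = c
    for _ in range(max_iter):
        g = z - c + h * theta * grad(z)
        if abs(g) < tol:
            break
        z -= g / (1.0 + h * theta * hess(z))
    return z


def sample_chain(x0, grad, hess, h, theta, n_steps, rng):
    """Run the chain with standard normal noise at every step."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(theta_step(xs[-1], grad, hess, h, theta, rng.gauss(0.0, 1.0)))
    return xs


if __name__ == "__main__":
    # Standard Gaussian target: f(x) = x^2 / 2, so f'(x) = x and f''(x) = 1.
    chain = sample_chain(5.0, lambda x: x, lambda x: 1.0, 0.5, 0.5,
                         1000, random.Random(0))
    print(sum(chain[100:]) / len(chain[100:]))
```

Setting theta = 0 recovers the explicit Euler-Maruyama step, while theta = 1 is fully implicit; in higher dimensions the inner solve would be handed to a general convex optimizer instead of scalar Newton.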
Sketching for Large-Scale Learning of Mixture Models
Learning parameters from voluminous data can be prohibitive in terms of
memory and computational requirements. We propose a "compressive learning"
framework where we estimate model parameters from a sketch of the training
data. This sketch is a collection of generalized moments of the underlying
probability distribution of the data. It can be computed in a single pass on
the training set, and is easily computable on streams or distributed datasets.
The proposed framework shares similarities with compressive sensing, which aims
at drastically reducing the dimension of high-dimensional signals while
preserving the ability to reconstruct them. To perform the estimation task, we
derive an iterative algorithm analogous to sparse reconstruction algorithms in
the context of linear inverse problems. We exemplify our framework with the
compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics
on the choice of the sketching procedure and theoretical guarantees of
reconstruction. We experimentally show on synthetic data that the proposed
algorithm yields results comparable to the classical Expectation-Maximization
(EM) technique while requiring significantly less memory and fewer computations
when the number of database elements is large. We further demonstrate the
potential of the approach on real large-scale data (over $10^8$ training samples)
for the task of model-based speaker verification. Finally, we draw some
connections between the proposed framework and approximate Hilbert space
embedding of probability distributions using random features. We show that the
proposed sketching operator can be seen as an innovative method to design
translation-invariant kernels adapted to the analysis of GMMs. We also use this
theoretical framework to derive information preservation guarantees, in the
spirit of infinite-dimensional compressive sensing.
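The single-pass, stream-friendly nature of such a sketch can be illustrated as follows. This example assumes the generalized moments are empirical averages of exp(i * w . x) at a fixed set of frequencies, which matches the random-features connection mentioned above but is not specified in the abstract. The class name and interface are hypothetical, and recovering GMM parameters from the sketch is not shown.

```python
import cmath


class MomentSketch:
    """Running average of exp(i * w . x) over a data stream, one entry per frequency."""

    def __init__(self, frequencies):
        self.frequencies = [tuple(w) for w in frequencies]
        self.sums = [0j] * len(self.frequencies)
        self.count = 0

    def update(self, x):
        """Fold one data point into the sketch; memory stays O(number of frequencies)."""
        for j, w in enumerate(self.frequencies):
            phase = sum(w_i * x_i for w_i, x_i in zip(w, x))
            self.sums[j] += cmath.exp(1j * phase)
        self.count += 1

    def merge(self, other):
        """Combine with a sketch built on another shard using the same frequencies."""
        if self.frequencies != other.frequencies:
            raise ValueError("sketches must share the same frequencies")
        self.sums = [a + b for a, b in zip(self.sums, other.sums)]
        self.count += other.count

    def value(self):
        """Return the averaged moments."""
        if self.count == 0:
            raise ValueError("sketch is empty")
        return [s / self.count for s in self.sums]
```

Because the sketch is a plain sum, shards of a distributed dataset can be sketched independently and merged afterwards, which gives the same result as a single pass over all the data.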
Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison
Two-sample testing decides whether two datasets are generated from the same
distribution. This paper studies variable selection for two-sample testing, the
task being to identify the variables (or dimensions) responsible for the
discrepancies between the two distributions. This task is relevant to many
problems of pattern analysis and machine learning, such as dataset shift
adaptation, causal inference and model validation. Our approach builds on a
two-sample test that uses the Maximum Mean Discrepancy (MMD). We optimise the
Automatic Relevance Detection (ARD) weights defined for individual variables to
maximise the power of the MMD-based test. For this optimisation, we introduce
sparse regularisation and propose two methods for dealing with the issue of
selecting an appropriate regularisation parameter. One method determines the
regularisation parameter in a data-driven way, and the other aggregates the
results of different regularisation parameters. We confirm the validity of the
proposed methods by systematic comparisons with baseline methods, and
demonstrate their usefulness in exploratory analysis of high-dimensional
traffic simulation data. Preliminary theoretical analyses are also provided,
including a rigorous definition of variable selection for two-sample testing.