Nonparametric two sample testing deals with the question of consistently
deciding if two distributions are different, given samples from both, without
making any parametric assumptions about the form of the distributions. The
current literature is split into two kinds of tests - those which are
consistent without any assumptions about how the distributions may differ
(\textit{general} alternatives), and those which are designed to specifically
test easier alternatives, like a difference in means (\textit{mean-shift}
alternatives).
The main contribution of this paper is to explicitly characterize the power
of a popular nonparametric two sample test, designed for general alternatives,
under a mean-shift alternative in the high-dimensional setting. Specifically,
we explicitly derive the power of the linear-time Maximum Mean Discrepancy
statistic using the Gaussian kernel, where the dimension and sample size can
both tend to infinity at any rate, and the two distributions differ in their
means. As a corollary, we find that if the signal-to-noise ratio is held
constant, then the test's power goes to one if the number of samples increases
faster than the dimension increases. This is the first explicit power
derivation for a general nonparametric test in the high-dimensional setting,
and also the first analysis of how tests designed for general alternatives
perform when faced with easier ones.Comment: 25 pages, 5 figure