Maximum mean discrepancy (MMD) refers to a general class of nonparametric
two-sample tests that are based on maximizing the mean difference over samples
from one distribution P versus another Q, over all choices of data
transformations f living in some function space F. Inspired by
recent work that connects what are known as functions of Radon bounded variation (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study
the MMD defined by taking F to be the unit ball in the RBV space of
a given smoothness order k≥0. This test, which we refer to as the
Radon-Kolmogorov-Smirnov (RKS) test, can be viewed as a
generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to
multiple dimensions and higher orders of smoothness. It is also intimately
connected to neural networks: we prove that the witness in the RKS test -- the
function f achieving the maximum mean difference -- is always a ridge spline
of degree k, i.e., a single neuron in a neural network. This allows us to
leverage the power of modern deep learning toolkits to (approximately) optimize
the criterion that underlies the RKS test. We prove that the RKS test has
asymptotically full power at distinguishing any distinct pair P ≠ Q of
distributions, derive its asymptotic null distribution, and carry out extensive
experiments to elucidate the strengths and weaknesses of the RKS test versus
the more traditional kernel MMD test.
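For concreteness, the criterion described above can be written in its standard
population form (a textbook formulation of MMD, not reproduced verbatim from
the paper):

$$ \mathrm{MMD}(P, Q; \mathcal{F}) \;=\; \sup_{f \in \mathcal{F}} \; \mathbb{E}_{X \sim P}\big[f(X)\big] \;-\; \mathbb{E}_{Y \sim Q}\big[f(Y)\big], $$

where the RKS test takes F to be the unit ball of the RBV space of smoothness
order k, and in practice the expectations are replaced by sample means over the
two observed samples.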
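The ridge-spline characterization of the witness suggests a direct way to
approximate the statistic with a deep learning toolkit: parameterize a single
neuron f(x) = max(w·x - b, 0)^k with a unit-norm direction w, and run gradient
ascent on the empirical mean difference. The sketch below is illustrative only:
the function name rks_statistic, the use of Adam, the step count, and the plain
projected-ascent loop are all assumptions, not the paper's actual optimization
procedure, and simple ascent of this kind may only reach a local optimum.

```python
import torch

def rks_statistic(X, Y, k=1, steps=500, lr=0.1):
    # Approximate the RKS statistic with a single ridge neuron
    # f(x) = max(w.x - b, 0)^k, with w constrained to the unit sphere.
    # Assumes k >= 1 so the truncated power is differentiable.
    d = X.shape[1]
    w = torch.randn(d, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        u = w / w.norm()                      # unit-norm direction
        gap = (torch.relu(X @ u - b).pow(k).mean()
               - torch.relu(Y @ u - b).pow(k).mean())
        (-gap.abs()).backward()               # ascend on |mean difference|
        opt.step()
    with torch.no_grad():
        u = w / w.norm()
        gap = (torch.relu(X @ u - b).pow(k).mean()
               - torch.relu(Y @ u - b).pow(k).mean())
        return gap.abs().item()

# Example: two Gaussians whose means differ, so the statistic should be large.
X = torch.randn(200, 3)           # samples from P
Y = torch.randn(200, 3) + 0.5     # samples from Q, shifted mean
print(rks_statistic(X, Y, k=1))
```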