
    A Kernel Test of Goodness of Fit

    We propose a nonparametric statistical test for goodness-of-fit: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness-of-fit is a divergence constructed via Stein's method using functions from a Reproducing Kernel Hilbert Space. Our test statistic is based on an empirical estimate of this divergence, taking the form of a V-statistic in terms of the log gradients of the target density and the kernel. We derive a statistical test, both for i.i.d. and non-i.i.d. samples, where we estimate the null distribution quantiles using a wild bootstrap procedure. We apply our test to quantifying convergence of approximate Markov Chain Monte Carlo methods, statistical model criticism, and evaluating quality of fit vs. model complexity in nonparametric density estimation.
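    A minimal sketch of the V-statistic described above, assuming a Gaussian kernel with a fixed bandwidth and a known score function (the log-density gradient of the target). The abstract's wild bootstrap and bandwidth selection are omitted; this only illustrates how the statistic is assembled from the kernel and the log gradients.

    ```python
    import numpy as np

    def ksd_v_statistic(X, score, sigma=1.0):
        """V-statistic estimate of a kernel Stein discrepancy.

        X     : (n, d) array of samples
        score : function mapping X to grad log p(x), shape (n, d)
        sigma : Gaussian kernel bandwidth (a fixed illustrative choice;
                in practice a heuristic such as the median distance is used)
        """
        n, d = X.shape
        S = score(X)                          # log-density gradients at the samples
        diff = X[:, None, :] - X[None, :, :]  # (n, n, d) pairwise differences
        sq = (diff ** 2).sum(-1)              # squared pairwise distances
        K = np.exp(-sq / (2 * sigma ** 2))    # Gaussian kernel matrix

        # Stein kernel h_p(x, y), assembled term by term:
        term1 = K * (S @ S.T)                                       # s(x)^T s(y) k(x,y)
        term2 = np.einsum('il,ijl->ij', S, diff) / sigma ** 2 * K   # s(x)^T grad_y k
        term3 = -np.einsum('jl,ijl->ij', S, diff) / sigma ** 2 * K  # s(y)^T grad_x k
        term4 = (d / sigma ** 2 - sq / sigma ** 4) * K              # trace(grad_x grad_y k)

        # V-statistic: average over all n^2 pairs (including the diagonal)
        return (term1 + term2 + term3 + term4).mean()
    ```

    For a standard normal target the score is simply `-x`, so the statistic can be checked by comparing samples drawn from the target against shifted samples, where it should be noticeably larger.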

    Distinguishing distributions with interpretable features

    Two semimetrics on probability distributions are proposed, based on a difference between features chosen from each, where these features can be in either the spatial or Fourier domains. The features are chosen so as to maximize the distinguishability of the distributions, by optimizing a lower bound on the power of a statistical test using these features. The result is a parsimonious and interpretable indication of how and where two distributions differ, which can be used even in high dimensions, and when the difference is localized in the Fourier domain. A real-world benchmark image dataset demonstrates that the returned features provide a meaningful and informative indication as to how the distributions differ.
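    A hedged sketch of the spatial-domain variant of such a test statistic, assuming a Gaussian kernel evaluated at a small set of test locations (the interpretable features). Here the locations are supplied directly and the sample sizes are taken equal for simplicity; the paper instead chooses the locations by maximizing a lower bound on test power.

    ```python
    import numpy as np

    def me_statistic(X, Y, T, sigma=1.0):
        """Mean-embedding test statistic at test locations T.

        X, Y  : (n, d) samples from the two distributions (equal sizes
                assumed here purely to simplify the sketch)
        T     : (J, d) test locations, acting as the interpretable features
        sigma : Gaussian kernel bandwidth (fixed illustrative choice)
        """
        def feats(A):
            # kernel evaluations against each test location -> (n, J)
            sq = ((A[:, None, :] - T[None, :, :]) ** 2).sum(-1)
            return np.exp(-sq / (2 * sigma ** 2))

        n = X.shape[0]
        Z = feats(X) - feats(Y)        # per-sample feature differences
        w = Z.mean(0)                  # mean difference in feature space
        # regularized covariance of the feature differences
        Sigma = np.cov(Z, rowvar=False) + 1e-8 * np.eye(T.shape[0])
        # Hotelling-style statistic; large values indicate the
        # distributions differ at the chosen locations
        return n * w @ np.linalg.solve(Sigma, w)
    ```

    Locations where the statistic is large are exactly the "where" in the abstract: they point at regions of the input space in which the two distributions differ.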
