53,823 research outputs found
Distance covariance in metric spaces
We extend the theory of distance (Brownian) covariance from Euclidean spaces,
where it was introduced by Sz\'{e}kely, Rizzo and Bakirov, to general metric
spaces. We show that for testing independence, it is necessary and sufficient
that the metric space be of strong negative type. In particular, we show that
this holds for separable Hilbert spaces, which answers a question of Kosorok.
Instead of the manipulations of Fourier transforms used in the original work,
we use elementary inequalities for metric spaces and embeddings in Hilbert
spaces.Comment: Published in at http://dx.doi.org/10.1214/12-AOP803 the Annals of
Probability (http://www.imstat.org/aop/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Fast Two-Sample Testing with Analytic Representations of Probability Measures
We propose a class of nonparametric two-sample tests with a cost linear in
the sample size. Two tests are given, both based on an ensemble of distances
between analytic functions representing each of the distributions. The first
test uses smoothed empirical characteristic functions to represent the
distributions, the second uses distribution embeddings in a reproducing kernel
Hilbert space. Analyticity implies that differences in the distributions may be
detected almost surely at a finite number of randomly chosen
locations/frequencies. The new tests are consistent against a larger class of
alternatives than the previous linear-time tests based on the (non-smoothed)
empirical characteristic functions, while being much faster than the current
state-of-the-art quadratic-time kernel-based or energy distance-based tests.
Experiments on artificial benchmarks and on challenging real-world testing
problems demonstrate that our tests give a better power/time tradeoff than
competing approaches, and in some cases, better outright power than even the
most expensive quadratic-time tests. This performance advantage is retained
even in high dimensions, and in cases where the difference in distributions is
not observable with low order statistics
Statistical Methods in Topological Data Analysis for Complex, High-Dimensional Data
The utilization of statistical methods an their applications within the new
field of study known as Topological Data Analysis has has tremendous potential
for broadening our exploration and understanding of complex, high-dimensional
data spaces. This paper provides an introductory overview of the mathematical
underpinnings of Topological Data Analysis, the workflow to convert samples of
data to topological summary statistics, and some of the statistical methods
developed for performing inference on these topological summary statistics. The
intention of this non-technical overview is to motivate statisticians who are
interested in learning more about the subject.Comment: 15 pages, 7 Figures, 27th Annual Conference on Applied Statistics in
Agricultur
Metric Semantics and Full Abstractness for Action Refinement and Probabilistic Choice
This paper provides a case-study in the field of metric semantics for probabilistic programming. Both an operational and a denotational semantics are presented for an abstract process language L_pr, which features action refinement and probabilistic choice. The two models are constructed in the setting of complete ultrametric spaces, here based on probability measures of compact support over sequences of actions. It is shown that the standard toolkit for metric semantics works well in the probabilistic context of L_pr, e.g. in establishing the correctness of the denotational semantics with respect to the operational one. In addition, it is shown how the method of proving full abstraction --as proposed recently by the authors for a nondeterministic language with action refinement-- can be adapted to deal with the probabilistic language L_pr as well
- ā¦