Convergence of latent mixing measures in finite and infinite mixture models
This paper studies convergence behavior of latent mixing measures that arise
in finite and infinite mixture models, using transportation distances (i.e.,
Wasserstein metrics). The relationship between Wasserstein distances on the
space of mixing measures and f-divergence functionals such as Hellinger and
Kullback-Leibler distances on the space of mixture distributions is
investigated in detail using various identifiability conditions. Convergence in
Wasserstein metrics for discrete measures implies convergence of individual
atoms that provide support for the measures, thereby providing a natural
interpretation of convergence of clusters in clustering applications where
mixture models are typically employed. Convergence rates of posterior
distributions for latent mixing measures are established, for both finite
mixtures of multivariate distributions and infinite mixtures based on the
Dirichlet process. Comment: Published at http://dx.doi.org/10.1214/12-AOS1065 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
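As a small illustration of the central quantity in this abstract (not code from the paper), the following is a minimal Python sketch, assuming SciPy is available, that computes the first-order Wasserstein distance between two discrete mixing measures with scalar atoms; small values of this distance force the estimated atoms and their weights to be close to the true ones, which is the sense in which clusters converge.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two discrete mixing measures G = sum_k p_k * delta_{theta_k}, here with
# scalar atoms (e.g. component means of a univariate Gaussian mixture).
atoms_g = np.array([-1.0, 0.0, 2.0])     # atoms of the "true" mixing measure
weights_g = np.array([0.3, 0.5, 0.2])    # mixing proportions

atoms_h = np.array([-0.9, 0.1, 2.2])     # atoms of an estimated mixing measure
weights_h = np.array([0.35, 0.45, 0.2])

# W1 between the two discrete measures: convergence of this distance implies
# convergence of the individual supporting atoms.
w1 = wasserstein_distance(atoms_g, atoms_h, weights_g, weights_h)
print(f"W1 between mixing measures: {w1:.4f}")
```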
Optimality in weighted L₂-Wasserstein goodness-of-fit statistics
In Del Barrio, Cuesta-Albertos, Matran and Rodriguez-Rodriguez (1999) and Del Barrio, Cuesta-Albertos and Matran (2000), the authors introduced a new class of goodness-of-fit statistics based on the L₂-Wasserstein distance. It was shown that the desirable property of loss of degrees-of-freedom holds only under normality. Furthermore, these statistics have some limitations in their applicability to heavier-tailed distributions. To overcome these problems, the use of weight functions in the statistics was proposed and investigated by De Wet (2000), De Wet (2002) and Csörgő (2002). In De Wet (2000, 2002) the issue of loss of degrees-of-freedom was considered, and in Csörgő (2002) the application to heavier-tailed distributions. In De Wet (2000) and De Wet (2002) it was shown how the weight functions could be chosen in order to retain the loss of degrees-of-freedom property separately for location and scale families. The weight functions that give this property are the ones that give asymptotically optimal estimators for the location and scale parameters, respectively; hence estimation optimality. In this paper we show that in the location case, this choice of "estimation optimal" weight function also gives "testing optimality", where the latter is measured in terms of approximate Bahadur efficiencies.
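To make the kind of statistic under discussion concrete, here is a rough Python sketch (my own simplified illustration, not the authors' construction) of a weighted L₂-Wasserstein goodness-of-fit statistic against the standard normal: the integral of the squared gap between the empirical and normal quantile functions, weighted by a user-supplied w(t), is approximated by a sum over order statistics. It ignores the estimation of location and scale parameters that the actual location/scale-family statistics require.

```python
import numpy as np
from scipy.stats import norm

def weighted_l2_wasserstein_stat(x, weight=lambda t: 1.0):
    """Rough approximation of int_0^1 (F_n^{-1}(t) - Phi^{-1}(t))^2 w(t) dt
    by a Riemann sum over the order statistics."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    t = (np.arange(1, n + 1) - 0.5) / n        # interior grid points in (0, 1)
    gap = x - norm.ppf(t)                      # empirical minus normal quantiles
    w = np.vectorize(weight)(t)
    return np.sum(gap**2 * w) / n

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
print(weighted_l2_wasserstein_stat(sample))                                 # unweighted
print(weighted_l2_wasserstein_stat(sample, weight=lambda t: t * (1 - t)))   # down-weight the tails
```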
Modeling Persistent Trends in Distributions
We present a nonparametric framework to model a short sequence of probability
distributions that vary both due to underlying effects of sequential
progression and confounding noise. To distinguish between these two types of
variation and estimate the sequential-progression effects, our approach
leverages an assumption that these effects follow a persistent trend. This work
is motivated by the recent rise of single-cell RNA-sequencing experiments over
a brief time course, which aim to identify genes relevant to the progression of
a particular biological process across diverse cell populations. While
classical statistical tools focus on scalar-response regression or
order-agnostic differences between distributions, it is desirable in this
setting to consider both the full distributions as well as the structure
imposed by their ordering. We introduce a new regression model for ordinal
covariates where responses are univariate distributions and the underlying
relationship reflects consistent changes in the distributions over increasing
levels of the covariate. This concept is formalized as a "trend" in
distributions, which we define as an evolution that is linear under the
Wasserstein metric. Implemented via a fast alternating projections algorithm,
our method exhibits numerous strengths in simulations and analyses of
single-cell gene expression data. Comment: To appear in the Journal of the American Statistical Association.
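For univariate distributions, an evolution that is linear under the Wasserstein metric corresponds to quantile functions that change linearly with the covariate level. The following Python sketch is a simplified illustration of that idea, not the paper's alternating-projections algorithm: it fits a Wasserstein-linear trend by per-quantile least squares and does not enforce monotonicity of the fitted quantile functions, which the paper's projections handle.

```python
import numpy as np

def fit_wasserstein_linear_trend(samples, levels, grid_size=100):
    """Fit a trend that is linear in the Wasserstein sense for 1-D samples.

    samples: list of 1-D arrays, one per ordinal covariate level
    levels:  array of covariate levels (e.g. time points), same length
    Returns quantile grid t and curves a(t), b(t) so that the fitted
    quantile function at level x is a(t) + b(t) * x.
    """
    t = (np.arange(grid_size) + 0.5) / grid_size
    # Empirical quantile functions evaluated on a common grid.
    Q = np.stack([np.quantile(s, t) for s in samples])          # (n_levels, grid)
    X = np.column_stack([np.ones_like(levels, dtype=float), levels])
    # Ordinary least squares separately for each quantile level t.
    coef, *_ = np.linalg.lstsq(X, Q, rcond=None)                # (2, grid)
    return t, coef[0], coef[1]

rng = np.random.default_rng(1)
levels = np.array([0.0, 1.0, 2.0, 3.0])
samples = [rng.normal(loc=0.5 * x, scale=1.0, size=300) for x in levels]
t, a, b = fit_wasserstein_linear_trend(samples, levels)
print(b.mean())   # average slope across quantiles, roughly 0.5 here
```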
Learning Generative Models with Sinkhorn Divergences
The ability to compare two degenerate probability distributions (i.e. two
probability distributions supported on two distinct low-dimensional manifolds
living in a much higher-dimensional space) is a crucial problem arising in the
estimation of generative models for high-dimensional observations such as those
arising in computer vision or natural language. It is known that optimal
transport metrics can represent a cure for this problem, since they were
specifically designed as an alternative to information divergences to handle
such problematic scenarios. Unfortunately, training generative machines using
OT raises formidable computational and statistical challenges, because of (i)
the computational burden of evaluating OT losses, (ii) the instability and lack
of smoothness of these losses, (iii) the difficulty of robustly estimating these
losses and their gradients in high dimension. This paper presents the first
tractable computational method to train large scale generative models using an
optimal transport loss, and tackles these three issues by relying on two key
ideas: (a) entropic smoothing, which turns the original OT loss into one that
can be computed using Sinkhorn fixed point iterations; (b) algorithmic
(automatic) differentiation of these iterations. These two approximations
result in a robust and differentiable approximation of the OT loss with
streamlined GPU execution. Entropic smoothing generates a family of losses
interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus allowing one to find a sweet spot that leverages the geometry of OT and the favorable high-dimensional sample complexity of MMD, which comes with unbiased gradient estimates. The resulting computational architecture nicely complements standard deep generative network models with a stack of extra layers implementing the loss function.
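To make idea (a) concrete, here is a minimal NumPy sketch of the Sinkhorn fixed-point iterations that produce an entropy-regularized OT cost between two weighted point clouds. In the setting of the paper these iterations would be unrolled inside an automatic-differentiation framework so the loss can be backpropagated through (idea (b)), and the full Sinkhorn divergence additionally subtracts debiasing self-terms; both are omitted in this plain-NumPy illustration.

```python
import numpy as np

def sinkhorn_loss(x, y, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost <P, C> between empirical measures
    (a, x) and (b, y), computed by Sinkhorn fixed-point iterations."""
    # Squared-Euclidean ground cost between the two point clouds.
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):              # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # entropic transport plan
    return np.sum(P * C)

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = rng.normal(loc=1.0, size=(60, 2))
a = np.full(50, 1 / 50)
b = np.full(60, 1 / 60)
print(sinkhorn_loss(x, y, a, b))
```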
Two-sample Test using Projected Wasserstein Distance: Breaking the Curse of Dimensionality
We develop a projected Wasserstein distance for the two-sample test, a
fundamental problem in statistics and machine learning: given two sets of
samples, to determine whether they are from the same distribution. In
particular, we aim to circumvent the curse of dimensionality in Wasserstein distance: when the dimension is high, it has diminishing testing power, which is inherently due to the slow concentration of Wasserstein metrics in high-dimensional spaces. A key contribution is to couple the test with an optimal projection that finds the low-dimensional linear mapping maximizing the Wasserstein distance between the projected probability distributions. We characterize the theoretical
property of the finite-sample convergence rate on IPMs and present practical
algorithms for computing this metric. Numerical examples validate our
theoretical results. Comment: 10 pages, 3 figures. Accepted in ISIT-2
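As a rough illustration of the idea, not the paper's optimization procedure, the following Python sketch searches over random unit directions for the linear projection that maximizes the one-dimensional Wasserstein distance between the projected samples; the resulting maximum can serve as a two-sample test statistic, calibrated for example by permutation.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def projected_wasserstein(X, Y, n_directions=500, seed=0):
    """Maximize the 1-D Wasserstein distance between projected samples over
    randomly drawn unit directions (a crude stand-in for the optimal-projection step)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best, best_dir = -np.inf, None
    for _ in range(n_directions):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        dist = wasserstein_distance(X @ theta, Y @ theta)
        if dist > best:
            best, best_dir = dist, theta
    return best, best_dir

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
Y = rng.normal(size=(300, 20))
Y[:, 0] += 1.0                      # the two samples differ along one coordinate
stat, direction = projected_wasserstein(X, Y)
print(stat, np.argmax(np.abs(direction)))
```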
Statistical Recovery of Discrete, Geometric and Invariant Structures
The main objective of the workshop was to bring together researchers in mathematical statistics and related areas in order to discuss recent advances and problems associated with statistical recovery of geometric and invariant structures. Topics include adaptive estimation, confidence sets and testing techniques, as well as statistical algorithms for geometrical structure recovery and data analysis.