Search CORE


Distance Dependent Chinese Restaurant Processes

Authors: Blei David M.; Frazier Peter I.
Publication date: 01/01/2010

We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance on three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show that its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
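To make the prior concrete, here is a minimal Python sketch of drawing a partition from a distance dependent CRP, assuming the customer-assignment view: customer $i$ links to another customer $j$ with probability proportional to a decay function $f(d_{ij})$ and to itself with probability proportional to $\alpha$, and clusters are the connected components of the resulting link graph. The function names, the exponential decay with length scale 2.0, and the toy timestamps are illustrative assumptions, not code from the paper.

```python
import math
import random


def sequential_distances(times):
    """d_ij = t_i - t_j for earlier customers j, infinity otherwise (sequential ddCRP)."""
    return [[ti - tj if tj < ti else math.inf for tj in times] for ti in times]


def sample_links(distances, alpha, decay, rng):
    """Customer assignments: c_i = j (j != i) w.p. prop. to decay(d_ij), c_i = i w.p. prop. to alpha."""
    n = len(distances)
    links = []
    for i in range(n):
        weights = [alpha if j == i else decay(distances[i][j]) for j in range(n)]
        links.append(rng.choices(range(n), weights=weights)[0])
    return links


def links_to_clusters(links):
    """Cluster labels are the connected components of the undirected link graph."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    labels = {}
    # Relabel roots 0, 1, 2, ... in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(len(links))]


# Illustrative use: hypothetical exponential decay f(d) = exp(-d / 2).
rng = random.Random(0)
dist = sequential_distances([0.0, 1.0, 2.0, 10.0, 11.0])
links = sample_links(dist, alpha=1.0, decay=lambda d: math.exp(-d / 2.0), rng=rng)
print(links, links_to_clusters(links))
```

Because infinite distances get zero decay weight, later customers are never linked to in the sequential variant. With a constant decay $f(d)=1$ over earlier customers, each customer joins an existing cluster with probability proportional to its size, which is the familiar CRP seating rule.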

arXiv.org e-Print Archive

CiteSeerX

Bayes and maximum likelihood for $L^1$ -Wasserstein deconvolution of Laplace mixtures

Author: Scricciolo Catia
Publication date: 17/08/2017

We consider the problem of recovering a distribution function on the real line from observations additively contaminated with errors following the standard Laplace distribution. Assuming that the latent distribution is completely unknown leads to a nonparametric deconvolution problem. We begin by studying the rates of convergence relative to the $L^2$-norm and the Hellinger metric for the direct problem of estimating the sampling density, which is a mixture of Laplace densities with a possibly unbounded set of locations: the rate of convergence for the Bayes' density estimator corresponding to a Dirichlet process prior over the space of all mixing distributions on the real line matches, up to a logarithmic factor, the $n^{-3/8}\log^{1/8}n$ rate for the maximum likelihood estimator. Then, appealing to an inversion inequality translating the $L^2$-norm and the Hellinger distance between general kernel mixtures, with a kernel density having polynomially decaying Fourier transform, into any $L^p$-Wasserstein distance, $p\geq 1$, between the corresponding mixing distributions, provided their Laplace transforms are finite in some neighborhood of zero, we derive the rates of convergence in the $L^1$-Wasserstein metric for the Bayes' and maximum likelihood estimators of the mixing distribution. Merging in the $L^1$-Wasserstein distance between Bayes and maximum likelihood follows as a by-product, along with an assessment of the stochastic order of the discrepancy between the two estimation procedures.
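The two objects in this abstract can be illustrated with a short Python sketch, assuming discrete mixing distributions with finitely many atoms: the sampling density $p(x)=\sum_k w_k \tfrac{1}{2}e^{-|x-a_k|}$ of a Laplace location mixture, and the $L^1$-Wasserstein distance on the real line, computed as $\int |F(x)-G(x)|\,dx$. The function names are hypothetical, and this is not the Bayes or maximum likelihood estimator studied in the paper.

```python
import math


def laplace_mixture_density(x, atoms, weights):
    """Density of Y = X + e, with X ~ sum_k weights[k] * delta(atoms[k]) and e standard Laplace."""
    return sum(w * 0.5 * math.exp(-abs(x - a)) for a, w in zip(atoms, weights))


def wasserstein1(atoms_p, weights_p, atoms_q, weights_q):
    """L1-Wasserstein distance between two discrete distributions on the real line,
    computed as the integral of |F_p(x) - F_q(x)| over x."""

    def cdf(atoms, weights, x):
        return sum(w for a, w in zip(atoms, weights) if a <= x)

    points = sorted(set(atoms_p) | set(atoms_q))
    total = 0.0
    # Both CDFs are constant on [left, right), so the integral reduces to a finite sum.
    for left, right in zip(points, points[1:]):
        gap = abs(cdf(atoms_p, weights_p, left) - cdf(atoms_q, weights_q, left))
        total += gap * (right - left)
    return total


# Moving half of the mass from 1.0 to 0.0 costs 0.5 * 1.0.
print(wasserstein1([0.0, 1.0], [0.5, 0.5], [0.0], [1.0]))
print(laplace_mixture_density(0.0, [0.0, 1.0], [0.5, 0.5]))
```

Outside the range of the atoms both CDFs agree (both 0 or both 1), so only the gaps between consecutive support points contribute.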

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca

Compact convex sets of the plane and probability theory

Authors: Marckert Jean-François; Renault David
Publication date: 01/01/2014

The Gauss-Minkowski correspondence in $\mathbb{R}^2$ states the existence of a homeomorphism between the probability measures $\mu$ on $[0,2\pi]$ such that $\int_0^{2\pi} e^{ix}\,d\mu(x)=0$ and the compact convex sets (CCS) of the plane with perimeter 1. In this article, we bring out explicit formulas relating the border of a CCS to its probability measure. As a consequence, we show that some natural operations on CCS (for example, the Minkowski sum) have natural translations in terms of probability measure operations, and reciprocally, the convolution of measures translates into a new notion of convolution of CCS. Additionally, we give a proof that a polygonal curve associated with a sample of $n$ random variables (satisfying $\int_0^{2\pi} e^{ix}\,d\mu(x)=0$) converges to a CCS associated with $\mu$ at speed $\sqrt{n}$, a result very similar to the convergence of the empirical process in statistics. Finally, we employ this correspondence to present models of smooth random CCS and simulations.
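For a discrete measure the correspondence can be seen directly: a sketch, assuming $\mu=\sum_k w_k\,\delta_{\theta_k}$ with total mass 1, traces the border by the partial sums of $e^{ix}\,d\mu(x)$ taken in increasing angle order, so each atom becomes an edge of length $w_k$ and direction $\theta_k$. The centring condition is exactly what makes the polygon close, and the total mass gives perimeter 1. The function names are hypothetical, and the paper's construction for random samples (which need not be exactly centred) is not reproduced here.

```python
import cmath
import math


def polygon_from_measure(angles, weights, tol=1e-9):
    """Vertices of the convex polygon attached to mu = sum_k weights[k] * delta(angles[k])."""
    # Centring condition: the integral of e^{ix} d mu(x) must vanish.
    resultant = sum(w * cmath.exp(1j * a) for a, w in zip(angles, weights))
    if abs(resultant) > tol:
        raise ValueError("measure is not centred; the border would not close")
    vertices = [0j]
    # Walk the border: partial sums of e^{ix} d mu(x), angles in increasing order.
    for a, w in sorted(zip(angles, weights)):
        vertices.append(vertices[-1] + w * cmath.exp(1j * a))
    return vertices[:-1]  # the final partial sum returns to the starting point


def perimeter(vertices):
    n = len(vertices)
    return sum(abs(vertices[(k + 1) % n] - vertices[k]) for k in range(n))


# Uniform mass on six equally spaced angles gives a regular hexagon of perimeter 1.
hexagon = polygon_from_measure([2 * math.pi * k / 6 for k in range(6)], [1 / 6] * 6)
print(len(hexagon), perimeter(hexagon))
```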

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Numérisation de Documents Anciens Mathématiques

Basic statistics for probabilistic symbolic variables: a novel metric-based approach

Authors: Irpino Antonio; Verde Rosanna
Publication venue: Springer Science and Business Media LLC
Publication date: 10/12/2013

In data mining, it is usual to describe a set of individuals using summaries (means, standard deviations, histograms, confidence intervals) that generalize individual descriptions into a typology description. In this case, data can be described by several values. In this paper, we propose an approach for computing basic statistics for such data, in particular for data described by numerical multi-valued variables (intervals, histograms, discrete multi-valued descriptions). We propose to treat all numerical multi-valued variables as distributional data, i.e. as individuals described by distributions. To obtain new basic statistics for measuring the variability of, and the association between, such variables, we extend the classic measure of inertia, calculated with the Euclidean distance, using the squared Wasserstein distance defined between probability measures. This distance is a generalization of the Wasserstein distance, that is, a distance between the quantile functions of two distributions. Some properties of such a distance are shown; among them, we prove the Huygens theorem of decomposition of the inertia. We show the use of the Wasserstein distance and of the basic statistics by presenting a k-means-like clustering algorithm for clustering a real data set described by modal numerical variables (distributional variables).
Keywords: Wasserstein distance, inertia, dependence, distributional data, modal variables.
Comment: 19 pages, 3 figures
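The inertia decomposition can be checked numerically with a small Python sketch, assuming each distribution is an equal-size empirical sample rather than the histograms or intervals used in the paper. Under that assumption the sorted sample is the quantile function (piecewise constant on $[k/m,(k+1)/m)$), the squared $L^2$-Wasserstein distance is the mean squared gap between sorted values, and the "mean distribution" is the average quantile function. The function names and data are hypothetical.

```python
def quantile_function(sample):
    """Empirical quantile function, piecewise constant on [k/m, (k+1)/m)."""
    return sorted(sample)


def w2_squared(qx, qy):
    """Squared L2-Wasserstein distance: mean squared gap between quantile functions."""
    return sum((a - b) ** 2 for a, b in zip(qx, qy)) / len(qx)


def barycenter(qs):
    """Average quantile function (still non-decreasing), used as the mean distribution."""
    return [sum(col) / len(qs) for col in zip(*qs)]


def inertia(qs, centre):
    """Sum of squared Wasserstein distances from each distribution to `centre`."""
    return sum(w2_squared(q, centre) for q in qs)


data = [[1, 2, 3, 4], [2, 2, 5, 7], [0, 1, 1, 6]]
qs = [quantile_function(s) for s in data]
g = barycenter(qs)
c = [1.0, 1.0, 2.0, 2.0]  # any other quantile function
# Huygens: inertia about c = inertia about the barycenter + n * d^2(barycenter, c).
print(inertia(qs, c), inertia(qs, g) + len(qs) * w2_squared(g, c))
```

The two printed values agree because, in this representation, distributions behave like vectors of quantiles and the barycenter is their arithmetic mean.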

arXiv.org e-Print Archive

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"