Asymptotic normality for the counting process of weak records and \delta-records in discrete models
Let $\{X_n, n \ge 1\}$ be a sequence of independent and identically distributed
random variables, taking non-negative integer values, and call $X_n$ a
$\delta$-record if $X_n > \max\{X_1, \dots, X_{n-1}\} + \delta$, where $\delta$ is an
integer constant. We use martingale arguments to show that the counting process
of $\delta$-records among the first $n$ observations, suitably centered and
scaled, is asymptotically normally distributed for $\delta \ne 0$. In particular,
taking $\delta = -1$ we obtain a central limit theorem for the number of weak
records.
Comment: Published at http://dx.doi.org/10.3150/07-BEJ6027 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
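As an illustration of the counting process described in this abstract, the following is a minimal sketch, assuming the definition $X_n > \max\{X_1, \dots, X_{n-1}\} + \delta$ and the convention (an assumption here) that the first observation always counts. With integer values, $\delta = -1$ corresponds to weak records, i.e. $X_n \ge \max\{X_1, \dots, X_{n-1}\}$. The function name `count_delta_records` is hypothetical and does not come from the paper.

```python
def count_delta_records(xs, delta):
    """Count the observations x with x > (maximum of all earlier values) + delta.

    Assumption: the first observation is counted, since there is no
    earlier maximum to compare it with.
    """
    count = 0
    running_max = None  # maximum of the observations seen so far
    for x in xs:
        if running_max is None or x > running_max + delta:
            count += 1
        if running_max is None or x > running_max:
            running_max = x
    return count


sample = [1, 3, 3, 2, 5]
print(count_delta_records(sample, 0))   # ordinary (strict) records
print(count_delta_records(sample, -1))  # weak records: ties with the maximum count
```

On this sample the tie at 3 counts as a weak record but not as an ordinary record, which is the only difference between the two counts.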
Geodesic PCA in the Wasserstein space
We introduce the method of Geodesic Principal Component Analysis (GPCA) on
the space of probability measures on the line, with finite second moment,
endowed with the Wasserstein metric. We discuss the advantages of this
approach over a standard functional PCA of probability densities in the
Hilbert space of square-integrable functions. We establish the consistency of
the method by showing that the empirical GPCA converges to its population
counterpart, as the sample size tends to infinity. A key property in the study
of GPCA is the isometry between the Wasserstein space and a closed convex
subset of the space of square-integrable functions, with respect to an
appropriate measure. Therefore, we consider the general problem of PCA in a
closed convex subset of a separable Hilbert space, which serves as a basis for
the analysis of GPCA and is also of interest in its own right. We provide
illustrative examples on simple statistical models to show the benefits of
this approach for data analysis. The method is also applied to a real dataset
of population pyramids.
Geometric PCA of Images
We describe a method for analyzing the principal modes of geometric variability of images. For this purpose, we propose a general framework based on the use of deformation operators for modeling the geometric variability of images around a reference mean pattern. In this setting, we describe a simple algorithm for estimating the geometric variability of a set of images. Some numerical experiments on real data are presented to highlight the benefits of this approach. The consistency of this procedure is also analyzed in statistical deformable models.
Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown
The availability of high-throughput parallel methods for sequencing microbial
communities is increasing our knowledge of the microbial world at an
unprecedented rate. Though most attention has focused on determining
lower bounds on the alpha-diversity, i.e. the total number of different species
present in the environment, tight bounds on this quantity may be highly
uncertain because a small fraction of the environment could be composed of a
vast number of different species. To better assess what remains unknown, we
propose instead to predict the fraction of the environment that belongs to
unsampled classes. Modeling samples as draws with replacement of colored balls
from an urn with an unknown composition, and under the sole assumption that
there are still undiscovered species, we show that conditionally unbiased
predictors and exact prediction intervals (of constant length in logarithmic
scale) are possible for the fraction of the environment that belongs to
unsampled classes. Our predictions are based on a Poissonization argument,
which we have implemented in what we call the Embedding algorithm. For fixed,
i.e. non-randomized, sample sizes, the algorithm leads to very accurate
predictions on a sub-sample of the original sample. We quantify the effect of
fixed sample sizes on our prediction intervals and test our methods and others
found in the literature against simulated environments, which we devise taking
into account datasets from human gut and hand microbiota. Our methodology
applies to any dataset that can be conceptualized as a sample with replacement
from an urn. In particular, it could be applied, for example, to quantify the
proportion of all the unseen solutions to a binding site problem in a random
RNA pool, or to reassess the surveillance of a certain terrorist group,
predicting the conditional probability that it deploys a new tactic in its next
attack.
Comment: 14 pages, 7 figures, 4 tables
Near-Record Values in Discrete Random Sequences
Given a sequence $(X_n)$ of random variables, $X_n$ is said to be a near-record if $X_n \in (M_{n-1} - a, M_{n-1}]$, where $M_n = \max\{X_1, \dots, X_n\}$ and $a > 0$ is a parameter. We investigate the point process $\eta$ on $[0,\infty)$ of near-record values from an integer-valued, independent and identically distributed sequence, showing that it is a Bernoulli cluster process. We derive the probability generating functional of $\eta$ and formulas for the expectation, variance and covariance of the counting variables $\eta(A)$, $A \subset [0,\infty)$. We also derive the strong convergence and asymptotic normality of $\eta([0,n])$, as $n \to \infty$, under mild regularity conditions on the distribution of the observations. For heavy-tailed distributions, with square-summable hazard rates, we prove that $\eta([0,n])$ grows to a finite random limit and compute its probability generating function. We present examples of the application of our results to particular distributions, covering a wide range of behaviours in terms of their right tails.
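The near-record definition in this abstract can be made concrete with a short sketch. It assumes only what the abstract states: an observation is a near-record when it falls in the interval from the previous maximum minus a (exclusive) up to the previous maximum (inclusive). The function names `near_record_values` and `count_up_to` are hypothetical, and `count_up_to(values, n)` stands in for the counting variable written $\eta([0,n])$.

```python
def near_record_values(xs, a):
    """Return the values x with (previous maximum - a) < x <= previous maximum."""
    values = []
    if not xs:
        return values
    running_max = xs[0]  # maximum of the observations seen so far
    for x in xs[1:]:
        if running_max - a < x <= running_max:
            values.append(x)
        running_max = max(running_max, x)
    return values


def count_up_to(values, n):
    """Number of near-record values lying in the interval [0, n]."""
    return sum(1 for v in values if 0 <= v <= n)


sample = [2, 5, 4, 5, 1, 7, 6]
near = near_record_values(sample, a=2)
print(near)
print(count_up_to(near, 5))
```

Note that a new record (a value above the previous maximum) is never a near-record under this definition, while a tie with the previous maximum is.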
A martingale approach to strong convergence in a generalized Pólya-Eggenberger urn model
We obtain strong convergence for the proportion $W_n/T_n$ of white balls in a generalized Pólya-Eggenberger urn scheme. We use straightforward martingale arguments that do not require moment estimations.
Keywords: urn model; martingales; limit theorems
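To give a feel for the proportion of white balls studied here, the following is a minimal simulation sketch of the classical Pólya urn only (draw a ball at random, return it together with one extra ball of the same colour). The generalized scheme treated in the paper is not specified in this abstract, so this special case, the function name `polya_proportions` and the seed parameter are all assumptions made for illustration.

```python
import random


def polya_proportions(white, black, steps, seed=0):
    """Simulate a classical Polya urn and return the proportion of white
    balls after each draw.

    Assumption: classical replacement rule (one extra ball of the drawn
    colour per step), not the generalized scheme of the paper.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(steps):
        # Draw a white ball with probability equal to the current proportion.
        if rng.random() < white / (white + black):
            white += 1
        else:
            black += 1
        proportions.append(white / (white + black))
    return proportions


path = polya_proportions(white=1, black=1, steps=10_000)
print(path[99], path[999], path[-1])  # the proportion typically settles down along one path
```

Running the simulation with different seeds typically gives different limiting values, which is consistent with the limit of the proportion being random in the classical case.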
Embedding in extremal processes and the asymptotic behavior of sums of minima
Limit theorems for sums of minima of positive i.i.d. random variables are obtained by embedding the sequence of maxima in a suitable extremal process.
Keywords: extremal process; sums of extremes; limit theorems