29 research outputs found

    Asymptotic normality for the counting process of weak records and \delta-records in discrete models

    Let $\{X_n, n\ge1\}$ be a sequence of independent and identically distributed random variables, taking non-negative integer values, and call $X_n$ a $\delta$-record if $X_n>\max\{X_1,\dots,X_{n-1}\}+\delta$, where $\delta$ is an integer constant. We use martingale arguments to show that the counting process of $\delta$-records among the first $n$ observations, suitably centered and scaled, is asymptotically normally distributed for $\delta\ne0$. In particular, taking $\delta=-1$ we obtain a central limit theorem for the number of weak records.
    Comment: Published at http://dx.doi.org/10.3150/07-BEJ6027 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
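The definition of a $\delta$-record above can be illustrated with a minimal sketch. The function name `delta_record_indices` is hypothetical, and the convention that the first observation always counts as a record (the maximum of an empty set treated as $-\infty$) is an assumption, not something stated in the abstract.

```python
from typing import Iterable, List, Optional


def delta_record_indices(xs: Iterable[int], delta: int) -> List[int]:
    """Return the 1-based indices n with X_n > max(X_1, ..., X_{n-1}) + delta.

    Assumption: the first observation is counted as a record.
    """
    records: List[int] = []
    running_max: Optional[int] = None  # max of the observations seen so far
    for n, x in enumerate(xs, start=1):
        if running_max is None or x > running_max + delta:
            records.append(n)
        running_max = x if running_max is None else max(running_max, x)
    return records


sample = [2, 2, 1, 3, 3, 5]
weak = delta_record_indices(sample, -1)    # delta = -1: weak records (ties count)
usual = delta_record_indices(sample, 0)    # delta = 0: ordinary strict records
```

With `delta = -1` and integer observations, the condition reduces to $X_n \ge \max\{X_1,\dots,X_{n-1}\}$, which is why ties with the current maximum are counted as weak records.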

    Geodesic PCA in the Wasserstein space

    We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures on the line, with finite second moment, endowed with the Wasserstein metric. We discuss the advantages of this approach over a standard functional PCA of probability densities in the Hilbert space of square-integrable functions. We establish the consistency of the method by showing that the empirical GPCA converges to its population counterpart as the sample size tends to infinity. A key property in the study of GPCA is the isometry between the Wasserstein space and a closed convex subset of the space of square-integrable functions, with respect to an appropriate measure. Therefore, we consider the general problem of PCA in a closed convex subset of a separable Hilbert space, which serves as a basis for the analysis of GPCA and is also of interest in its own right. We provide illustrative examples on simple statistical models to show the benefits of this approach for data analysis. The method is also applied to a real dataset of population pyramids.

    Geometric PCA of Images

    We describe a method for analyzing the principal modes of geometric variability of images. For this purpose, we propose a general framework based on the use of deformation operators for modeling the geometric variability of images around a reference mean pattern. In this setting, we describe a simple algorithm for estimating the geometric variability of a set of images. Numerical experiments on real data highlight the benefits of this approach. The consistency of this procedure is also analyzed in statistical deformable models.

    Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown

    The availability of high-throughput parallel methods for sequencing microbial communities is increasing our knowledge of the microbial world at an unprecedented rate. Though most attention has focused on determining lower bounds on the alpha-diversity, i.e., the total number of different species present in the environment, tight bounds on this quantity may be highly uncertain because a small fraction of the environment could be composed of a vast number of different species. To better assess what remains unknown, we propose instead to predict the fraction of the environment that belongs to unsampled classes. Modeling samples as draws with replacement of colored balls from an urn with an unknown composition, and under the sole assumption that there are still undiscovered species, we show that conditionally unbiased predictors and exact prediction intervals (of constant length in logarithmic scale) are possible for the fraction of the environment that belongs to unsampled classes. Our predictions are based on a Poissonization argument, which we have implemented in what we call the Embedding algorithm. For fixed, i.e., non-randomized, sample sizes, the algorithm leads to very accurate predictions on a sub-sample of the original sample. We quantify the effect of fixed sample sizes on our prediction intervals and test our methods and others found in the literature against simulated environments, which we devise taking into account datasets from a human-gut and -hand microbiota. Our methodology applies to any dataset that can be conceptualized as a sample with replacement from an urn. In particular, it could be applied, for example, to quantify the proportion of all the unseen solutions to a binding site problem in a random RNA pool, or to reassess the surveillance of a certain terrorist group, predicting the conditional probability that it deploys a new tactic in its next attack.
    Comment: 14 pages, 7 figures, 4 tables

    Near-Record Values in Discrete Random Sequences

    Given a sequence $(X_n)$ of random variables, $X_n$ is said to be a near-record if $X_n\in(M_{n-1}-a, M_{n-1}]$, where $M_n=\max\{X_1,\dots,X_n\}$ and $a>0$ is a parameter. We investigate the point process $\eta$ on $[0,\infty)$ of near-record values from an integer-valued, independent and identically distributed sequence, showing that it is a Bernoulli cluster process. We derive the probability generating functional of $\eta$ and formulas for the expectation, variance and covariance of the counting variables $\eta(A)$, $A\subset[0,\infty)$. We also derive the strong convergence and asymptotic normality of $\eta([0,n])$, as $n\to\infty$, under mild regularity conditions on the distribution of the observations. For heavy-tailed distributions, with square-summable hazard rates, we prove that $\eta([0,n])$ grows to a finite random limit and compute its probability generating function. We present examples of the application of our results to particular distributions, covering a wide range of behaviours in terms of their right tails.
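The near-record condition $X_n\in(M_{n-1}-a, M_{n-1}]$ can be made concrete with a short sketch. The function name `near_record_values` is hypothetical, and treating the first observation as never being a near-record (there is no $M_0$ to compare against) is an assumption made for this illustration.

```python
from typing import Iterable, List, Optional


def near_record_values(xs: Iterable[int], a: float) -> List[int]:
    """Return the values X_n lying in (M_{n-1} - a, M_{n-1}], in order of appearance.

    Assumption: the first observation has no previous maximum and is skipped.
    """
    values: List[int] = []
    running_max: Optional[int] = None  # M_{n-1}, the maximum before the current draw
    for x in xs:
        if running_max is not None and running_max - a < x <= running_max:
            values.append(x)
        running_max = x if running_max is None else max(running_max, x)
    return values


sample = [3, 3, 2, 5, 4, 1, 5]
near = near_record_values(sample, a=2)
# The number of collected values that fall in a set A plays the role of eta(A).
count_up_to_3 = sum(1 for v in near if 0 <= v <= 3)
```

An observation that strictly exceeds the current maximum is a record rather than a near-record, so it is not collected; it only updates the running maximum.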

    A martingale approach to strong convergence in a generalized Pólya-Eggenberger urn model

    We obtain strong convergence for the proportion $W_n/T_n$ of white balls in a generalized Pólya-Eggenberger urn scheme. We use straightforward martingale arguments that do not require moment estimations.
    Keywords: urn model; martingales; limit theorems

    Embedding in extremal processes and the asymptotic behavior of sums of minima

    Limit theorems for sums of minima of positive i.i.d. r.v.'s are obtained by embedding the sequence of maxima in a suitable extremal process.
    Keywords: extremal process; sums of extremes; limit theorems