On Multivariate Records from Random Vectors with Independent Components
Let $X_1, X_2, \dots$ be independent copies of a
random vector $X$ with values in $\mathbb{R}^d$ and with a
continuous distribution function. The random vector $X_n$ is a
complete record if each of its components is a record. As we require
$X$ to have independent components, crucial results for univariate
records clearly carry over. But there are substantial differences as well:
While there are infinitely many records in case $d = 1$, there occur only
finitely many in the series if $d \geq 2$. Consequently, there is a terminal
complete record with probability one. We compute the distribution of the random
total number of complete records and investigate the distribution of the
terminal record. For complete records, the sequence of waiting times forms a
Markov chain, but, unlike in the univariate case, the state infinity
is now an absorbing element of the state space.
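The contrast between one dimension and higher dimensions can be explored numerically. The following is a minimal sketch, not the authors' method: it assumes uniform components (any continuous distribution gives the same record pattern), and the function names are hypothetical. It relies on the standard fact that a single component sets a record at time n with probability 1/n, so with independent components the probability of a complete record at time n is n^(-d); the sum of these probabilities diverges for d = 1 and converges for d ≥ 2.

```python
import random


def count_complete_records(vectors):
    """Count vectors in which every component strictly exceeds all earlier values.

    `vectors` is any iterable of equal-length sequences of numbers.
    """
    running_max = None
    count = 0
    for x in vectors:
        if running_max is None:
            # The first observation is a record in every component.
            running_max = list(x)
            count += 1
            continue
        if all(xi > m for xi, m in zip(x, running_max)):
            count += 1
        # Component-wise maxima are updated by every observation,
        # whether or not it is a complete record.
        running_max = [max(m, xi) for xi, m in zip(x, running_max)]
    return count


def mean_complete_records(dim, n_steps, trials, seed=0):
    """Average number of complete records over simulated series.

    Assumption: components are i.i.d. uniform on (0, 1).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        series = ([rng.random() for _ in range(dim)] for _ in range(n_steps))
        total += count_complete_records(series)
    return total / trials


if __name__ == "__main__":
    for dim in (1, 2, 3):
        print(dim, mean_complete_records(dim, n_steps=2000, trials=200))
```

Under these assumptions, the average for dim = 1 should keep growing as n_steps increases, while for dim = 2 and dim = 3 it should level off.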
A hierarchical Bayesian approach to record linkage and population size problems
We propose and illustrate a hierarchical Bayesian approach for matching
statistical records observed on different occasions. We show how this model can
be profitably adopted both in record linkage problems and in capture--recapture
setups, where the size of a finite population is the real object of interest.
There are at least two important differences between the proposed model-based
approach and the current practice in record linkage. First, the statistical
model is built up on the actually observed categorical variables and no
reduction (to 0--1 comparisons) of the available information takes place.
Second, the hierarchical structure of the model allows a two-way propagation of
the uncertainty between the parameter estimation step and the matching
procedure so that no plug-in estimates are used and the correct uncertainty is
accounted for both in estimating the population size and in performing the
record linkage. We illustrate and motivate our proposal through a real data
example and simulations. Comment: Published in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS447 by the Institute of
Mathematical Statistics (http://www.imstat.org)
Hierarchically nested factor model from multivariate data
We show how to achieve a statistical description of the hierarchical
structure of a multivariate data set. Specifically we show that the similarity
matrix resulting from a hierarchical clustering procedure is the correlation
matrix of a factor model, the hierarchically nested factor model. In this
model, factors are mutually independent and hierarchically organized. Finally,
we use a bootstrap based procedure to reduce the number of factors in the model
with the aim of retaining only those factors significantly robust with respect
to the statistical uncertainty due to the finite length of data records. Comment: 7 pages, 5 figures; accepted for publication in Europhys. Lett.; the
Appendix corresponds to the additional material of the accepted letter
A randomness test for functional panels
Functional panels are collections of functional time series, and arise often
in the study of high frequency multivariate data. We develop a portmanteau
style test to determine if the cross-sections of such a panel are independent
and identically distributed. Our framework allows the number of functional
projections and/or the number of time series to grow with the sample size. A
large sample justification is based on a new central limit theorem for random
vectors of increasing dimension. With a proper normalization, the limit is
standard normal, potentially making this result easily applicable in other FDA
contexts in which projections on a subspace of increasing dimension are used.
The test is shown to have correct size and excellent power using simulated
panels whose random structure mimics the realistic dependence encountered in
real panel data. It is expected to find application in climatology, finance,
ecology, economics, and geophysics. We apply it to Southern Pacific sea surface
temperature data, precipitation patterns in the South-West United States, and
temperature curves in Germany. Comment: Supplemental material from the authors' homepage or upon request
The effects of estimation of censoring, truncation, transformation and partial data vectors
The purpose of this research was to attack statistical problems concerning the estimation of distributions for purposes of predicting and measuring assembly performance as it appears in biological and physical situations. Various statistical procedures were proposed to attack problems of this sort, that is, to produce the statistical distributions of the outcomes of biological and physical situations which employ characteristics measured on constituent parts. The techniques are described
Record statistics in random vectors and quantum chaos
The record statistics of complex random states are calculated analytically,
and it is shown that the probability of a record intensity is a Bernoulli process.
The correlation due to normalization leads to a probability distribution of the
records that is non-universal but tends to the Gumbel distribution
asymptotically. The quantum standard map is used to study these statistics for
the effect of correlations apart from normalization. It is seen that in the
mixed phase space regime the number of intensity records is a power law in the
dimensionality of the state as opposed to the logarithmic growth for random
states. Comment: figures redrawn, discussion added
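The logarithmic growth mentioned for random states can be checked with a small simulation. This is a sketch under stated assumptions, not the paper's calculation: a random state is modeled here as a vector of i.i.d. complex Gaussian amplitudes normalized to unit length, and all function names are hypothetical. Because the resulting intensities are exchangeable, the n-th intensity is a record with probability 1/n, so the expected number of records in dimension N is the harmonic number H_N, which grows like ln N.

```python
import math
import random


def random_state_intensities(dim, rng):
    """Intensities |c_i|^2 of a normalized complex Gaussian vector (assumed model)."""
    amps = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim)]
    norm_sq = sum(abs(a) ** 2 for a in amps)
    return [abs(a) ** 2 / norm_sq for a in amps]


def count_records(values):
    """Count entries that strictly exceed every earlier entry."""
    best = -math.inf
    count = 0
    for v in values:
        if v > best:
            count += 1
            best = v
    return count


def mean_record_count(dim, trials, seed=0):
    """Average number of intensity records over independent random states."""
    rng = random.Random(seed)
    total = sum(count_records(random_state_intensities(dim, rng)) for _ in range(trials))
    return total / trials


def harmonic_number(n):
    """H_n = 1 + 1/2 + ... + 1/n, the expected record count for exchangeable data."""
    return sum(1.0 / k for k in range(1, n + 1))


if __name__ == "__main__":
    for dim in (16, 64, 256):
        print(dim, mean_record_count(dim, trials=500), harmonic_number(dim))
```

The simulated averages should sit close to the harmonic numbers; this does not reproduce the power-law behaviour reported for the mixed phase space regime of the quantum standard map, which would require simulating that map.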