Fast, Exact Bootstrap Principal Component Analysis for p>1 million
Many have suggested a bootstrap procedure for estimating the sampling
variability of principal component analysis (PCA) results. However, when the
number of measurements per subject (p) is much larger than the number of
subjects (n), calculating and storing the leading principal components from
each bootstrap sample can be computationally infeasible. To
address this, we outline methods for fast, exact calculation of bootstrap
principal components, eigenvalues, and scores. Our methods leverage the fact
that all bootstrap samples occupy the same n-dimensional subspace as the
original sample. As a result, all bootstrap principal components are limited to
the same n-dimensional subspace and can be efficiently represented by their
low dimensional coordinates in that subspace. Several uncertainty metrics can
be computed solely based on the bootstrap distribution of these low dimensional
coordinates, without calculating or storing the p-dimensional bootstrap
components. Fast bootstrap PCA is applied to a dataset of sleep
electroencephalogram (EEG) recordings, and to a dataset of
brain magnetic resonance images (MRIs) with p of roughly 3 million. For the
brain MRI dataset, our method allows for standard errors for the first 3
principal components based on 1000 bootstrap samples to be calculated on a
standard laptop in 47 minutes, as opposed to approximately 4 days with standard
methods.
Comment: 25 pages, including 9 figures and link to R package. 2014-05-14
update: final formatting edits for journal submission, condensed figures.
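The key computational idea, that every bootstrap sample lives in the original n-dimensional column span, can be sketched in a few lines of NumPy (the variable names and the choice k = 3 are illustrative, not taken from the paper's R package; a full implementation would also align component signs across resamples):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, B = 5000, 50, 200          # many measurements per subject, few subjects
Y = rng.standard_normal((p, n))  # p x n data matrix (columns = subjects)

# One thin SVD of the original data: Y = U D Vt.  The columns of U (p x n)
# span the n-dimensional subspace that every bootstrap sample shares.
U, d, Vt = np.linalg.svd(Y, full_matrices=False)
DVt = d[:, None] * Vt            # n x n low-dimensional representation of Y

k = 3                            # number of leading components to track
evals = np.empty((B, k))         # bootstrap eigenvalue draws
coords = np.empty((B, n, k))     # low-dim coordinates of bootstrap PCs

for b in range(B):
    idx = rng.integers(0, n, n)              # resample subjects with replacement
    A, s, _ = np.linalg.svd(DVt[:, idx], full_matrices=False)
    coords[b] = A[:, :k]                     # n x k: never touches p dimensions
    evals[b] = s[:k] ** 2 / n                # bootstrap sample eigenvalues

# A p-dimensional bootstrap component is U @ coords[b] if ever needed, but
# uncertainty summaries come from the low-dim coordinates alone:
se_low_dim = coords.std(axis=0)              # elementwise bootstrap SEs (n x k)
```

Each bootstrap iteration factors only an n x n matrix, which is why the cost no longer scales with p.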
Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification
We consider the high energy physics unfolding problem where the goal is to
estimate the spectrum of elementary particles given observations distorted by
the limited resolution of a particle detector. This important statistical
inverse problem arising in data analysis at the Large Hadron Collider at CERN
consists in estimating the intensity function of an indirectly observed Poisson
point process. Unfolding typically proceeds in two steps: one first produces a
regularized point estimate of the unknown intensity and then uses the
variability of this estimator to form frequentist confidence intervals that
quantify the uncertainty of the solution. In this paper, we propose forming the
point estimate using empirical Bayes estimation which enables a data-driven
choice of the regularization strength through marginal maximum likelihood
estimation. Observing that neither Bayesian credible intervals nor standard
bootstrap confidence intervals succeed in achieving good frequentist coverage
in this problem due to the inherent bias of the regularized point estimate, we
introduce an iteratively bias-corrected bootstrap technique for constructing
improved confidence intervals. We show using simulations that this enables us
to achieve nearly nominal frequentist coverage with only a modest increase in
interval length. The proposed methodology is applied to unfolding the Z boson
invariant mass spectrum as measured in the CMS experiment at the Large Hadron
Collider.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS857 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org). arXiv admin note:
substantial text overlap with arXiv:1401.827
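The bootstrap bias-correction step that the iterated scheme builds on can be sketched as follows (the estimator and data here are illustrative stand-ins, not the unfolding problem itself):

```python
import numpy as np

rng = np.random.default_rng(1)

def plug_in_var(x):
    # Plug-in variance estimator, downward-biased by a factor (n - 1) / n.
    return np.mean((x - x.mean()) ** 2)

x = rng.standard_normal(100)
theta_hat = plug_in_var(x)

B = 2000
boot = np.array([plug_in_var(rng.choice(x, x.size)) for _ in range(B)])
bias_hat = boot.mean() - theta_hat   # bootstrap estimate of the estimator's bias
theta_bc = theta_hat - bias_hat      # one bias-correction step

# The iterated version re-estimates the bias at the corrected value and
# corrects again; in the unfolding setting this is what restores coverage.
```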
Tie-respecting bootstrap methods for estimating distributions of sets and functions of eigenvalues
Bootstrap methods are widely used for distribution estimation, although in
some problems they are applicable only with difficulty. A case in point is that
of estimating the distributions of eigenvalue estimators, or of functions of
those estimators, when one or more of the true eigenvalues are tied. The
m-out-of-n bootstrap can be used to deal with problems of this general
type, but it is very sensitive to the choice of m. In this paper we propose a
new approach, where a tie diagnostic is used to determine the locations of
ties, and parameter estimates are adjusted accordingly. Our tie diagnostic is
governed by a probability level which in principle is an analogue of
m in the m-out-of-n bootstrap. However, the tie-respecting bootstrap
(TRB) is remarkably robust against the choice of this level. This makes the TRB
significantly more attractive than the m-out-of-n bootstrap, where the
value of m has substantial influence on the final result. The TRB can be used
very generally; for example, to test hypotheses about, or construct confidence
regions for, the proportion of variability explained by a set of principal
components. It is suitable for both finite-dimensional data and functional
data.
Comment: Published at http://dx.doi.org/10.3150/08-BEJ154 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
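For contrast, a minimal m-out-of-n bootstrap for the largest sample eigenvalue, the baseline whose sensitivity to m motivates the TRB, might look like this (the data, the rule m = n^{3/4}, and B are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 300, 4
# Population covariance diag(4, 1, 1, 0.25): the two middle eigenvalues are tied.
X = rng.standard_normal((n, d)) @ np.diag([2.0, 1.0, 1.0, 0.5])

def top_eig(sample):
    return np.linalg.eigvalsh(np.cov(sample, rowvar=False)).max()

m = int(n ** 0.75)   # the tuning choice the final result is sensitive to
B = 500
boot = np.array([top_eig(X[rng.integers(0, n, m)]) for _ in range(B)])
ci = np.quantile(boot, [0.025, 0.975])   # percentile interval, top eigenvalue
```

Rerunning this with different exponents for m shifts the interval noticeably, which is the sensitivity the TRB is designed to avoid.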
The RAVE Survey: Constraining the Local Galactic Escape Speed
We report new constraints on the local escape speed of our Galaxy. Our
analysis is based on a sample of high velocity stars from the RAVE survey and
two previously published datasets. We use cosmological simulations of disk
galaxy formation to motivate our assumptions on the shape of the velocity
distribution, allowing for a significantly more precise measurement of the
escape velocity compared to previous studies. We find that the escape velocity
lies within the range 498 km/s < v_esc < 608 km/s (90 per cent confidence), with
a median likelihood of 544 km/s. The fact that v_esc^2 is significantly
greater than 2 v_c^2 (where v_c = 220 km/s is the local circular velocity)
implies that there must be a significant amount of mass exterior to the Solar
circle, i.e. this convincingly demonstrates the presence of a dark halo in the
Galaxy. For a simple isothermal halo, one can calculate that the minimum radial
extent is 58 kpc. We use our constraints on v_esc to determine the mass
of the Milky Way halo for three halo profiles. For example, an adiabatically
contracted NFW halo model results in a virial mass of
1.42^{+1.14}_{-0.54} x 10^12 M_sun and virial radius of
305^{+66}_{-45} kpc (90 per cent confidence). For this model the circular
velocity at the virial radius is 142^{+31}_{-21} km/s. Although our halo
masses are model dependent, we find that they are in good agreement with each
other.
Comment: 19 pages, 9 figures, MNRAS (accepted). v2 incorporates minor cosmetic
revisions which have no effect on the results or conclusions.
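The back-of-envelope argument above can be made explicit for a truncated isothermal halo, where v_esc^2(r) = 2 v_c^2 [1 + ln(R_max / r)]; the solar radius R0 = 8 kpc used below is an assumed value, not one quoted in the abstract:

```python
import math

# Median escape speed and local circular speed from the abstract, plus an
# assumed solar radius R0 (not quoted in the abstract).
v_esc, v_c, R0 = 544.0, 220.0, 8.0     # km/s, km/s, kpc

ratio = v_esc**2 / (2.0 * v_c**2)      # > 1 means mass beyond the Solar circle
R_max = R0 * math.exp(ratio - 1.0)     # minimum halo extent for a truncated
                                       # isothermal sphere, in kpc
```

With these inputs the ratio is about 3, forcing a halo extending tens of kiloparsecs beyond the Solar circle.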
Estimation of Stress-Strength model in the Generalized Linear Failure Rate Distribution
In this paper, we study the estimation of R = P(Y < X), the so-called
stress-strength model, where X and Y are two independent random
variables with generalized linear failure rate distributions, under
different assumptions about their parameters. We address the maximum likelihood
estimator (MLE) of R and the associated asymptotic confidence interval. In
addition, we compute the MLE and the corresponding bootstrap confidence
interval when the sample sizes are small. The Bayes estimates of R and the
associated credible intervals are also investigated. An extensive computer
simulation is implemented to compare the performances of the proposed
estimators. Eventually, we briefly study the estimation of this model when the
data obtained from both distributions are progressively type-II censored. We
present the MLE and the corresponding confidence interval under three different
progressive censoring schemes. We also analyze a set of real data for
illustrative purposes.
Comment: 31 pages, 2 figures, preprint
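As a quick illustration of the quantity being estimated, R = P(Y < X) can be checked by Monte Carlo; exponential distributions are used here as simple stand-ins for the generalized linear failure rate family, and the rate parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
lam_x, lam_y = 1.0, 2.0                    # assumed rate parameters
x = rng.exponential(1 / lam_x, 100_000)    # strength X ~ Exp(lam_x)
y = rng.exponential(1 / lam_y, 100_000)    # stress   Y ~ Exp(lam_y)
R_hat = np.mean(y < x)                     # Monte Carlo estimate of P(Y < X)
# For exponentials the closed form is lam_y / (lam_x + lam_y) = 2/3.
```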