
    Fast, Exact Bootstrap Principal Component Analysis for p>1 million

    Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject ($p$) is much larger than the number of subjects ($n$), the challenge of calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same $n$-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same $n$-dimensional subspace and can be efficiently represented by their low-dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low-dimensional coordinates, without calculating or storing the $p$-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram (EEG) recordings ($p = 900$, $n = 392$), and to a dataset of brain magnetic resonance images (MRIs) ($p \approx 3$ million, $n = 352$). For the brain MRI dataset, our method allows for standard errors for the first 3 principal components, based on 1000 bootstrap samples, to be calculated on a standard laptop in 47 minutes, as opposed to approximately 4 days with standard methods.

    Comment: 25 pages, including 9 figures and link to R package. 2014-05-14 update: final formatting edits for journal submission, condensed figure
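    The subspace argument translates directly into code. Below is a minimal NumPy sketch of the mechanics, not the authors' R package: the function name, the sign-alignment convention, and the choice to summarize uncertainty by elementwise standard errors are illustrative assumptions. Each bootstrap replicate costs only an $n \times n$ SVD, which is what makes $p \approx 3$ million tractable.

```python
# Sketch of fast bootstrap PCA via the shared n-dimensional subspace.
# Assumption: X is a p x n matrix (variables in rows, subjects in columns).
import numpy as np

def fast_bootstrap_pca_se(X, n_components=3, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)               # center each variable

    # One expensive p-dimensional SVD of the original sample: Xc = U diag(d) Vt.
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)    # U: p x n
    coords = d[:, None] * Vt                             # n x n coordinates in the basis U

    A = np.empty((n_boot, n_components, n))              # bootstrap PCs in the n-dim basis
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                 # resample subjects with replacement
        Yb = coords[:, idx]
        Yb -= Yb.mean(axis=1, keepdims=True)
        Ub, _, _ = np.linalg.svd(Yb, full_matrices=False)  # cheap n x n SVD
        # The original component k is e_k in this basis, so Ub[k, k] fixes the sign.
        signs = np.sign(np.diag(Ub)[:n_components])
        A[b] = (Ub[:, :n_components] * signs).T

    # Elementwise standard errors of the p-dimensional components, computed from
    # the n x n covariance of the low-dimensional coordinates; the p x n_boot
    # matrix of bootstrap components is never formed or stored.
    se = np.empty((n_components, p))
    for k in range(n_components):
        C = np.cov(A[:, k], rowvar=False)                # n x n covariance across resamples
        se[k] = np.sqrt(np.einsum('ij,jk,ik->i', U, C, U))
    return se
```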

    Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification

    We consider the high energy physics unfolding problem where the goal is to estimate the spectrum of elementary particles given observations distorted by the limited resolution of a particle detector. This important statistical inverse problem arising in data analysis at the Large Hadron Collider at CERN consists in estimating the intensity function of an indirectly observed Poisson point process. Unfolding typically proceeds in two steps: one first produces a regularized point estimate of the unknown intensity and then uses the variability of this estimator to form frequentist confidence intervals that quantify the uncertainty of the solution. In this paper, we propose forming the point estimate using empirical Bayes estimation which enables a data-driven choice of the regularization strength through marginal maximum likelihood estimation. Observing that neither Bayesian credible intervals nor standard bootstrap confidence intervals succeed in achieving good frequentist coverage in this problem due to the inherent bias of the regularized point estimate, we introduce an iteratively bias-corrected bootstrap technique for constructing improved confidence intervals. We show using simulations that this enables us to achieve nearly nominal frequentist coverage with only a modest increase in interval length. The proposed methodology is applied to unfolding the $Z$ boson invariant mass spectrum as measured in the CMS experiment at the Large Hadron Collider.

    Comment: Published at http://dx.doi.org/10.1214/15-AOAS857 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: substantial text overlap with arXiv:1401.827
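    The iterative bias correction admits a generic sketch. The outline below assumes user-supplied functions `estimate` (e.g., the empirical Bayes regularized unfolder) and `simulate` (the Poisson forward model); the names and the fixed iteration count are illustrative choices, not the paper's implementation.

```python
# Generic parametric-bootstrap iterative bias correction (illustrative).
import numpy as np

def iterated_bias_correction(y, estimate, simulate, n_iter=5, n_boot=200, seed=0):
    """Iteratively remove the bootstrap-estimated bias of a regularized estimator.

    y         observed data (e.g., a smeared histogram of counts)
    estimate  data -> regularized point estimate of the intensity
    simulate  (intensity, rng) -> synthetic data from the forward model
    """
    rng = np.random.default_rng(seed)
    theta_hat = estimate(y)                  # initial regularized (biased) estimate
    theta = theta_hat.copy()
    for _ in range(n_iter):
        # Parametric bootstrap at the current bias-corrected estimate:
        reps = np.array([estimate(simulate(theta, rng)) for _ in range(n_boot)])
        bias = reps.mean(axis=0) - theta     # estimated bias of `estimate` near theta
        theta = theta_hat - bias             # updated bias-corrected estimate
    return theta
```

    With a known smearing matrix `K`, `simulate` could be as simple as `lambda theta, rng: rng.poisson(K @ theta)`; confidence intervals are then built from bootstrap replicates of the corrected estimator rather than from this point estimate alone.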

    Tie-respecting bootstrap methods for estimating distributions of sets and functions of eigenvalues

    Bootstrap methods are widely used for distribution estimation, although in some problems they are applicable only with difficulty. A case in point is that of estimating the distributions of eigenvalue estimators, or of functions of those estimators, when one or more of the true eigenvalues are tied. The $m$-out-of-$n$ bootstrap can be used to deal with problems of this general type, but it is very sensitive to the choice of $m$. In this paper we propose a new approach, where a tie diagnostic is used to determine the locations of ties, and parameter estimates are adjusted accordingly. Our tie diagnostic is governed by a probability level, $\beta$, which in principle is an analogue of $m$ in the $m$-out-of-$n$ bootstrap. However, the tie-respecting bootstrap (TRB) is remarkably robust against the choice of $\beta$. This makes the TRB significantly more attractive than the $m$-out-of-$n$ bootstrap, where the value of $m$ has substantial influence on the final result. The TRB can be used very generally; for example, to test hypotheses about, or construct confidence regions for, the proportion of variability explained by a set of principal components. It is suitable for both finite-dimensional data and functional data.

    Comment: Published at http://dx.doi.org/10.3150/08-BEJ154 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
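    The abstract leaves the tie diagnostic unspecified, so the sketch below substitutes a crude stand-in (a basic-bootstrap lower confidence bound on each eigenvalue gap, at level $\beta$) purely to convey the shape of the procedure: diagnose ties, pool the tied estimates, and only then resample the adjusted quantities. Treat it as an illustration of the idea, not the paper's method.

```python
# Illustrative tie diagnosis and pooling for sample eigenvalues.
# Assumption: X is an n x p data matrix; beta plays the role described above.
import numpy as np

def diagnose_and_pool_eigenvalues(X, beta=0.10, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    lam = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    gap = -np.diff(lam)                                   # gaps between adjacent eigenvalues

    # Bootstrap the gaps.
    boot_gaps = np.empty((n_boot, gap.size))
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)]
        lb = np.sort(np.linalg.eigvalsh(np.cov(Xb, rowvar=False)))[::-1]
        boot_gaps[b] = -np.diff(lb)

    # Stand-in diagnostic: declare a tie when the basic-bootstrap lower bound
    # for the gap, at level beta, fails to exclude zero.
    lower = 2.0 * gap - np.quantile(boot_gaps, 1.0 - beta, axis=0)
    tied = lower <= 0.0

    # Adjust the estimates: average each consecutive run of tied eigenvalues.
    adj = lam.copy()
    k = 0
    while k < lam.size:
        j = k
        while j < gap.size and tied[j]:
            j += 1
        adj[k:j + 1] = lam[k:j + 1].mean()                # pool the tied block
        k = j + 1
    return adj, tied
```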

    The RAVE Survey: Constraining the Local Galactic Escape Speed

    We report new constraints on the local escape speed of our Galaxy. Our analysis is based on a sample of high velocity stars from the RAVE survey and two previously published datasets. We use cosmological simulations of disk galaxy formation to motivate our assumptions on the shape of the velocity distribution, allowing for a significantly more precise measurement of the escape velocity compared to previous studies. We find that the escape velocity lies within the range $498\ \mathrm{km\,s^{-1}} < v_\mathrm{esc} < 608\ \mathrm{km\,s^{-1}}$ (90 per cent confidence), with a median likelihood of $544\ \mathrm{km\,s^{-1}}$. The fact that $v_\mathrm{esc}^2$ is significantly greater than $2 v_c^2$ (where $v_c = 220\ \mathrm{km\,s^{-1}}$ is the local circular velocity) implies that there must be a significant amount of mass exterior to the Solar circle, i.e. this convincingly demonstrates the presence of a dark halo in the Galaxy. For a simple isothermal halo, one can calculate that the minimum radial extent is $\sim 58$ kpc. We use our constraints on $v_\mathrm{esc}$ to determine the mass of the Milky Way halo for three halo profiles. For example, an adiabatically contracted NFW halo model results in a virial mass of $1.42^{+1.14}_{-0.54} \times 10^{12}\, M_\odot$ and a virial radius of $305^{+66}_{-45}$ kpc (90 per cent confidence). For this model the circular velocity at the virial radius is $142^{+31}_{-21}\ \mathrm{km\,s^{-1}}$. Although our halo masses are model dependent, we find that they are in good agreement with each other.

    Comment: 19 pages, 9 figures, MNRAS (accepted). v2 incorporates minor cosmetic revisions which have no effect on the results or conclusion
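    As a sanity check on the quoted minimum halo extent, here is a back-of-envelope derivation; it is my own reconstruction assuming a singular isothermal sphere truncated at radius $R$, and the paper's calculation may differ in detail.

```latex
% Escape speed at galactocentric radius r_0 for a singular isothermal sphere
% truncated at radius R (assumed model, not taken from the paper):
\[
  v_\mathrm{esc}^2(r_0) = 2 v_c^2 \left[ 1 + \ln\frac{R}{r_0} \right]
  \quad\Longrightarrow\quad
  R = r_0 \exp\!\left( \frac{v_\mathrm{esc}^2}{2 v_c^2} - 1 \right).
\]
% With v_esc = 544 km/s, v_c = 220 km/s, and r_0 in the range 7.5-8 kpc,
% this gives R of roughly 59-63 kpc, consistent with the ~58 kpc quoted above.
```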

    Estimation of Stress-Strength model in the Generalized Linear Failure Rate Distribution

    In this paper, we study the estimation of $R = P[Y < X]$, also known as the stress-strength model, when $X$ and $Y$ are two independent random variables with generalized linear failure rate distributions, under different assumptions about their parameters. We address the maximum likelihood estimator (MLE) of $R$ and the associated asymptotic confidence interval. In addition, we compute the MLE and the corresponding bootstrap confidence interval when the sample sizes are small. The Bayes estimates of $R$ and the associated credible intervals are also investigated. An extensive computer simulation is implemented to compare the performances of the proposed estimators. Finally, we briefly study the estimation of this model when the data obtained from both distributions are progressively type-II censored. We present the MLE and the corresponding confidence interval under three different progressive censoring schemes. We also analyze a set of real data for illustrative purposes.

    Comment: 31 pages, 2 figures, preprint
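    The bootstrap interval mentioned above is easy to sketch. The snippet below uses a nonparametric plug-in estimate of $R$ (the fraction of pairs with $y < x$) in place of the paper's MLE under the generalized linear failure rate model, so only the resampling and interval construction mirror the paper; the estimator itself is a stand-in.

```python
# Percentile bootstrap CI for the stress-strength probability R = P(Y < X).
import numpy as np

def stress_strength_boot_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)

    def r_hat(xs, ys):
        # Plug-in estimate: fraction of (x, y) pairs with y < x.
        return np.mean(ys[None, :] < xs[:, None])

    reps = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=x.size, replace=True)    # resample strengths
        yb = rng.choice(y, size=y.size, replace=True)    # resample stresses
        reps[b] = r_hat(xb, yb)
    lo, hi = np.quantile(reps, [alpha / 2.0, 1.0 - alpha / 2.0])
    return r_hat(x, y), (lo, hi)
```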