
    Fast, Exact Bootstrap Principal Component Analysis for p>1 million

    Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject ($p$) is much larger than the number of subjects ($n$), the challenge of calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same $n$-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same $n$-dimensional subspace and can be efficiently represented by their low-dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low-dimensional coordinates, without calculating or storing the $p$-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram (EEG) recordings ($p = 900$, $n = 392$), and to a dataset of brain magnetic resonance images (MRIs) ($p \approx 3$ million, $n = 352$). For the brain MRI dataset, our method allows for standard errors for the first 3 principal components, based on 1000 bootstrap samples, to be calculated on a standard laptop in 47 minutes, as opposed to approximately 4 days with standard methods.

    Comment: 25 pages, including 9 figures and link to R package. 2014-05-14 update: final formatting edits for journal submission, condensed figure
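    The subspace argument translates directly into code. Below is a minimal NumPy sketch of the mechanics, not the authors' R package: the function name, the sign-alignment convention, and the choice to summarize uncertainty by elementwise standard errors are illustrative assumptions. Each bootstrap replicate costs only an $n \times n$ SVD, which is what makes $p \approx 3$ million tractable.

```python
# Sketch of fast bootstrap PCA via the shared n-dimensional subspace.
# Assumption: X is a p x n matrix (variables in rows, subjects in columns).
import numpy as np

def fast_bootstrap_pca_se(X, n_components=3, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)               # center each variable

    # One expensive p-dimensional SVD of the original sample: Xc = U diag(d) Vt.
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)    # U: p x n
    coords = d[:, None] * Vt                             # n x n coordinates in the basis U

    A = np.empty((n_boot, n_components, n))              # bootstrap PCs in the n-dim basis
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                 # resample subjects with replacement
        Yb = coords[:, idx]
        Yb -= Yb.mean(axis=1, keepdims=True)
        Ub, _, _ = np.linalg.svd(Yb, full_matrices=False)  # cheap n x n SVD
        # The original component k is e_k in this basis, so Ub[k, k] fixes the sign.
        signs = np.sign(np.diag(Ub)[:n_components])
        A[b] = (Ub[:, :n_components] * signs).T

    # Elementwise standard errors of the p-dimensional components, computed from
    # the n x n covariance of the low-dimensional coordinates; the p x n_boot
    # matrix of bootstrap components is never formed or stored.
    se = np.empty((n_components, p))
    for k in range(n_components):
        C = np.cov(A[:, k], rowvar=False)                # n x n covariance across resamples
        se[k] = np.sqrt(np.einsum('ij,jk,ik->i', U, C, U))
    return se
```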

    Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification

    We consider the high energy physics unfolding problem where the goal is to estimate the spectrum of elementary particles given observations distorted by the limited resolution of a particle detector. This important statistical inverse problem arising in data analysis at the Large Hadron Collider at CERN consists in estimating the intensity function of an indirectly observed Poisson point process. Unfolding typically proceeds in two steps: one first produces a regularized point estimate of the unknown intensity and then uses the variability of this estimator to form frequentist confidence intervals that quantify the uncertainty of the solution. In this paper, we propose forming the point estimate using empirical Bayes estimation which enables a data-driven choice of the regularization strength through marginal maximum likelihood estimation. Observing that neither Bayesian credible intervals nor standard bootstrap confidence intervals succeed in achieving good frequentist coverage in this problem due to the inherent bias of the regularized point estimate, we introduce an iteratively bias-corrected bootstrap technique for constructing improved confidence intervals. We show using simulations that this enables us to achieve nearly nominal frequentist coverage with only a modest increase in interval length. The proposed methodology is applied to unfolding the $Z$ boson invariant mass spectrum as measured in the CMS experiment at the Large Hadron Collider.

    Comment: Published at http://dx.doi.org/10.1214/15-AOAS857 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: substantial text overlap with arXiv:1401.827
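    The iterative bias correction admits a generic sketch. The outline below assumes user-supplied functions `estimate` (e.g., the empirical Bayes regularized unfolder) and `simulate` (the Poisson forward model); the names and the fixed iteration count are illustrative choices, not the paper's implementation.

```python
# Generic parametric-bootstrap iterative bias correction (illustrative).
import numpy as np

def iterated_bias_correction(y, estimate, simulate, n_iter=5, n_boot=200, seed=0):
    """Iteratively remove the bootstrap-estimated bias of a regularized estimator.

    y         observed data (e.g., a smeared histogram of counts)
    estimate  data -> regularized point estimate of the intensity
    simulate  (intensity, rng) -> synthetic data from the forward model
    """
    rng = np.random.default_rng(seed)
    theta_hat = estimate(y)                  # initial regularized (biased) estimate
    theta = theta_hat.copy()
    for _ in range(n_iter):
        # Parametric bootstrap at the current bias-corrected estimate:
        reps = np.array([estimate(simulate(theta, rng)) for _ in range(n_boot)])
        bias = reps.mean(axis=0) - theta     # estimated bias of `estimate` near theta
        theta = theta_hat - bias             # updated bias-corrected estimate
    return theta
```

    With a known smearing matrix `K`, `simulate` could be as simple as `lambda theta, rng: rng.poisson(K @ theta)`; confidence intervals are then built from bootstrap replicates of the corrected estimator rather than from this point estimate alone.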

    Tie-respecting bootstrap methods for estimating distributions of sets and functions of eigenvalues

    Bootstrap methods are widely used for distribution estimation, although in some problems they are applicable only with difficulty. A case in point is that of estimating the distributions of eigenvalue estimators, or of functions of those estimators, when one or more of the true eigenvalues are tied. The $m$-out-of-$n$ bootstrap can be used to deal with problems of this general type, but it is very sensitive to the choice of $m$. In this paper we propose a new approach, where a tie diagnostic is used to determine the locations of ties, and parameter estimates are adjusted accordingly. Our tie diagnostic is governed by a probability level, $\beta$, which in principle is an analogue of $m$ in the $m$-out-of-$n$ bootstrap. However, the tie-respecting bootstrap (TRB) is remarkably robust against the choice of $\beta$. This makes the TRB significantly more attractive than the $m$-out-of-$n$ bootstrap, where the value of $m$ has substantial influence on the final result. The TRB can be used very generally; for example, to test hypotheses about, or construct confidence regions for, the proportion of variability explained by a set of principal components. It is suitable for both finite-dimensional data and functional data.

    Comment: Published at http://dx.doi.org/10.3150/08-BEJ154 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
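    The abstract leaves the tie diagnostic unspecified, so the sketch below substitutes a crude stand-in (a basic-bootstrap lower confidence bound on each eigenvalue gap, at level $\beta$) purely to convey the shape of the procedure: diagnose ties, pool the tied estimates, and only then resample the adjusted quantities. Treat it as an illustration of the idea, not the paper's method.

```python
# Illustrative tie diagnosis and pooling for sample eigenvalues.
# Assumption: X is an n x p data matrix; beta plays the role described above.
import numpy as np

def diagnose_and_pool_eigenvalues(X, beta=0.10, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    lam = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    gap = -np.diff(lam)                                   # gaps between adjacent eigenvalues

    # Bootstrap the gaps.
    boot_gaps = np.empty((n_boot, gap.size))
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)]
        lb = np.sort(np.linalg.eigvalsh(np.cov(Xb, rowvar=False)))[::-1]
        boot_gaps[b] = -np.diff(lb)

    # Stand-in diagnostic: declare a tie when the basic-bootstrap lower bound
    # for the gap, at level beta, fails to exclude zero.
    lower = 2.0 * gap - np.quantile(boot_gaps, 1.0 - beta, axis=0)
    tied = lower <= 0.0

    # Adjust the estimates: average each consecutive run of tied eigenvalues.
    adj = lam.copy()
    k = 0
    while k < lam.size:
        j = k
        while j < gap.size and tied[j]:
            j += 1
        adj[k:j + 1] = lam[k:j + 1].mean()                # pool the tied block
        k = j + 1
    return adj, tied
```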

    The RAVE Survey: Constraining the Local Galactic Escape Speed

    We report new constraints on the local escape speed of our Galaxy. Our analysis is based on a sample of high velocity stars from the RAVE survey and two previously published datasets. We use cosmological simulations of disk galaxy formation to motivate our assumptions on the shape of the velocity distribution, allowing for a significantly more precise measurement of the escape velocity compared to previous studies. We find that the escape velocity lies within the range $498\ \mathrm{km\,s^{-1}} < v_\mathrm{esc} < 608\ \mathrm{km\,s^{-1}}$ (90 per cent confidence), with a median likelihood of $544\ \mathrm{km\,s^{-1}}$. The fact that $v_\mathrm{esc}^2$ is significantly greater than $2 v_c^2$ (where $v_c = 220\ \mathrm{km\,s^{-1}}$ is the local circular velocity) implies that there must be a significant amount of mass exterior to the Solar circle, i.e. this convincingly demonstrates the presence of a dark halo in the Galaxy. For a simple isothermal halo, one can calculate that the minimum radial extent is $\sim 58$ kpc. We use our constraints on $v_\mathrm{esc}$ to determine the mass of the Milky Way halo for three halo profiles. For example, an adiabatically contracted NFW halo model results in a virial mass of $1.42^{+1.14}_{-0.54} \times 10^{12}\, M_\odot$ and a virial radius of $305^{+66}_{-45}$ kpc (90 per cent confidence). For this model the circular velocity at the virial radius is $142^{+31}_{-21}\ \mathrm{km\,s^{-1}}$. Although our halo masses are model dependent, we find that they are in good agreement with each other.

    Comment: 19 pages, 9 figures, MNRAS (accepted). v2 incorporates minor cosmetic revisions which have no effect on the results or conclusion
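    As a sanity check on the quoted minimum halo extent, here is a back-of-envelope derivation; it is my own reconstruction assuming a singular isothermal sphere truncated at radius $R$, and the paper's calculation may differ in detail.

```latex
% Escape speed at galactocentric radius r_0 for a singular isothermal sphere
% truncated at radius R (assumed model, not taken from the paper):
\[
  v_\mathrm{esc}^2(r_0) = 2 v_c^2 \left[ 1 + \ln\frac{R}{r_0} \right]
  \quad\Longrightarrow\quad
  R = r_0 \exp\!\left( \frac{v_\mathrm{esc}^2}{2 v_c^2} - 1 \right).
\]
% With v_esc = 544 km/s, v_c = 220 km/s, and r_0 in the range 7.5-8 kpc,
% this gives R of roughly 59-63 kpc, consistent with the ~58 kpc quoted above.
```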

    Estimation of Stress-Strength model in the Generalized Linear Failure Rate Distribution

    In this paper, we study the estimation of $R = P[Y < X]$, also known as the stress-strength model, when $X$ and $Y$ are two independent random variables with generalized linear failure rate distributions, under different assumptions about their parameters. We address the maximum likelihood estimator (MLE) of $R$ and the associated asymptotic confidence interval. In addition, we compute the MLE and the corresponding bootstrap confidence interval when the sample sizes are small. The Bayes estimates of $R$ and the associated credible intervals are also investigated. An extensive computer simulation is implemented to compare the performances of the proposed estimators. Finally, we briefly study the estimation of this model when the data obtained from both distributions are progressively type-II censored. We present the MLE and the corresponding confidence interval under three different progressive censoring schemes. We also analyze a set of real data for illustrative purposes.

    Comment: 31 pages, 2 figures, preprint
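    The bootstrap interval mentioned above is easy to sketch. The snippet below uses a nonparametric plug-in estimate of $R$ (the fraction of pairs with $y < x$) in place of the paper's MLE under the generalized linear failure rate model, so only the resampling and interval construction mirror the paper; the estimator itself is a stand-in.

```python
# Percentile bootstrap CI for the stress-strength probability R = P(Y < X).
import numpy as np

def stress_strength_boot_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)

    def r_hat(xs, ys):
        # Plug-in estimate: fraction of (x, y) pairs with y < x.
        return np.mean(ys[None, :] < xs[:, None])

    reps = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=x.size, replace=True)    # resample strengths
        yb = rng.choice(y, size=y.size, replace=True)    # resample stresses
        reps[b] = r_hat(xb, yb)
    lo, hi = np.quantile(reps, [alpha / 2.0, 1.0 - alpha / 2.0])
    return r_hat(x, y), (lo, hi)
```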