7 research outputs found
Fast, Exact Bootstrap Principal Component Analysis for <i>p</i> > 1 Million
<p>Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (<i>p</i>) is much larger than the number of subjects (<i>n</i>), calculating and storing the leading principal components (PCs) from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap PCs, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same <i>n</i>-dimensional subspace as the original sample. As a result, all bootstrap PCs are limited to the same <i>n</i>-dimensional subspace and can be efficiently represented by their low-dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low-dimensional coordinates, without calculating or storing the <i>p</i>-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram recordings (<i>p</i> = 900, <i>n</i> = 392), and to a dataset of brain magnetic resonance images (MRIs) (<i>p</i> ≈ 3 million, <i>n</i> = 352). For the MRI dataset, our method allows for standard errors for the first three PCs based on 1000 bootstrap samples to be calculated on a standard laptop in 47 min, as opposed to approximately 4 days with standard methods. Supplementary materials for this article are available online.</p>
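The subspace idea in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`svd_basis`, `bootstrap_coords`, `pc_standard_errors`) are hypothetical, the data matrix `Y` is assumed to be stored as <i>p</i> × <i>n</i> (measurements by subjects) with <i>p</i> ≥ <i>n</i>, and the sign-alignment rule is one simple choice among several. The only <i>p</i>-dimensional decomposition is a single SVD of the original sample; each bootstrap replicate then needs just an <i>n</i> × <i>n</i> SVD.

```python
import numpy as np

def svd_basis(Y):
    """One-time SVD of the row-centered p x n data matrix.

    Returns V (p x n, orthonormal columns) and D @ U.T (n x n), the
    coordinates of each subject in the basis V.
    """
    Yc = Y - Y.mean(axis=1, keepdims=True)
    V, d, Ut = np.linalg.svd(Yc, full_matrices=False)
    return V, d[:, None] * Ut

def bootstrap_coords(DUt, idx, k):
    """Low-dimensional coordinates of the first k bootstrap PCs.

    idx holds the resampled subject indices. The bootstrap sample equals
    V @ DUt[:, idx], so its PCs are V @ A with A from an n x n SVD.
    """
    Zb = DUt[:, idx]
    Zb = Zb - Zb.mean(axis=1, keepdims=True)  # re-center the bootstrap sample
    A, s, _ = np.linalg.svd(Zb, full_matrices=False)
    A = A[:, :k]
    # Assumed sign convention: original PC j has coordinates e_j in basis V,
    # so make the j-th coordinate of bootstrap PC j non-negative.
    signs = np.sign(np.diag(A[:k, :k]))
    signs[signs == 0] = 1.0
    return A * signs, s[:k]

def pc_standard_errors(V, coords):
    """Elementwise bootstrap standard errors of the p-dimensional PCs.

    coords has shape (B, n, k). Uses diag(V C V^T), where C is the n x n
    covariance of the coordinates, so no p-dimensional bootstrap PC is stored.
    """
    B, n, k = coords.shape
    se = np.empty((V.shape[0], k))
    for j in range(k):
        C = np.cov(coords[:, :, j], rowvar=False)
        var = np.einsum("ij,jk,ik->i", V, C, V)
        se[:, j] = np.sqrt(np.maximum(var, 0.0))  # guard against round-off
    return se
```

To use the sketch, call `svd_basis` once, draw `idx = rng.integers(0, n, size=n)` for each replicate, stack the returned coordinate matrices into a `(B, n, k)` array, and pass it to `pc_standard_errors`. A full bootstrap PC is recovered on demand as `V @ A`.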
Model 2 Group Equation Parameter Estimates.
<p>Model 2 Group Equation Parameter Estimates.</p>
Multilevel Matrix-Variate Analysis and its Application to Accelerometry-Measured Physical Activity in Clinical Populations
<p>The number of studies where the primary measurement is a matrix is exploding. In response to this, we propose a statistical framework for modeling populations of repeatedly observed matrix-variate measurements. The 2D structure is handled via a matrix-variate distribution with decomposable row/column-specific covariance matrices, and a linear mixed effect framework is used to model the multilevel design. The proposed framework flexibly expands to accommodate many common crossed and nested designs and introduces two important concepts: the between-subject distance and intraclass correlation coefficient, both defined for matrix-variate data. The computational feasibility and performance of the approach are shown in extensive simulation studies. The method is motivated by and applied to a study that monitored physical activity of individuals diagnosed with congestive heart failure (CHF) over a 4- to 9-month period. The long-term patterns of physical activity are studied and compared in two CHF subgroups: with and without adverse clinical events. Supplementary materials for this article, which include de-identified accelerometry and clinical data, are available online.</p>
Model 2 Group Equation Parameter Estimates.
<p>Model 2 Group Equation Parameter Estimates.</p>
The average heart rate (bpm), observed energy expenditure (ml/kg/min), and estimated energy expenditure (ml/kg/min) of four representative age groups across five calibration levels (rest, slow-walking, customary walking, peak sustained walking, and maximal exertion), stratified by gender.
<p>The average heart rate (bpm), observed energy expenditure (ml/kg/min), and estimated energy expenditure (ml/kg/min) of four representative age groups across five calibration levels (rest, slow-walking, customary walking, peak sustained walking, and maximal exertion), stratified by gender.</p>
Estimating Energy Expenditure from Heart Rate in Older Adults: A Case for Calibration - Figure 1
<p>Panel a shows a spaghetti plot (N = 290) of the relationship between heart rate (bpm) and energy expenditure (ml/kg/min). Panel b shows an age- and sex-stratified LOWESS of the relationship between heart rate (bpm) and energy expenditure (ml/kg/min).</p>