Fast Covariance Estimation for High-dimensional Functional Data
For smoothing covariance functions, we propose two fast algorithms that scale
linearly with the number of observations per function. Most available methods
and software cannot smooth covariance matrices beyond moderate dimensions; the
recently introduced sandwich smoother is an exception, but it is not adapted to
smoothing covariance matrices of very large dimension. Such very large
covariance matrices are becoming increasingly common, e.g., in 2- and
3-dimensional medical imaging and
high-density wearable sensor data. We introduce two new algorithms that can
handle very large covariance matrices: 1) FACE: a fast implementation of the
sandwich smoother and 2) SVDS: a two-step procedure that first applies singular
value decomposition to the data matrix and then smoothes the eigenvectors.
Compared to existing techniques, these new algorithms are at least an order of
magnitude faster in high dimensions and drastically reduce memory requirements.
The new algorithms provide near-instantaneous (a few seconds) smoothing for
large covariance matrices and very fast (on the order of minutes) smoothing for
the largest dimensions considered. Although SVDS is simpler than FACE, we
provide ready-to-use, scalable R software for FACE. When incorporated into the
R package refund,
FACE improves the speed of penalized functional regression by an order of
magnitude, even for data of normal size. We recommend that FACE be
used in practice for the analysis of noisy and high-dimensional functional
data.
Comment: 35 pages, 4 figures
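As a rough illustration of the SVDS idea described in this abstract (take an SVD of the data matrix, then smooth the leading eigenvectors), here is a minimal R sketch; the simulated data, the rank K, and the use of stats::smooth.spline as the smoother are our own assumptions, not the FACE/refund implementation.

```r
# Illustrative SVDS-style sketch (not the authors' FACE/refund code):
# 1) SVD of the centered data matrix, 2) smooth the leading eigenvectors
# over the functional domain, 3) rebuild a smooth covariance estimate.
set.seed(1)
n <- 50; J <- 2000                        # subjects x grid points (assumed sizes)
tt <- seq(0, 1, length.out = J)           # functional domain
X  <- outer(rnorm(n), sin(2 * pi * tt)) + matrix(rnorm(n * J, sd = 0.5), n, J)
Xc <- sweep(X, 2, colMeans(X))            # center each grid point
K  <- 3                                   # number of components kept (assumption)
sv <- svd(Xc, nu = 0, nv = K)             # only the first K right singular vectors
smooth_v <- sapply(seq_len(K), function(k)
  smooth.spline(tt, sv$v[, k])$y)         # smooth each eigenvector over tt
lambda   <- sv$d[seq_len(K)]^2 / (n - 1)  # eigenvalue estimates
C_smooth <- smooth_v %*% diag(lambda, K) %*% t(smooth_v)  # J x J smooth covariance
```

Because only K smoothed eigenvectors are stored, the cost scales with J * K rather than J^2, which is the point of the two-step construction.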
Fast, Exact Bootstrap Principal Component Analysis for p>1 million
Many have suggested a bootstrap procedure for estimating the sampling
variability of principal component analysis (PCA) results. However, when the
number of measurements per subject (p) is much larger than the number of
subjects (n), the challenge of calculating and storing the leading principal
components from each bootstrap sample can be computationally infeasible. To
address this, we outline methods for fast, exact calculation of bootstrap
principal components, eigenvalues, and scores. Our methods leverage the fact
that all bootstrap samples occupy the same n-dimensional subspace as the
original sample. As a result, all bootstrap principal components are limited to
the same n-dimensional subspace and can be efficiently represented by their
low dimensional coordinates in that subspace. Several uncertainty metrics can
be computed solely based on the bootstrap distribution of these low dimensional
coordinates, without calculating or storing the p-dimensional bootstrap
components. Fast bootstrap PCA is applied to a dataset of sleep
electroencephalogram (EEG) recordings and to a dataset of brain magnetic
resonance images (MRIs) with p of approximately 3 million. For the
brain MRI dataset, our method allows for standard errors for the first 3
principal components based on 1000 bootstrap samples to be calculated on a
standard laptop in 47 minutes, as opposed to approximately 4 days with standard
methods.
Comment: 25 pages, including 9 figures and link to R package. 2014-05-14
update: final formatting edits for journal submission, condensed figure
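A minimal R sketch of the low-dimensional bootstrap idea summarized above: every resample of the columns (subjects) of a p x n matrix stays in the span of the n left singular vectors, so each bootstrap replicate only needs an SVD of an n x n matrix. The simulated sizes, the sign-alignment rule, and the omission of within-resample recentering are simplifying assumptions; this is not the authors' package.

```r
# Illustrative fast bootstrap PCA sketch (assumed sizes, simplified):
set.seed(1)
p <- 5000; n <- 60
X  <- matrix(rnorm(p * n), p, n)          # columns are subjects
Xc <- X - rowMeans(X)                     # center each measurement across subjects
sv <- svd(Xc)                             # Xc = U D V'
U  <- sv$u                                # p x n basis of the sample subspace
DV <- diag(sv$d) %*% t(sv$v)              # n x n low-dimensional representation
B   <- 200                                # bootstrap replicates
ref <- c(1, rep(0, n - 1))                # coordinates of the original first PC
A1  <- matrix(NA_real_, n, B)             # bootstrap coordinates of the first PC
for (b in seq_len(B)) {
  idx <- sample(n, replace = TRUE)        # resample subjects
  a <- svd(DV[, idx])$u[, 1]              # SVD of an n x n matrix only
  if (sum(a * ref) < 0) a <- -a           # fix the arbitrary sign
  A1[, b] <- a                            # (within-resample recentering omitted)
}
# Map back to p dimensions only once, to get pointwise standard errors of PC1:
Ac <- A1 - rowMeans(A1)
PC1_se <- sqrt(rowSums((U %*% Ac)^2) / (B - 1))
```

All heavy p-dimensional work happens in the single final mapping, which is why the per-replicate cost no longer depends on p.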
Covariance Estimation and Principal Component Analysis for Mixed-Type Functional Data with application to mHealth in Mood Disorders
Mobile digital health (mHealth) studies often collect multiple within-day
self-reported assessments of participants' behaviour and health. Indexed by
time of day, these assessments can be treated as functional observations of
continuous, truncated, ordinal, and binary type. We develop covariance
estimation and principal component analysis for mixed-type functional data of
this kind. We propose a semiparametric Gaussian copula model in which a
generalized latent non-paranormal process generates the observed mixed-type
functional data and defines temporal dependence via a latent covariance. A
smooth estimate of the latent covariance is constructed via a Kendall's tau
bridging method that incorporates smoothness within the bridging step. The
approach is then extended with methods for handling both dense and sparse
sampling designs and for calculating subject-specific latent representations of
the observed data, latent principal components, and principal component scores.
Importantly, the proposed
framework handles all four mixed types in a unified way. Simulation studies
show competitive performance of the proposed method under both dense and
sparse sampling designs. The method is applied to data from 497 participants of the
National Institute of Mental Health Family Study of the Mood Disorder Spectrum
to characterize differences in within-day temporal patterns of mood across the
major mood disorder subtypes, including Major Depressive Disorder and Type 1
and Type 2 Bipolar Disorder.
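To make the bridging step concrete for the simplest case, here is a hedged R sketch for continuous margins only, using the classical relationship r = sin(pi * tau / 2) between Kendall's tau and the latent Pearson correlation under a Gaussian copula. The type-specific bridges for binary, ordinal, and truncated margins, the smoothing within the bridging step, and the sparse-design machinery described above are not shown; Matrix::nearPD is our own choice for repairing the estimate.

```r
# Continuous-margin bridging sketch (simplified, not the paper's estimator):
set.seed(1)
n <- 100; J <- 40                         # subjects x time-of-day grid (assumed)
Y <- matrix(rnorm(n * J), n, J)           # placeholder functional observations
tau <- cor(Y, method = "kendall")         # J x J Kendall's tau matrix
R <- sin(pi * tau / 2)                    # bridge: tau -> latent Pearson correlation
R <- as.matrix(Matrix::nearPD(R, corr = TRUE)$mat)  # project to a valid correlation
eig <- eigen(R, symmetric = TRUE)         # latent principal components
```

The appeal of the bridging route is that Kendall's tau depends only on ranks, so the same latent correlation target applies whether the observed margin is continuous or discretized.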
Closed form GLM cumulants and GLMM fitting with a SQUAR-EM-LA2 algorithm
We find closed-form expressions for the standardized cumulants of generalized linear models. This reduces the complexity of their calculation from O(p^6) to O(p^2) operations, which allows efficient construction of second-order saddlepoint approximations to the pdf of sufficient statistics. We adapt the result to obtain a closed-form expression for the second-order Laplace approximation of a GLMM likelihood. Using this approximation, we develop a computationally highly efficient accelerated EM procedure, SQUAR-EM-LA2. The procedure is illustrated by fitting a GLMM to a well-known data set. Extensive simulations show the phenomenal performance of the approach. Matlab software is provided for implementing the proposed algorithm.
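For context, the standardized cumulants are exactly what enters the usual second-order saddlepoint correction; its textbook form (our notation, not quoted from the paper) is:

```latex
% Standard second-order saddlepoint correction (textbook form):
% the first-order density \hat{f}_1 is adjusted by a factor built from the
% standardized third and fourth cumulants evaluated at the saddlepoint \hat{s}.
\hat{f}_2(x) \;=\; \hat{f}_1(x)\,
  \Bigl\{\, 1 \;+\; \tfrac{1}{8}\,\hat{\rho}_4(\hat{s})
            \;-\; \tfrac{5}{24}\,\hat{\rho}_3^{\,2}(\hat{s}) \Bigr\},
\qquad
\hat{\rho}_j(\hat{s}) \;=\; \frac{K^{(j)}(\hat{s})}{\bigl\{K''(\hat{s})\bigr\}^{j/2}},
```

where K is the cumulant generating function of the sufficient statistic; closed-form cumulants make evaluating the correction cheap.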
Structured Functional Principal Component Analysis
Motivated by modern observational studies, we introduce a class of functional
models that expands nested and crossed designs. These models account for the
natural inheritance of correlation structure from sampling design in studies
where the fundamental sampling unit is a function or image. Inference is based
on functional quadratics and their relationship with the underlying covariance
structure of the latent processes. A computationally fast and scalable
estimation procedure is developed for ultra-high dimensional data. Methods are
illustrated in three examples: high-frequency accelerometer data for daily
activity, pitch linguistic data for phonetic analysis, and EEG data for
studying electrical brain activity during sleep.
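As an illustration of how design-induced covariance structure can be separated with simple functional quadratics, here is a hedged R sketch for a two-level nested design Y_ij(t) = mu(t) + Z_i(t) + W_ij(t); the simulated data, the method-of-moments estimator, and the balanced design are our own simplifications, not the paper's estimator for general nested and crossed designs.

```r
# Two-level nested sketch: separate between-subject and within-subject covariance.
set.seed(1)
I <- 40; J <- 3; G <- 100                 # subjects, visits per subject, grid size
tt <- seq(0, 1, length.out = G)
Z  <- outer(rnorm(I), sin(2 * pi * tt))   # subject-level latent process Z_i(t)
Y  <- Z[rep(1:I, each = J), ] + matrix(rnorm(I * J * G, sd = 0.3), I * J, G)
Yc <- sweep(Y, 2, colMeans(Y))            # remove the estimated mean function
id <- rep(1:I, each = J)
# Cross-products of distinct visits of the same subject estimate K_between:
pairs <- do.call(rbind, lapply(split(seq_len(I * J), id),
                               function(ix) t(combn(ix, 2))))
K_between <- crossprod(Yc[pairs[, 1], , drop = FALSE],
                       Yc[pairs[, 2], , drop = FALSE]) / nrow(pairs)
K_between <- (K_between + t(K_between)) / 2   # symmetrize
K_total  <- crossprod(Yc) / (I * J)           # total covariance
K_within <- K_total - K_between               # visit-level covariance
phi_between <- eigen(K_between, symmetric = TRUE)$vectors[, 1:2]  # level-1 FPCs
```

The same cross-product logic generalizes to crossed designs: each variance component is identified by averaging products of observations that share exactly the corresponding design factor.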
POPULATION VALUE DECOMPOSITION, A FRAMEWORK FOR THE ANALYSIS OF IMAGE POPULATIONS
Images, often stored in multidimensional arrays, are fast becoming ubiquitous in medical and public health research. Analyzing populations of images is a statistical problem that raises a host of daunting challenges. The most severe challenge is that data sets incorporating images recorded for hundreds or thousands of subjects at multiple visits are massive. We introduce the population value decomposition (PVD), a general method for simultaneous dimensionality reduction of large populations of massive images. We show how PVD can seamlessly be incorporated into statistical modeling and lead to a new, transparent, and fast inferential framework. Our methodology was motivated by and applied to the Sleep Heart Health Study, the largest community-based cohort study of sleep, containing more than 85 billion observations on thousands of subjects at two visits.
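A hedged R sketch of a PVD-style two-stage reduction consistent with the description above: subject-level SVDs, population-level SVDs of the stacked singular vectors, and subject-specific coefficient matrices in the resulting population bases. The sizes, ranks, and exact construction are our own assumptions rather than the paper's algorithm.

```r
# PVD-style two-stage dimensionality reduction sketch (assumed sizes and ranks):
set.seed(1)
n <- 20; nr <- 64; nc <- 80; L <- 5       # subjects, image rows/cols, subject rank
Y <- lapply(seq_len(n), function(i) matrix(rnorm(nr * nc), nr, nc))
subj  <- lapply(Y, function(Yi) svd(Yi, nu = L, nv = L))      # subject-level SVDs
U_all <- do.call(cbind, lapply(subj, function(s) s$u))        # nr x (n * L)
V_all <- do.call(cbind, lapply(subj, function(s) s$v))        # nc x (n * L)
A <- 10; B <- 10                                              # population ranks
P <- svd(U_all, nu = A, nv = 0)$u                             # nr x A row basis
D <- svd(V_all, nu = B, nv = 0)$u                             # nc x B column basis
V_subj <- lapply(Y, function(Yi) crossprod(P, Yi) %*% D)      # A x B coefficients
Y_hat  <- lapply(V_subj, function(Vi) P %*% Vi %*% t(D))      # low-rank reconstructions
```

Subsequent statistical modeling can then operate on the small A x B coefficient matrices V_subj rather than on the full images.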
LONGITUDINAL HIGH-DIMENSIONAL DATA ANALYSIS
We develop a flexible framework for modeling high-dimensional functional and imaging data observed longitudinally. The approach decomposes the observed variability of high-dimensional observations measured at multiple visits into three additive components: a subject-specific functional random intercept that quantifies the cross-sectional variability, a subject-specific functional slope that quantifies the dynamic irreversible deformation over multiple visits, and a subject-visit-specific functional deviation that quantifies exchangeable or reversible visit-to-visit changes. The proposed method is very fast, scalable to studies including ultra-high dimensional data, and can easily be adapted to and executed on modest computing infrastructures. The method is applied to the longitudinal analysis of diffusion tensor imaging (DTI) data of the corpus callosum of multiple sclerosis (MS) subjects. The study includes 176 subjects observed at 466 visits. For each subject and visit, the study contains a registered DTI scan of the corpus callosum at roughly 30,000 voxels.
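One way to write the three-component decomposition described above as an explicit model equation (our notation and a plausible reading of the description, not quoted from the paper):

```latex
% Three-component longitudinal decomposition (our notation):
% T_{ij} is the time of visit j for subject i, and v indexes voxels.
Y_{ij}(v) \;=\; \eta\bigl(v, T_{ij}\bigr)
  \;+\; X^{(0)}_{i}(v)            % subject-specific functional random intercept
  \;+\; T_{ij}\, X^{(1)}_{i}(v)   % subject-specific functional random slope
  \;+\; W_{ij}(v)                 % subject-visit-specific functional deviation
```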