Generalized Scalar-on-Image Regression Models via Total Variation
<p>The use of imaging markers to predict clinical outcomes can have a great impact on public health. The aim of this article is to develop a class of generalized scalar-on-image regression models via total variation (GSIRM-TV), in the sense of generalized linear models, for a scalar response and an imaging predictor in the presence of scalar covariates. A key novelty of GSIRM-TV is that its slope function (or image) is assumed to belong to the space of bounded total variation, which explicitly accounts for the piecewise smooth nature of most imaging data. We develop an efficient penalized total variation optimization to estimate the unknown slope function and other parameters. We also establish nonasymptotic error bounds on the excess risk. These bounds are explicitly specified in terms of sample size, image size, and image smoothness. Our simulations demonstrate a superior performance of GSIRM-TV against many existing approaches. We apply GSIRM-TV to the analysis of hippocampus data obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. Supplementary materials for this article are available online.</p>
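As a rough illustration of why a total variation penalty suits piecewise smooth images, the sketch below computes a discrete anisotropic total variation of a 2D array. The discretization and the function name `anisotropic_tv` are assumptions made for illustration; the abstract does not specify the form of the penalty used in GSIRM-TV, and this is not the authors' implementation.

```python
import numpy as np


def anisotropic_tv(image):
    """Discrete anisotropic total variation of a 2D array.

    Sums the absolute differences between vertically and
    horizontally adjacent pixels (hypothetical helper, for illustration).
    """
    image = np.asarray(image, dtype=float)
    vertical = np.abs(np.diff(image, axis=0)).sum()
    horizontal = np.abs(np.diff(image, axis=1)).sum()
    return vertical + horizontal


# A piecewise-constant image: a 2x2 block of ones inside a 4x4 grid of zeros.
piecewise = np.zeros((4, 4))
piecewise[:2, :2] = 1.0

# A checkerboard with the same value range but many more jumps.
checkerboard = np.indices((4, 4)).sum(axis=0) % 2

print(anisotropic_tv(piecewise))
print(anisotropic_tv(checkerboard))
```

The piecewise-constant image has few jumps between neighboring pixels and therefore a small total variation, whereas the checkerboard is penalized heavily, which is the behavior a bounded-total-variation assumption relies on.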
Clustering High-Dimensional Landmark-Based Two-Dimensional Shape Data
<div><p>An important goal in image analysis is to cluster and recognize objects of interest according to the shapes of their boundaries. Clustering such objects faces at least four major challenges: a curved shape space, a high-dimensional feature space, a complex spatial correlation structure, and shape variation associated with some covariates (e.g., age or gender). The aim of this article is to develop a penalized model-based clustering framework to cluster landmark-based planar shape data, while explicitly addressing these challenges. Specifically, a mixture of offset-normal shape factor analyzers (MOSFA) is proposed with mixing proportions defined through a regression model (e.g., logistic) and an offset-normal shape distribution in each component for data in the curved shape space. A latent factor analysis model is introduced to explicitly model the complex spatial correlation. A penalized likelihood approach with both an adaptive pairwise fused Lasso penalty function and an <i>L</i><sub>2</sub> penalty function is used to automatically realize variable selection via thresholding and deliver a sparse solution. Our real data analysis has confirmed the excellent finite-sample performance of MOSFA in revealing meaningful clusters in the corpus callosum shape data obtained from the Attention Deficit Hyperactivity Disorder-200 (ADHD-200) study. Supplementary materials for this article are available online.</p></div>
Cook’s Distance Measures for Varying Coefficient Models With Functional Responses
<div><p>The aim of this article is to develop Cook’s distance measures for assessing the influence of both atypical curves and observations under a varying coefficient model with functional responses. Our Cook’s distance measures include Cook’s distances for deleting multiple curves and for deleting multiple grid points, and their scaled Cook’s distances. We systematically investigate some theoretical properties of these diagnostic measures. Simulation studies are conducted to evaluate the finite sample properties of these Cook’s distances under different scenarios. A real diffusion tensor tract dataset is analyzed to illustrate the use of our diagnostic measures.</p></div>
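For readers unfamiliar with the underlying diagnostic, the sketch below computes the classical single-case Cook's distance for ordinary least squares. It is only the textbook starting point, not the multiple-curve or multiple-grid-point measures developed in the article, and the function name `cooks_distance` is an assumption made for illustration.

```python
import numpy as np


def cooks_distance(X, y):
    """Classical Cook's distance for each row of an OLS fit of y on X.

    Uses D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2), where e_i is the
    residual, h_ii the leverage, p the number of columns of X, and
    s^2 the residual variance estimate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Leverages are the squared row norms of Q in the thin QR factorization.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q ** 2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    s2 = residuals @ residuals / (n - p)
    return residuals ** 2 * leverage / (p * s2 * (1.0 - leverage) ** 2)


# Toy usage: a straight line with one shifted point at the end.
x = np.arange(10.0)
design = np.column_stack([np.ones_like(x), x])
response = 1.0 + 2.0 * x
response[9] += 10.0
print(np.argmax(cooks_distance(design, response)))
```

The closed form avoids refitting the model once per deleted observation; the multiple-deletion measures described in the abstract generalize the same idea to sets of curves or grid points.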
A Generic Sure Independence Screening Procedure
<p>Extracting important features from ultra-high dimensional data is one of the primary tasks in statistical learning, information theory, precision medicine, and biological discovery. Many of the sure independence screening methods developed to meet these needs are suitable only for special models under some assumptions. With the availability of more data types and possible models, a model-free generic screening procedure with fewer and less restrictive assumptions is desirable. In this article, we propose a generic nonparametric sure independence screening procedure, called BCor-SIS, on the basis of a recently developed universal dependence measure: Ball correlation. We show that the proposed procedure has strong screening consistency even when the dimensionality is of exponential order in the sample size, without imposing sub-exponential moment assumptions on the data. We investigate the flexibility of this procedure by considering three commonly encountered challenging settings in biological discovery or precision medicine: iterative BCor-SIS, interaction pursuit, and survival outcomes. We use simulation studies and real data analyses to illustrate the versatility and practicability of our BCor-SIS method. Supplementary materials for this article are available online.</p>
MWPCR: Multiscale Weighted Principal Component Regression for High-Dimensional Prediction
<p>We propose a multiscale weighted principal component regression (MWPCR) framework for the use of high-dimensional features with strong spatial features (e.g., smoothness and correlation) to predict an outcome variable, such as disease status. This development is motivated by identifying imaging biomarkers that could potentially aid detection, diagnosis, assessment of prognosis, prediction of response to treatment, and monitoring of disease status, among many others. The MWPCR can be regarded as a novel integration of principal components analysis (PCA), kernel methods, and regression models. In MWPCR, we introduce various weight matrices to prewhiten high-dimensional feature vectors, perform matrix decomposition for both dimension reduction and feature extraction, and build a prediction model by using the extracted features. Examples of such weight matrices include an importance score weight matrix for the selection of individual features at each location and a spatial weight matrix for the incorporation of the spatial pattern of feature vectors. We integrate the importance score weights with the spatial weights to recover the low-dimensional structure of high-dimensional features. We demonstrate the utility of our methods through extensive simulations and real data analyses of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. Supplementary materials for this article are available online.</p>
SR-HARDI: Spatially Regularizing High Angular Resolution Diffusion Imaging
<p>High angular resolution diffusion imaging (HARDI) has recently been of great interest in mapping the orientation of intravoxel crossing fibers. Such orientation information allows one to infer the connectivity patterns prevalent among different brain regions and possible changes in such connectivity over time for various neurodegenerative and neuropsychiatric diseases. The aim of this article is to propose a penalized multiscale adaptive regression model (PMARM) framework to spatially and adaptively infer the orientation distribution function (ODF) of water diffusion in regions with complex fiber configurations. In PMARM, we reformulate the HARDI reconstruction as a weighted regularized least-squares regression (WRLSR) problem. Similarity and distance weights are introduced to account for spatial smoothness of HARDI, while preserving the unknown discontinuities (e.g., edges between white matter and gray matter) of HARDI. The <i>L</i><sub>1</sub> penalty function is introduced to ensure sparse solutions of ODFs, while a scaled <i>L</i><sub>1</sub> weighted estimator is calculated to correct the bias introduced by the <i>L</i><sub>1</sub> penalty at each voxel. In PMARM, we integrate the multiscale adaptive regression models, the propagation-separation method, and Lasso (least absolute shrinkage and selection operator) to adaptively estimate ODFs across voxels. Experimental results indicate that PMARM can reduce the angle detection errors in fiber crossing areas and provide more accurate reconstruction than standard voxel-wise methods. Supplementary materials for this article are available online.</p>
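To make the sparsity and bias remarks above concrete, the sketch below shows the soft-thresholding operator, the standard closed-form solution of a one-dimensional least-squares problem with an L1 penalty. It is a generic building block of Lasso-type solvers, shown under the assumption of a single scalar coefficient per call; it is not the PMARM algorithm, and the name `soft_threshold` is hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Minimize 0.5 * (b - z)**2 + lam * |b| over b, elementwise.

    Entries with |z| <= lam are set exactly to zero (sparsity); the
    remaining entries are shrunk toward zero by lam (the source of the
    bias that rescaled L1 estimators try to correct).
    """
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


coefficients = np.array([3.0, -0.5, -2.0, 0.2])
print(soft_threshold(coefficients, lam=1.0))
```

Small coefficients vanish entirely, while large ones are pulled toward zero by the threshold, which is why a bias-correction step of the kind mentioned in the abstract is often paired with an L1 penalty.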
Participants' gender and baseline age by study group.
<p>Gender and baseline age distribution by study group. A chi-squared test of independence between gender and study group yields a p-value of 0.02. An ANOVA F-test for differences in mean age between study groups yields a p-value of 0.18.</p>
Genetic variation estimates for major regional brain volumes.
<p>Genetic variation estimates, standard errors, and associated likelihood ratio tests for four aggregated volumes.</p>
Genetic variation estimates and additional results for non-overlapping brain regions.
<p>Genetic variation estimates (top left; A) and the associated −log<sub>10</sub> p-values from LRT (top right; B). Hotter colors (black</p>
Genetic variation estimates and associated clustering results for ROI volumes.
<p>Genetic variation estimates, standard errors, associated LRT p-values, and clustering results for 93 non-overlapping ROIs.</p>