High-dimensional covariance regression with application to co-expression QTL detection
While covariance matrices have been widely studied in many scientific fields,
relatively limited progress has been made on estimating conditional covariances
that permit a large covariance matrix to vary with high-dimensional
subject-level covariates. In this paper, we present a new sparse multivariate
regression framework that models the covariance matrix as a function of
subject-level covariates. In the context of co-expression quantitative trait
locus (QTL) studies, our method can be used to determine if and how gene
co-expressions vary with genetic variations. To accommodate high-dimensional
responses and covariates, we stipulate a combined sparsity structure that
encourages covariates with non-zero effects and edges that are modulated by
these covariates to be simultaneously sparse. We approach parameter estimation
with a blockwise coordinate descent algorithm, and investigate the
convergence rate of the estimated parameters. In addition, we propose a
computationally efficient debiased inference procedure for uncertainty
quantification. The efficacy of the proposed method is demonstrated through
numerical experiments and an application to a gene co-expression network study
with brain cancer patients.
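As a rough illustration of what it means for a covariance to depend on a subject-level covariate, the sketch below simulates two genes whose covariance is linear in a genotype-like covariate and recovers that dependence by ordinary least squares on the product of the two responses. The linear form, the names B0 and B1, and the least-squares step are assumptions made for illustration only; the abstract does not specify the model form, and this is not the sparse blockwise coordinate descent estimator it describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical genotype covariate coded 0, 1 or 2 for each subject.
x = rng.integers(0, 3, size=n).astype(float)

# Assumed model for illustration: Sigma(x) = B0 + x * B1,
# so the covariance between gene 1 and gene 2 is 0.1 + 0.2 * x.
B0 = np.array([[1.0, 0.1], [0.1, 1.0]])
B1 = np.array([[0.0, 0.2], [0.2, 0.0]])

# Draw mean-zero responses with a covariate-dependent covariance.
y = np.empty((n, 2))
for g in (0, 1, 2):
    idx = x == g
    chol = np.linalg.cholesky(B0 + g * B1)
    y[idx] = rng.standard_normal((idx.sum(), 2)) @ chol.T

# Because E[y1 * y2 | x] = 0.1 + 0.2 * x, regressing the product on (1, x)
# gives a simple moment-based estimate of the covariate effect on the edge.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y[:, 0] * y[:, 1], rcond=None)
print("baseline covariance estimate:", coef[0])
print("covariate effect estimate:", coef[1])
```

With many genes and many covariates, fitting every edge this way would involve far more parameters than observations, which is the setting where a sparsity structure such as the one described in the abstract becomes necessary.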
Sparsity in Varying-coefficient Regression and Covariance Matrix Estimation
This dissertation discusses how we can exploit sparsity, a statistical assumption that only a small number of relationships between variables are non-zero, in the model selection for regression and covariance matrix estimation.
In a linear model, the effects of the predictors on the response may vary across individuals. In this case, the purpose of model selection is not only to identify significant predictors but also to understand how their effects on the response differ across individuals. This can be cast as a model selection problem for a varying-coefficient regression. However, the problem is challenging when there is a pre-specified group structure among the variables. We propose a novel variable selection method for varying-coefficient regression with such structured variables. Our method is empirically shown to select relevant variables consistently and to screen out irrelevant variables better than existing methods. Hence, it leads to a model with higher sensitivity, a lower false discovery rate, and higher prediction accuracy than existing methods. We apply this method to a Huntington disease study and find that the effects of brain regions on motor impairment differ by the disease severity of the patients, indicating the need for customized intervention.
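To make the idea of a varying coefficient concrete, here is a minimal sketch in which the effect of a predictor x on the response changes linearly with an effect modifier u (for example, a severity score). The linear form of the coefficient, the variable names, and the plain least-squares fit are assumptions for illustration; the sketch does no variable selection and does not handle group structure, so it is not the method proposed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

u = rng.uniform(0.0, 1.0, size=n)   # hypothetical effect modifier
x = rng.standard_normal(n)          # hypothetical predictor

# Assumed varying coefficient: beta(u) = 1.0 + 2.0 * u.
beta_u = 1.0 + 2.0 * u
y = beta_u * x + 0.5 * rng.standard_normal(n)

# Since beta(u) * x = 1.0 * x + 2.0 * (x * u), the model is linear in the
# columns x and x * u, so least squares recovers both coefficients.
design = np.column_stack([x, x * u])
est, *_ = np.linalg.lstsq(design, y, rcond=None)
print("constant part of the effect:", est[0])
print("part of the effect that varies with u:", est[1])
```

A non-zero second coefficient indicates that the effect of x differs across individuals, which is the kind of pattern the selection method is meant to detect.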
In covariance matrix estimation, current approaches for introducing sparsity do not guarantee positive definiteness or asymptotic efficiency. For multivariate normal distributions, we construct a positive definite and asymptotically efficient estimator when the locations of the zero entries are known. If the locations of the zero entries are unknown, we further construct a positive definite thresholding estimator by combining iterative conditional fitting with thresholding. We prove that our thresholding estimator is asymptotically efficient with probability tending to one. In simulation studies, we show that our estimator matches the true covariance more closely and identifies the non-zero entries more accurately than competing estimators. We apply our estimator to Huntington disease data and detect non-zero correlations among brain regional volumes. Such correlations are timely for ongoing treatment studies, as they can inform how different brain regions are likely to be affected by these treatments.
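The loss of positive definiteness under naive sparsification can be seen in a small example. The sketch below applies plain hard thresholding to a hand-picked 3 x 3 positive definite matrix; the matrix, the threshold of 0.5, and the helper name hard_threshold are illustrative assumptions. It shows only the problem being addressed, not the iterative conditional fitting estimator described above.

```python
import numpy as np

def hard_threshold(S, t):
    """Zero off-diagonal entries with absolute value below t; keep the diagonal."""
    T = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

# A hand-picked positive definite matrix (all eigenvalues are positive).
S = np.array([
    [1.00, 0.75, 0.40],
    [0.75, 1.00, 0.75],
    [0.40, 0.75, 1.00],
])

# Thresholding at 0.5 removes the 0.40 entries and keeps the 0.75 entries.
T = hard_threshold(S, 0.5)

min_eig_before = np.linalg.eigvalsh(S).min()
min_eig_after = np.linalg.eigvalsh(T).min()
print("smallest eigenvalue before thresholding:", min_eig_before)
print("smallest eigenvalue after thresholding:", min_eig_after)
```

The smallest eigenvalue is positive before thresholding and negative afterwards, so the thresholded matrix is no longer a valid covariance matrix even though the input was.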