3 research outputs found

    High-dimensional covariance regression with application to co-expression QTL detection

    While covariance matrices have been widely studied in many scientific fields, relatively limited progress has been made on estimating conditional covariances that permit a large covariance matrix to vary with high-dimensional subject-level covariates. In this paper, we present a new sparse multivariate regression framework that models the covariance matrix as a function of subject-level covariates. In the context of co-expression quantitative trait locus (QTL) studies, our method can be used to determine if and how gene co-expressions vary with genetic variations. To accommodate high-dimensional responses and covariates, we stipulate a combined sparsity structure that encourages covariates with non-zero effects and edges that are modulated by these covariates to be simultaneously sparse. We approach parameter estimation with a blockwise coordinate descent algorithm, and investigate the $\ell_2$ convergence rate of the estimated parameters. In addition, we propose a computationally efficient debiased inference procedure for uncertainty quantification. The efficacy of the proposed method is demonstrated through numerical experiments and an application to a gene co-expression network study with brain cancer patients.

    Sparsity in Varying-coefficient Regression and Covariance Matrix Estimation

    This dissertation discusses how we can exploit sparsity, a statistical assumption that only a small number of relationships between variables are non-zero, in model selection for regression and covariance matrix estimation. In a linear model, the effects of the predictors on the response may vary across individuals. In this case, the purpose of model selection is not only to identify significant predictors but also to understand how their effects on the response differ across individuals. This can be cast as a model selection problem for a varying-coefficient regression. However, this is challenging when there is a pre-specified group structure among variables. We propose a novel variable selection method for a varying-coefficient regression with such structured variables. Our method is empirically shown to select relevant variables consistently and to screen out irrelevant variables better than existing methods. Hence, it leads to a model with higher sensitivity, lower false discovery rate and higher prediction accuracy than existing methods. We apply this method to a Huntington disease study and find that the effects of the brain regions on motor impairment differ by the disease severity of the patients, indicating the need for customized intervention. In covariance matrix estimation, current approaches to introducing sparsity do not guarantee positive definiteness or asymptotic efficiency. For multivariate normal distributions, we construct a positive definite and asymptotically efficient estimator when the location of the zero entries is known. If the location of the zero entries is unknown, we further construct a positive definite thresholding estimator by combining iterative conditional fitting with thresholding. We prove that our thresholding estimator is asymptotically efficient with probability tending to one. In simulation studies, we show that our estimator more closely matches the true covariance and more correctly identifies the non-zero entries than competing estimators. We apply our estimator to Huntington disease data and detect non-zero correlations among brain regional volumes. Such correlations are timely for ongoing treatment studies, as they indicate how different brain regions are likely to be affected by these treatments.
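
The remark above that sparsity-inducing approaches need not preserve positive definiteness can be illustrated with a small numerical sketch. This is not the dissertation's estimator: `hard_threshold`, the 3x3 matrix `sigma`, and the cutoff `tau=0.6` are all hypothetical choices made here for illustration, using plain entrywise hard thresholding.

```python
import numpy as np

def hard_threshold(cov, tau):
    """Zero out entries with |value| <= tau, then restore the original diagonal.

    Plain entrywise hard thresholding, used only for illustration; it is
    NOT the positive definite estimator described in the abstract.
    """
    out = np.where(np.abs(cov) > tau, cov, 0.0)
    np.fill_diagonal(out, np.diag(cov))
    return out

# Hypothetical positive definite covariance matrix (all eigenvalues > 0).
sigma = np.array([
    [1.00, 0.75, 0.50],
    [0.75, 1.00, 0.75],
    [0.50, 0.75, 1.00],
])

# Thresholding at 0.6 removes the (1, 3) and (3, 1) entries.
sparse = hard_threshold(sigma, tau=0.6)

min_eig_before = np.linalg.eigvalsh(sigma).min()
min_eig_after = np.linalg.eigvalsh(sparse).min()

print("smallest eigenvalue before thresholding:", min_eig_before)
print("smallest eigenvalue after thresholding: ", min_eig_after)
print("still positive definite:", bool(min_eig_after > 0))
```

In this toy case the thresholded matrix has a negative eigenvalue, so it is no longer a valid covariance matrix, which is the kind of failure a positive definite thresholding estimator is designed to avoid.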