
Statistical Methods in Neuroimaging Genetics: Pathways Sparse Regression and Cluster Size Inference

Abstract

In the field of neuroimaging genetics, brain images are used as phenotypes in the search for genetic variants associated with brain structure or function. This search presents a formidable statistical challenge, not least because of the very high dimensionality of the genotype and phenotype data produced by modern SNP (single nucleotide polymorphism) arrays and high-resolution MRI. This thesis focuses on the use of multivariate sparse regression models, such as the group lasso and sparse group lasso, for the identification of gene pathways associated with both univariate and multivariate quantitative traits. The methods described here take particular account of various factors specific to pathways genome-wide association studies, including widespread correlation (linkage disequilibrium) between genetic predictors and the fact that many variants overlap multiple pathways. A resampling strategy that exploits finite sample variability is employed to provide robust rankings for pathways, SNPs and genes. Comprehensive simulation studies are presented comparing one of the proposed methods, pathways group lasso with adaptive weights, with a popular alternative. This method is extended to the case of a multivariate phenotype, and the resulting pathways sparse reduced-rank regression model and algorithm are applied to a study identifying gene pathways associated with structural change in the brain characteristic of Alzheimer's disease. The original model is also adapted for the task of 'pathways-driven' SNP and gene selection, and this latter model, pathways sparse group lasso with adaptive weights, is applied in a search for SNPs and genes associated with elevated lipid levels in two separate cohorts of Asian adults. Finally, in a separate section, an existing method for the identification of spatially extended clusters of image voxels with heightened activation is evaluated in an imaging genetics context. This method, known as cluster size inference, rests on a number of assumptions. Using real imaging and SNP data, false positive rates are found to be poorly controlled outside of a narrow range of parameters related to image smoothness and the activation thresholds used for cluster formation.
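
The group lasso and sparse group lasso penalties mentioned above can be illustrated through their proximal operator, the shrinkage step applied to the coefficients inside a proximal-gradient solver. What follows is a minimal Python sketch under stated assumptions, not the algorithm used in this thesis: pathways are assumed to be non-overlapping (overlap would need extra handling; one common device is to duplicate shared SNPs so each pathway gets its own copy), the loss-gradient step is omitted, and the function names and the per-pathway weights `group_weights` (a stand-in for adaptive weights) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def soft_threshold(x: float, thresh: float) -> float:
    """Scalar soft-thresholding: sign(x) * max(|x| - thresh, 0)."""
    if x > thresh:
        return x - thresh
    if x < -thresh:
        return x + thresh
    return 0.0


def sgl_prox(
    beta: Sequence[float],
    groups: Dict[str, List[int]],
    lam: float,
    alpha: float,
    step: float,
    group_weights: Dict[str, float],
) -> List[float]:
    """Proximal operator (scaled by `step`) of the sparse group lasso penalty
        lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g w_g * ||beta_g||_2)
    ASSUMPTION: groups (pathways) do not overlap. Coefficients not listed in
    any group are returned as 0. alpha = 0 gives the plain group lasso.
    """
    out = [0.0] * len(beta)
    for name, idx in groups.items():
        # 1) Lasso part: soft-threshold each SNP coefficient in the pathway.
        s = [soft_threshold(beta[j], step * alpha * lam) for j in idx]
        norm = math.sqrt(sum(v * v for v in s))
        # 2) Group part: shrink the whole pathway towards zero; it is set
        #    exactly to zero when its norm falls below the group threshold.
        thresh = step * (1.0 - alpha) * lam * group_weights[name]
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        for j, v in zip(idx, s):
            out[j] = scale * v
    return out


if __name__ == "__main__":
    # Hypothetical toy example: 4 SNPs in two pathways of two SNPs each.
    beta = [2.0, -0.5, 0.1, 0.05]
    groups = {"pathway_A": [0, 1], "pathway_B": [2, 3]}
    weights = {"pathway_A": 1.0, "pathway_B": 1.0}
    # pathway_A is shrunk but kept; pathway_B is zeroed as a whole.
    print(sgl_prox(beta, groups, lam=1.0, alpha=0.2, step=0.5, group_weights=weights))
```

The two-stage shrinkage is what makes the sparse group lasso select at both levels: the group step can discard an entire pathway, while the elementwise step can zero individual SNPs within a pathway that is retained.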
