In the field of neuroimaging genetics, brain images are used as phenotypes in the search
for genetic variants associated with brain structure or function. This search presents a
formidable statistical challenge, not least because of the very high dimensionality of genotype
and phenotype data produced by modern SNP (single nucleotide polymorphism) arrays
and high-resolution MRI. This thesis focuses on the use of multivariate sparse regression
models such as the group lasso and sparse group lasso for the identification of gene
pathways associated with both univariate and multivariate quantitative traits.
The methods described here take particular account of various factors specific to pathways
genome-wide association studies, including widespread correlation (linkage disequilibrium)
between genetic predictors and the fact that many variants overlap multiple pathways.
A resampling strategy that exploits finite-sample variability is employed to provide
robust rankings for pathways, SNPs and genes. Comprehensive simulation studies are presented
comparing one proposed method, pathways group lasso with adaptive weights, to a
popular alternative. This method is extended to the case of a multivariate phenotype, and
the resulting pathways sparse reduced-rank regression model and algorithm are applied to a
study identifying gene pathways associated with structural change in the brain characteristic
of Alzheimer’s disease. The original model is also adapted for the task of ‘pathways-driven’
SNP and gene selection, and this latter model, pathways sparse group lasso with
adaptive weights, is applied in a search for SNPs and genes associated with elevated lipid
levels in two separate cohorts of Asian adults.
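
As an illustration of how a resampling strategy can turn a sparse group-wise fit into a ranking, the following is a minimal sketch, not the algorithm developed in this thesis. It assumes a plain group lasso (no adaptive weights), non-overlapping groups standing in for pathways, a univariate phenotype, and half-sample subsampling; the function names `group_lasso` and `selection_frequencies`, the penalty value and the simulated data are all hypothetical.

```python
import numpy as np


def group_lasso(X, y, groups, lam, n_iter=200):
    """Proximal-gradient fit of
    (1 / 2n) * ||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2,
    for non-overlapping groups (a simplifying assumption of this sketch)."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size = 1 / Lipschitz constant of the gradient of the squared loss.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        # Gradient step on the squared-error loss.
        z = beta + step * (X.T @ (y - X @ beta)) / n
        # Group-wise soft thresholding (proximal operator of the penalty).
        for idx in groups:
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(len(idx))
            z[idx] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[idx]
        beta = z
    return beta


def selection_frequencies(X, y, groups, lam, n_subsamples=50, seed=0):
    """Fraction of random half-samples in which each group is selected."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(len(groups))
    for _ in range(n_subsamples):
        rows = rng.choice(n, size=n // 2, replace=False)
        Xs = X[rows] - X[rows].mean(axis=0)  # centre predictors
        ys = y[rows] - y[rows].mean()        # centre phenotype
        beta = group_lasso(Xs, ys, groups, lam)
        counts += np.array([np.any(beta[idx] != 0) for idx in groups], dtype=float)
    return counts / n_subsamples


# Simulated example: 4 groups of 5 predictors, only the first group is causal.
rng = np.random.default_rng(1)
n, group_size, n_groups = 200, 5, 4
X = rng.standard_normal((n, group_size * n_groups))
groups = [np.arange(g * group_size, (g + 1) * group_size) for g in range(n_groups)]
y = X[:, groups[0]].sum(axis=1) + rng.standard_normal(n)

freqs = selection_frequencies(X, y, groups, lam=0.3)
ranking = np.argsort(-freqs)  # groups ordered by decreasing selection frequency
print(freqs, ranking)
```

Ranking by selection frequency across subsamples, rather than by a single fit, is one way to make the ordering less sensitive to the particular sample drawn. Overlapping pathways and linkage disequilibrium, which the methods in this thesis account for, are deliberately left out of the sketch.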
Finally, in a separate section, an existing method for the identification of spatially extended
clusters of image voxels with heightened activation is evaluated in an imaging genetics
context. This method, known as cluster size inference, rests on a number of assumptions.
Using real imaging and SNP data, false positive rates are found to be poorly controlled
outside of a narrow range of parameters related to image smoothness and activation
thresholds for cluster formation.