thesis

Statistical Methods and Models for Modern Genetic Analysis.

Abstract

The Genome-Wide Association Study (GWAS) is the predominant tool for finding genetic risk variants that contribute to complex human disease. Despite the large number of GWAS findings, the variants implicated by GWAS are unlikely on their own to fully explain the heritability of many diseases. In this dissertation, we propose statistical methods to augment GWAS and further our understanding of the genetic causes of complex disease. In the first project, we consider the challenges of a gene-environment analysis performed as a follow-up to a significant initial GWAS result. It is known that effect estimates based on the same data that showed the significant GWAS result suffer from an upward bias called the "Winner's Curse." We show that the initial GWAS testing strategy can induce bias in both follow-up hypothesis testing and estimation for gene-environment interaction. We propose a novel bias-correction method based on a partial likelihood Markov chain Monte Carlo algorithm. In the second project, we shift attention to rare genetic variants, which GWAS has low power to detect. We propose the Cumulative Minor Allele Test (CMAT), which pools multiple rare variants from the same gene and tests for an excess burden of rare variants in either cases or controls. We show that the CMAT performs favorably across a range of study designs. Notably, the CMAT accommodates probabilistic genotypes, extending its applicability to low-coverage and imputed sequence data. We use a simulation analysis to validate study designs that combine sequenced and imputed samples as a means to improve power to detect rare risk variants. Determining the conditions that optimize imputation accuracy is important for the successful application of these designs. In the final project, we propose a coalescent model of genotype imputation that allows fast, analytical estimates of imputation accuracy across complex population genetic models. 
We use our model to compare the performance of custom-made reference panels drawn from the same source population as the imputation targets with that of publicly available reference panels (e.g., the 1000 Genomes Project) that may differ in ancestry from the targets.

Ph.D.
Biostatistics
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/89761/1/mattz_1.pd
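
The "Winner's Curse" mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the dissertation's method: it only shows why effect estimates that are kept because they passed a significance threshold tend to be too large. The function name `simulate_winners_curse`, the true effect size, the standard error, and the one-sided threshold are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.1, se=0.05, z_threshold=1.96,
                           n_studies=20000, seed=0):
    """Simulate many studies of one variant and compare the mean estimate
    over all studies with the mean over 'significant' studies only.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    all_estimates = []
    significant_estimates = []
    for _ in range(n_studies):
        # Each study yields a noisy estimate of the same true effect.
        estimate = rng.gauss(true_effect, se)
        all_estimates.append(estimate)
        # Keep the estimate only if its z-score passes the threshold
        # (one-sided selection, an assumption made for simplicity).
        if estimate / se > z_threshold:
            significant_estimates.append(estimate)
    return (statistics.mean(all_estimates),
            statistics.mean(significant_estimates),
            len(significant_estimates))


mean_all, mean_significant, n_significant = simulate_winners_curse()
print(f"mean estimate, all studies:         {mean_all:.4f}")
print(f"mean estimate, significant studies: {mean_significant:.4f}")
print(f"number of significant studies:      {n_significant}")
```

Averaged over all simulated studies, the estimate is close to the true effect. Averaged only over the studies that reached significance, it is larger, because selection on the test statistic discards the low estimates. This is the kind of bias that a follow-up analysis on the same data inherits.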
