research

Bayesian Statistical Methods for Genetic Association Studies with Case-Control and Cohort Design

Abstract

Large-scale genetic association studies are carried out with the hope of discovering single nucleotide polymorphisms involved in the etiology of complex diseases. We propose a coalescent-based model for association mapping which potentially increases the power to detect disease-susceptibility variants in genetic association studies with case-control and cohort design. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions and we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium (LD) therein assuming a perfect phylogeny. The haplotype space is then partitioned into disjoint clusters within which the phenotype-haplotype association is assumed to be the same. The novelty of our approach consists in the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common mutation. Our approach is fully Bayesian and we develop Markov Chain Monte Carlo algorithms to sample efficiently over the space of possible partitions. We have also developed a Bayesian survival regression model for high-dimension and small sample size settings. We provide a Bayesian variable selection procedure and shrinkage tool by imposing shrinkage priors on the regression coefficients. We have developed a computationally efficient optimization algorithm to explore the posterior surface and find the maximum a posteriori estimates of the regression coefficients. We compare the performance of the proposed methods in simulation studies and using real datasets to both single-marker analyses and recently proposed multi-marker methods and show that our methods perform similarly in localizing the causal allele while yielding lower false positive rates. Moreover, our methods offer computational advantages over other multi-marker approaches

    Similar works