Technological advances in genotyping have given rise to hypothesis-based
association studies of increasing scope. As a result, the scientific hypotheses
addressed by these studies have become more complex and more difficult to
address using existing analytic methodologies. Obstacles to analysis include
inference in the face of multiple comparisons, complications arising from
correlations among the SNPs (single nucleotide polymorphisms), choice of their
genetic parametrization and missing data. In this paper we present an efficient
Bayesian model search strategy that searches over the space of genetic markers
and their genetic parametrization. The resulting method for Multilevel
Inference of SNP Associations, MISA, allows computation of multilevel posterior
probabilities and Bayes factors at the global, gene and SNP level, with the
prior distribution on SNP inclusion in the model providing an intrinsic
multiplicity correction. We use simulated data sets to characterize MISA's
statistical power, and show that MISA has higher power to detect association
than standard procedures. Using data from the North Carolina Ovarian Cancer
Study (NCOCS), MISA identifies variants that were not identified by standard
methods and have been externally ``validated'' in independent studies. We
examine sensitivity of the NCOCS results to prior choice and method for
imputing missing data. MISA is available in an R package on CRAN.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS322 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org