Association testing aims to discover the underlying relationship between
genotypes (usually Single Nucleotide Polymorphisms, or SNPs) and phenotypes
(attributes, or traits). The typically large data sets used in association
testing often contain missing values. Standard statistical methods impute the
missing values using relatively simple assumptions, delete them, or both, and
each of these choices can bias the results. Here we describe the Bayesian
hierarchical model BAMD (Bayesian Association with Missing Data). BAMD is
fitted with a Gibbs sampler in which missing values are multiply imputed based
upon all of the available information in the data set. We estimate the
parameters and prove that updating one SNP at each iteration preserves the
ergodicity of the Markov chain while improving computational speed. We also
implement a model selection option in BAMD, which enables potential detection
of SNP interactions. Simulations show that unbiased estimates of SNP effects
are recovered even when genotype data are missing. We also validate associations
between SNPs and a carbon isotope discrimination phenotype that were previously
reported using a family-based method, and discover an additional SNP associated
with the trait. BAMD is available as an R package from
http://cran.r-project.org/package=BAMD

Comment: Published at http://dx.doi.org/10.1214/11-AOAS516 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
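
The imputation step described in the abstract, in which a single SNP is updated at each iteration of the Gibbs sampler, can be illustrated with a minimal Python sketch. This is not the BAMD implementation. It assumes a toy additive model with independent normal errors, genotypes coded as -1, 0, 1, a uniform genotype prior, and SNP effects held fixed, whereas a full sampler would also redraw the effects and the variance. All function and variable names are hypothetical.

```python
import math
import random

GENOTYPES = (-1, 0, 1)  # assumed additive coding, chosen for this sketch only


def genotype_conditional(y_i, row, j, effects, sigma2, prior=(1 / 3, 1 / 3, 1 / 3)):
    """Full conditional P(z_ij = g | y_i, other genotypes) under the toy model."""
    # Contribution of every SNP except j, using current (possibly imputed) values.
    rest = sum(e * z for k, (e, z) in enumerate(zip(effects, row)) if k != j)
    log_w = []
    for g, p in zip(GENOTYPES, prior):
        resid = y_i - rest - effects[j] * g
        log_w.append(math.log(p) - 0.5 * resid * resid / sigma2)
    top = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def impute_one_snp(y, Z, missing, j, effects, sigma2, rng):
    """Redraw each missing genotype in SNP column j from its full conditional."""
    for i in missing.get(j, ()):
        probs = genotype_conditional(y[i], Z[i], j, effects, sigma2)
        Z[i][j] = rng.choices(GENOTYPES, weights=probs)[0]


def run_imputation(y, Z, missing, effects, sigma2, n_iter, seed=0):
    """Cycle through the SNPs, updating only one SNP column per iteration."""
    rng = random.Random(seed)
    n_snps = len(effects)
    # Start the chain from random genotypes at the missing positions.
    for j, rows in missing.items():
        for i in rows:
            Z[i][j] = rng.choice(GENOTYPES)
    for t in range(n_iter):
        impute_one_snp(y, Z, missing, t % n_snps, effects, sigma2, rng)
    return Z
```

Here `missing` maps a SNP column index to the row indices whose genotype is unobserved. Cycling through the columns deterministically is one possible update schedule, and the schedule analysed in the paper may differ.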