Since most analysis software for genome-wide association studies (GWAS)
currently exploits only unrelated individuals, there is a need for efficient
applications that can handle general pedigree data or mixtures of both
population and pedigree data. Even data sets thought to consist of only
unrelated individuals may include cryptic relationships that can lead to false
positives if not discovered and controlled for. In addition, family designs
possess compelling advantages. They are better equipped to detect rare
variants, control for population stratification, and facilitate the study of
parent-of-origin effects. Pedigrees selected for extreme trait values often
segregate a single gene with strong effect. Finally, many pedigrees are
available as an important legacy from the era of linkage analysis.
Unfortunately, pedigree likelihoods are notoriously hard to compute. In this
paper we re-examine the computational bottlenecks and implement ultra-fast
pedigree-based GWAS analysis. Kinship coefficients can either be based on
explicitly provided pedigrees or automatically estimated from dense markers.
Our strategy (a) works for random sample data, pedigree data, or a mix of both;
(b) entails no loss of power; (c) allows for any number of covariate
adjustments, including correction for population stratification; (d) allows for
testing SNPs under additive, dominant, and recessive models; and (e)
accommodates both univariate and multivariate quantitative traits. On a typical
personal computer (6 CPU cores at 2.67 GHz), analyzing a univariate HDL
(high-density lipoprotein) trait from the San Antonio Family Heart Study
(935,392 SNPs on 1357 individuals in 124 pedigrees) takes less than 2 minutes
and 1.5 GB of memory. Complete multivariate QTL analysis of the three
time-points of the longitudinal HDL trait takes less than 5 minutes and
1.5 GB of memory.
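
To illustrate how kinship coefficients can be derived from an explicitly provided pedigree, the following is a minimal sketch of the standard recursive kinship calculation. It is not the implementation described in this paper. The function name `kinship_matrix`, the input layout (a list of parent index pairs), and the requirement that parents be listed before their children are all assumptions made for this example.

```python
import numpy as np


def kinship_matrix(parents):
    """Compute pedigree kinship coefficients by the standard recursion.

    parents: list of (father, mother) index pairs, one per individual;
             founders are given as (None, None). Illustrative layout only.
    Assumes individuals are ordered so that parents precede their children.
    """
    n = len(parents)
    phi = np.zeros((n, n))
    for i, (f, m) in enumerate(parents):
        if f is None and m is None:
            # Founder: non-inbred and unrelated to everyone listed earlier.
            phi[i, i] = 0.5
            continue
        if f is None or m is None:
            raise ValueError("give both parents or neither")
        if f >= i or m >= i:
            raise ValueError("parents must be listed before their children")
        # Kinship with each earlier individual j is the average of the
        # parents' kinships with j.
        for j in range(i):
            phi[i, j] = phi[j, i] = 0.5 * (phi[f, j] + phi[m, j])
        # Self-kinship is (1 + inbreeding) / 2, where the inbreeding
        # coefficient equals the kinship between the two parents.
        phi[i, i] = 0.5 * (1.0 + phi[f, m])
    return phi


# Nuclear family: two founders (0, 1) and two full siblings (2, 3).
family = [(None, None), (None, None), (0, 1), (0, 1)]
phi = kinship_matrix(family)
```

In this example the parent-offspring and full-sibling kinships both come out to 1/4, and each self-kinship is 1/2, as expected in the absence of inbreeding. When no pedigree is available, the analogous matrix would instead have to be estimated from dense marker genotypes, which this sketch does not attempt.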