Capturing the Spectrum of Interaction Effects in Genetic Association Studies by Simulated Evaporative Cooling Network Analysis

By Brett A. McKinney, James E. Crowe, Jingyu Guo and Dehua Tian


Evidence from human genetic studies of several disorders suggests that interactions between alleles at multiple genes play an important role in influencing phenotypic expression. Analytical methods for identifying Mendelian disease genes are not appropriate when applied to common multigenic diseases, because such methods investigate association with the phenotype only one genetic locus at a time. New strategies are needed that can capture the spectrum of genetic effects, from Mendelian to multifactorial epistasis. Random Forests (RF) and Relief-F are two powerful machine-learning methods that have been studied as filters for genetic case-control data due to their ability to account for the context of alleles at multiple genes when scoring the relevance of individual genetic variants to the phenotype. However, when variants interact strongly, the independence assumption of RF in the tree node-splitting criterion leads to diminished importance scores for relevant variants. Relief-F, on the other hand, was designed to detect strong interactions but is sensitive to large backgrounds of variants that are irrelevant to classification of the phenotype, which is an acute problem in genome-wide association studies. To overcome the weaknesses of these data mining approaches, we develop Evaporative Cooling (EC) feature selection, a flexible machine learning method that can integrate multiple importance scores while removing irrelevant genetic variants. To characterize detailed interactions, we construct a genetic-association interaction network (GAIN), whose edges quantify the synergy between variants with respect to the phenotype. We use simulation analysis to show that EC is able to identify a wide range of interaction effects in genetic association data. We apply the EC filter to a smallpox vaccine cohort study of single nucleotide polymorphisms (SNPs) and infer a GAIN for a collection of SNPs associated with adverse events. 
Our results suggest an important role for hubs in SNP disease susceptibility networks. The software is available at
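The abstract says GAIN edges quantify the synergy between two variants with respect to the phenotype, but it does not say how synergy is measured. One common information-theoretic choice is interaction information, I(A;B;C) = I(A,B;C) - I(A;C) - I(B;C). The minimal sketch below assumes that measure; it is not taken from the authors' software. The function names and the toy genotype vectors are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy in bits of the joint distribution of the given columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((count / n) * log2(count / n) for count in Counter(joint).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def interaction_information(a, b, phenotype):
    """Assumed synergy measure: I(A,B;C) - I(A;C) - I(B;C).

    Positive values mean the pair (A, B) carries information about the
    phenotype that neither variant carries alone; negative values mean
    the two variants are redundant.
    """
    pair = list(zip(a, b))  # treat the two variants as one joint attribute
    return (
        mutual_information(pair, phenotype)
        - mutual_information(a, phenotype)
        - mutual_information(b, phenotype)
    )


# Toy data (hypothetical): the phenotype is the XOR of two binary variants,
# a pure interaction with no single-variant effect.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
status = [0, 1, 1, 0]
synergy = interaction_information(snp_a, snp_b, status)
```

In the XOR toy case neither variant alone is informative about the phenotype, so all of the association shows up as positive synergy on the edge between the two variants. Two identical copies of a variant that fully determines the phenotype would instead give a negative value.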

Topics: Research Article
Publisher: Public Library of Science
Provided by: PubMed Central

