
Rare variant collapsing in conjunction with mean log p-value and gradient boosting approaches applied to Genetic Analysis Workshop 17 data

By Yauheniya Cherkas, Nandini Raghavan, Stephan Francke, Frank DeFalco and Marsha A Wilcox


In addition to methods that can identify common variants associated with susceptibility to common diseases, there has been increasing interest in approaches that can identify rare genetic variants. We use the simulated data provided to the participants of Genetic Analysis Workshop 17 (GAW17) to identify both rare and common single-nucleotide polymorphisms (SNPs) and pathways associated with disease status. We apply a rare variant collapsing approach and the usual association tests for common variants to identify candidates for further analysis using pathway-based and tree-based ensemble approaches. We use the mean log p-value approach to identify a top set of pathways and compare this set to the pathways used in the simulation of the GAW17 data set. We conclude that the mean log p-value approach is able to place those pathways, as well as related pathways, in the top list. We also apply the stochastic gradient boosting approach to the selected subset of SNPs. When we compare the results of this tree-based method with the list of SNPs used in the data set simulation, we observe a number of false positives in addition to the correct SNPs.
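The two screening steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 1% minor-allele-frequency cutoff, the carrier-indicator form of collapsing, and the use of base-10 logarithms are all assumptions made for illustration.

```python
import math


def collapse_rare_variants(genotypes, mafs, maf_threshold=0.01):
    """Collapse the rare variants of one gene into a carrier indicator per subject.

    genotypes: one list per subject of minor-allele counts (0, 1, or 2),
               with one entry per variant in the gene.
    mafs:      minor-allele frequency of each variant.
    maf_threshold is an assumed cutoff, not a value taken from the paper.
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    # A subject scores 1 if they carry at least one rare minor allele.
    return [1 if any(row[j] > 0 for j in rare) else 0 for row in genotypes]


def mean_log_p(p_values):
    """Pathway score: mean of -log10(p) over the tests mapped to the pathway."""
    return sum(-math.log10(p) for p in p_values) / len(p_values)


def rank_pathways(pathways, gene_p):
    """Rank pathways by mean log p-value, highest score first.

    pathways: dict mapping pathway name to a list of gene names.
    gene_p:   dict mapping gene name to its association p-value.
    Pathways with no tested gene are skipped.
    """
    scores = {}
    for name, genes in pathways.items():
        p_values = [gene_p[g] for g in genes if g in gene_p]
        if p_values:
            scores[name] = mean_log_p(p_values)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In a sketch like this, the collapsed indicator would be tested for association with disease status per gene, and the resulting p-values would feed `rank_pathways`. A pathway rises in the ranking when its genes have small p-values on average, which is why related pathways sharing genes with a causal pathway can also appear near the top.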

Topics: Proceedings
Publisher: BioMed Central
Provided by: PubMed Central

