
A Latent Model for Prioritization of SNPs for Functional Studies

By Brooke L. Fridley, Ed Iversen, Ya-Yu Tsai, Gregory D. Jenkins, Ellen L. Goode and Thomas A. Sellers


One difficult question facing researchers is how to prioritize SNPs detected in genetic association studies for functional studies. Often a list of the top M SNPs is determined based solely on the p-value from an association analysis, where M is set by financial or time constraints. For many studies of complex diseases, multiple analyses have been completed, and integrating these multiple sets of results may be difficult. One may also wish to incorporate biological knowledge, such as whether the SNP is in the exon of a gene or in a regulatory region, into the selection of markers to follow up. In this manuscript, we propose a Bayesian latent variable model (BLVM) for incorporating “features” about a SNP to estimate a latent “quality score”, with SNPs prioritized based on the posterior probability distribution of the rankings of these quality scores. We illustrate the method using data from an ovarian cancer genome-wide association study (GWAS). In addition to applying the BLVM to the ovarian GWAS, we applied it to simulated data that mimic the prioritization of markers across multiple GWAS for related diseases or traits. The SNP ranked first by the BLVM for the ovarian GWAS ranked 2nd and 7th based on p-values from the analyses of all invasive cases and of invasive serous cases, respectively. The top SNP based on the serous case analysis p-value (which ranked 197th in the invasive case analysis) was ranked 8th based on the posterior probability of being in the top 5 markers (0.13). In summary, the BLVM allows for the systematic integration of multiple SNP “features” for the prioritization of loci for fine-mapping or functional studies, taking into account the uncertainty in ranking.
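The ranking step described in the abstract, prioritizing SNPs by the posterior probability that their quality score falls among the top few, can be illustrated with a minimal sketch. This is not the authors' BLVM: the posterior draws below are simulated from independent normal distributions as a stand-in for MCMC output, and the function names (`top_k_probabilities`, `simulate_draws`) and all numeric values are hypothetical choices made for illustration.

```python
import random


def top_k_probabilities(draws, k):
    """Estimate, for each SNP, the posterior probability of ranking in the top k.

    draws: list of posterior draws; each draw is a list of quality scores,
           one per SNP (assumed to come from a fitted model's sampler).
    """
    n_snps = len(draws[0])
    counts = [0] * n_snps
    for scores in draws:
        # Indices of the k SNPs with the highest score in this draw.
        top = sorted(range(n_snps), key=lambda j: scores[j], reverse=True)[:k]
        for j in top:
            counts[j] += 1
    # Fraction of draws in which each SNP was among the top k.
    return [c / len(draws) for c in counts]


def simulate_draws(means, sd, n_draws, seed=0):
    """Stand-in for MCMC output: independent normal draws around assumed means."""
    rng = random.Random(seed)
    return [[rng.gauss(m, sd) for m in means] for _ in range(n_draws)]


# Hypothetical posterior means for six SNPs (illustrative values only).
means = [2.0, 1.5, 1.4, 0.2, 0.0, -0.5]
draws = simulate_draws(means, sd=1.0, n_draws=2000)
probs = top_k_probabilities(draws, k=2)

# Prioritize SNPs by their estimated probability of being in the top 2.
order = sorted(range(len(means)), key=lambda j: probs[j], reverse=True)
for j in order:
    print(f"SNP {j}: P(top 2) = {probs[j]:.3f}")
```

Ranking on these probabilities rather than on point estimates keeps the uncertainty visible: a SNP whose score is high on average but highly variable can receive a modest top-k probability, much like the 0.13 reported above for the SNP ranked 8th.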

Topics: Research Article
Publisher: Public Library of Science
OAI identifier:
Provided by: PubMed Central

