Target SNP selection in complex disease association studies

By Matthias Wjst


BACKGROUND: The massive amount of SNP data stored at public internet sites provides unprecedented access to human genetic variation. Target SNPs for disease-gene association studies are currently selected more or less at random, because decision rules for selecting functionally relevant SNPs are not available.

RESULTS: We implemented a computational pipeline that retrieves the genomic sequence of target genes, collects information about sequence variation, and selects functional motifs containing SNPs. The motifs considered are the gene promoter, exon-intron structure, AU-rich mRNA elements, transcription factor binding motifs, and cryptic and enhancer splice sites, together with expression in the target tissue. As a case study, 396 genes on chromosome 6p21 in the extended HLA region were selected, which contributed nearly 20,000 SNPs. Computer annotation identified ~2,500 SNPs in functional motifs. Most of these SNPs disrupt transcription factor binding sites, but only those introducing new sites had a significant depressing effect on SNP allele frequency. Other decision rules concern position within motifs, the validity of SNP database entries, unique occurrence in the genome, and conserved sequence context in other mammalian genomes.

CONCLUSION: Only 10% of all gene-based SNPs have sequence-predicted functional relevance, making them a primary target for genotyping in association studies.
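
The decision rules named in the abstract can be illustrated with a minimal sketch. It is not the author's pipeline: the `Snp` record, the motif labels, the `select_target_snps` function, and the way the rules are combined (a SNP must overlap a functional motif, have a valid database entry, and be unique in the genome, with conserved SNPs ranked first) are all assumptions made for illustration, and the rs identifiers are placeholders.

```python
from dataclasses import dataclass, field

# Motif classes named in the abstract; the string labels are hypothetical.
FUNCTIONAL_MOTIFS = {
    "promoter", "exon", "au_rich_element",
    "tf_binding_site", "splice_site",
}

@dataclass
class Snp:
    """Hypothetical annotation record for one SNP."""
    rs_id: str
    motifs: set = field(default_factory=set)  # motif classes the SNP overlaps
    validated: bool = False         # database entry considered valid
    unique_in_genome: bool = False  # sequence context occurs once in the genome
    conserved: bool = False         # context conserved in other mammals

def select_target_snps(snps):
    """Keep SNPs that overlap a functional motif and pass the basic filters.

    Combining the rules with a logical AND is an assumption of this sketch.
    """
    selected = []
    for snp in snps:
        in_motif = bool(snp.motifs & FUNCTIONAL_MOTIFS)
        if in_motif and snp.validated and snp.unique_in_genome:
            selected.append(snp)
    # Rank conserved SNPs first; this ordering is an illustrative choice.
    return sorted(selected, key=lambda s: not s.conserved)

# Placeholder candidates, not real database entries.
candidates = [
    Snp("rs_a", {"tf_binding_site"}, True, True, True),
    Snp("rs_b", set(), True, True, False),          # no functional motif
    Snp("rs_c", {"exon"}, True, True, False),
    Snp("rs_d", {"promoter"}, False, True, True),   # unvalidated entry
]
targets = select_target_snps(candidates)
print([s.rs_id for s in targets])
```

In this toy input, two of four candidates survive the filters. A real pipeline would derive each field from sequence annotation and database lookups rather than from hand-set flags.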

Topics: Methodology Article
Publisher: BioMed Central
Year: 2004
DOI identifier: 10.1186/1471-2105-5-92
Provided by: PubMed Central
