Abstract

Background: Biologically active sequence motifs often have positional preferences with respect to a genomic landmark. For example, many known transcription factor binding sites (TFBSs) occur within an interval [-300, 0] bases upstream of a transcription start site (TSS). Although some programs for identifying sequence motifs exploit positional information, most of them model it only implicitly and with ad hoc methods, making them unsuitable for general motif searches.

Results: A-GLAM, a user-friendly computer program for identifying sequence motifs, now incorporates a Bayesian model systematically combining sequence and positional information. A-GLAM's predictions with and without positional information were compared on two human TFBS datasets, each containing sequences corresponding to the interval [-2000, 0] bases upstream of a known TSS. A rigorous statistical analysis showed that positional information significantly improved the prediction of sequence motifs, and an extensive cross-validation study showed that A-GLAM's model was robust against mild misspecification of its parameters. As expected, when sequences in the datasets were successively truncated to the intervals [-1000, 0], [-500, 0] and [-250, 0], positional information aided motif prediction less and less, but never hurt it significantly.

Conclusion: Although sequence truncation is a viable strategy when searching for biologically active motifs with a positional preference, a probabilistic model (used reasonably) generally provides a superior and more robust strategy, particularly when the sequence motifs' positional preferences are not well characterized.

Kim, Nak-Kyeong

Mariño-Ramírez, Leonardo

Spouge, John L

Tharakaraman, Kannan

English

Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites

BMC Bioinformatics

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2432075
