2 research outputs found

Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites

Author: Kim Nak-Kyeong
Mariño-Ramírez Leonardo
Spouge John L
Tharakaraman Kannan
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract

Background: Biologically active sequence motifs often have positional preferences with respect to a genomic landmark. For example, many known transcription factor binding sites (TFBSs) occur within an interval [-300, 0] bases upstream of a transcription start site (TSS). Although some programs for identifying sequence motifs exploit positional information, most of them model it only implicitly and with ad hoc methods, making them unsuitable for general motif searches.

Results: A-GLAM, a user-friendly computer program for identifying sequence motifs, now incorporates a Bayesian model systematically combining sequence and positional information. A-GLAM's predictions with and without positional information were compared on two human TFBS datasets, each containing sequences corresponding to the interval [-2000, 0] bases upstream of a known TSS. A rigorous statistical analysis showed that positional information significantly improved the prediction of sequence motifs, and an extensive cross-validation study showed that A-GLAM's model was robust against mild misspecification of its parameters. As expected, when sequences in the datasets were successively truncated to the intervals [-1000, 0], [-500, 0] and [-250, 0], positional information aided motif prediction less and less, but never hurt it significantly.

Conclusion: Although sequence truncation is a viable strategy when searching for biologically active motifs with a positional preference, a probabilistic model (used reasonably) generally provides a superior and more robust strategy, particularly when the sequence motifs' positional preferences are not well characterized.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Discovering Sequence Motifs with Arbitrary Insertions and Deletions

Author: A Bahr
A Bairoch
A Hansson
A Reményi
A Sandelin
AF Neuwald
B Kobe
Bostjan Kobe
C Grasso
CB Do
CC Yap
CE Lawrence
D Caffrey
E de Castro
F Diella
FP Roth
G Pavesi
Gary Stormo
I Jonassen
IA Wadman
J van Helden
J Zhu
JG Henikoff
JJ Welch
JS Liu
JS Mattick
K Karplus
K Shida
K Sjölander
L Vitelli
M Ashburner
Martin C. Frith
MC Frith
MS Waterman
N Hulo
Neil F. W. Saunders
NK Kim
P Puntervoll
P Vyas
R Amanchy
R Durbin
R Hughey
R Lahlil
RC Edgar
RM Böhmer
S Sinha
SA Johnson
SR Eddy
T Beissbarth
T Lassmann
T Yada
TD Schneider
Timothy L. Bailey
TK Attwood
TL Bailey
V Deleuze
V Matys
WW Wasserman
X Liu
XS Liu
Y Makita
Publication venue: Public Library of Science
Publication date: 01/01/2008

Biology is encoded in molecular sequences: deciphering this encoding remains a grand scientific challenge. Functional regions of DNA, RNA, and protein sequences often exhibit characteristic but subtle motifs; thus, computational discovery of motifs in sequences is a fundamental and much-studied problem. However, most current algorithms do not allow for insertions or deletions (indels) within motifs, and the few that do have other limitations. We present a method, GLAM2 (Gapped Local Alignment of Motifs), for discovering motifs allowing indels in a fully general manner, and a companion method GLAM2SCAN for searching sequence databases using such motifs. GLAM2 is a generalization of the gapless Gibbs sampling algorithm. It re-discovers variable-width protein motifs from the PROSITE database significantly more accurately than the alternative methods PRATT and SAM-T2K. Furthermore, it usefully refines protein motifs from the ELM database: in some cases, the refined motifs make orders of magnitude fewer overpredictions than the original ELM regular expressions. GLAM2 performs respectably on the BAliBASE multiple alignment benchmark, and may be superior to leading multiple alignment methods for “motif-like” alignments with N- and C-terminal extensions. Finally, we demonstrate the use of GLAM2 to discover protein kinase substrate motifs and a gapped DNA motif for the LIM-only transcriptional regulatory complex: using GLAM2SCAN, we identify promising targets for the latter. GLAM2 is especially promising for short protein motifs, and it should improve our ability to identify the protein cleavage sites, interaction sites, post-translational modification attachment sites, etc., that underlie much of biology. It may be equally useful for arbitrarily gapped motifs in DNA and RNA, although fewer examples of such motifs are known at present. GLAM2 is public domain software, available for download at http://bioinformatics.org.au/glam2

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace