9 research outputs found

Promoter prediction in E. coli based on SIDD profiles and Artificial Neural Networks

Author: A Kanhere
Abigail S Newsome
Aleksandra A Markovets
AM Huerta
Charles Bland
GZ Hertz
H Wang
H Wang
L Kozobay-Avraham
V Rangannan
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: One of the major challenges in biology is the correct identification of promoter regions. Computational methods based on motif searching have been the traditional approach. Recent studies have shown that DNA structural properties, such as curvature, stacking energy, and stress-induced duplex destabilization (SIDD), are also useful in promoter prediction. In this paper, the currently used SIDD energy threshold method is compared to the proposed artificial neural network (ANN) approach for finding promoters based on SIDD profile data. Results: When compared to the SIDD threshold prediction method, artificial neural networks showed noticeable improvements in precision, recall, and F-score over a range of values. The maximal F-score was 62.3 for the ANN classifier, compared with 56.8 for the threshold-based classifier. Conclusions: Artificial neural networks were used to predict promoters based on SIDD profile data. Results using this technique were an improvement over the previous SIDD threshold approach. Over a wide range of precision-recall values, artificial neural networks were more capable of identifying distinctive characteristics of promoter regions than threshold-based methods.
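The precision, recall, and F-score comparison in this abstract can be illustrated with a minimal sketch. It assumes the usual definition of the F-score as the harmonic mean of precision and recall, reported as a percentage; the abstract does not spell out the formula, and the counts below are hypothetical, not data from the paper.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-score) as percentages.

    tp: predicted promoters that are real promoters
    fp: predicted promoters that are not real promoters
    fn: real promoters the classifier missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    # Harmonic mean of precision and recall (assumed F-score definition).
    f_score = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f_score


# Hypothetical counts for illustration only.
precision, recall, f_score = precision_recall_f(tp=60, fp=40, fn=40)
print(f"precision={precision:.1f} recall={recall:.1f} F={f_score:.1f}")
```

Sweeping a decision threshold and recording these three values at each setting gives the kind of precision-recall range over which the two classifiers are compared.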

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PerPlot & PerScan: tools for analysis of DNA curvature-related periodicity in genomic nucleotide sequences

Author: A Bolshoy
A Fire
A Theologis
C Jacq
CJ Bult
E Segal
EN Trifonov
H Herzel
H Herzel
H Willenbrock
J Mrázek
J Mrázek
J Mrázek
L Kozobay-Avraham
LE Ulanovsky
MY Tolstorukov
P Schieg
P Worning
R Kiyama
R Rohs
RD Fleischmann
SG Gu
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress

Author: A Kanhere
AG Pedersen
AL Delcher
AM Huerta
CB Harley
CE Shannon
CJ Benham
CJ Benham
Craig J Benham
DK Hawley
ES Shpigelman
GZ Hertz
H Salgado
H Wang
H Wang
H Wang
Huiquan Wang
J SantaLucia Jr
JD Helmann
L Kozobay-Avraham
M Rosenberg
MG Reese
ML Opel
R Durbin
RA Johnson
RR Sokal
S Lisser
SD Sheridan
U Ohler
WK Olson
WS Hayes
Y Makita
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: In our previous studies, we found that the sites in prokaryotic genomes that are most susceptible to duplex destabilization under the negative superhelical stresses occurring in vivo show a highly statistically significant association with intergenic regions that are known or inferred to contain promoters. In this report we investigate how this structural property, either alone or together with other structural and sequence attributes, may be used to search prokaryotic genomes for promoters. RESULTS: We show that the propensity for stress-induced DNA duplex destabilization (SIDD) is closely associated with specific promoter regions. The extent of destabilization in promoter-containing regions is found to be bimodally distributed. When compared with DNA curvature, deformability, thermostability or sequence motif scores within the -10 region, SIDD is found to be the most informative DNA property regarding promoter locations in the E. coli K12 genome. SIDD properties alone perform better at detecting promoter regions than other programs trained on this genome. Because this approach has a very low false positive rate, it can be used to predict with high confidence the subset of promoters that are strongly destabilized. When SIDD properties are combined with -10 motif scores in a linear classification function, they predict promoter regions with better than 80% accuracy. When these methods were tested with promoter and non-promoter sequences from Bacillus subtilis, they achieved similar or higher accuracies. We also present a strictly SIDD-based predictor for annotating promoter sequences in complete microbial genomes. CONCLUSION: In this report we show that the propensity to undergo stress-induced duplex destabilization (SIDD) is a distinctive structural attribute of many prokaryotic promoter sequences. We have developed methods to identify promoter sequences in prokaryotic genomes that use SIDD either as a sole predictor or in combination with other DNA structural and sequence properties. Although these methods cannot predict all the promoter-containing regions in a genome, they do find large sets of potential regions that have high probabilities of being true positives. This approach could be especially valuable for annotating those genomes about which there is limited experimental data.
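The idea of combining SIDD properties with -10 motif scores in a linear classification function can be sketched as follows. This only illustrates the general form of a two-feature linear classifier, not the authors' fitted function. The feature scaling, the weights, the bias, and the names `Region`, `linear_score`, and `is_promoter` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical feature: larger value = more strongly destabilized.
    destabilization: float
    # Hypothetical feature: larger value = better match to a -10 motif.
    motif_score: float


# Hypothetical weights and bias; a real classifier would fit these
# to labeled promoter and non-promoter regions.
W_DESTAB = 1.0
W_MOTIF = 0.5
BIAS = -2.0


def linear_score(region: Region) -> float:
    """Weighted sum of the two features plus a bias term."""
    return W_DESTAB * region.destabilization + W_MOTIF * region.motif_score + BIAS


def is_promoter(region: Region) -> bool:
    """Classify as promoter-containing when the linear score is positive."""
    return linear_score(region) > 0


print(is_promoter(Region(destabilization=3.0, motif_score=1.0)))
print(is_promoter(Region(destabilization=0.5, motif_score=1.0)))
```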

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A reexamination of information theory-based methods for DNA-binding site identification

Author: A Kolb
AR Fernandez De Henestrosa
B Barash
CE Lawrence
CE Shannon
D Betel
D GuhaThakurta
DT Pride
EN Trifonov
ET Jaynes
ET Jaynes
G Robertson
G Thijs
GD Stormo
GD Stormo
GD Stormo
GE Crooks
GJ Phillips
GZ Hertz
I Erill
Ivan Erill
J Rudnick
J van Helden
JJ Kohler
JM Heumann
JT Kim
JW Gibbs
K Gaston
K Uchida
KL Griffith
L Kozobay-Avraham
LJ Sun
LL Gatlin
LL Gatlin
M Abella
M Asayama
M Butala
M Schnarr
MC O'Neill
MC O'Neill
MC O'Neill
MH Zweig
Michael C O'Neill
ML Bulyk
MS Gelfand
N Baichoo
O Aparicio
O Huisman
OG Berg
OG Berg
P D'Haeseleer
PH von Hippel
PH von Hippel
R Brent
R Jauregui
R Munch
R Munch
R Osada
R Staden
RJ Redfield
RK Shultzaberger
RK Shultzaberger
RK Shultzaberger
RV Parbhane
S Krishna
S Kullback
ST Cole
TD Schneider
TD Schneider
TD Schneider
TD Schneider
TD Schneider
TL Bailey
TL Bailey
X Liu
Z Chen
Z Xiaoyue
Publication venue: BioMed Central
Publication date: 01/02/2009

Abstract. Background: Searching for transcription factor binding sites in genome sequences is still an open problem in bioinformatics. Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial settings. Here we use newly available data on transcription factors from different bacterial genomes to make a more thorough assessment of information theory-based search methods. Results: Our results reveal that conventional benchmarking against artificial sequence data frequently leads to overestimation of search efficiency. In addition, we find that sequence information by itself is often inadequate and therefore must be complemented by other cues, such as curvature, in real genomes. Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes. The evidence suggests that binding sites tend to evolve towards genomic skew, rather than against it, and to maintain their information content through increased conservation. Based on these results, we identify several misconceptions about information theory as applied to binding sites, such as negative entropy, and we propose a revised paradigm to explain the observed results. Conclusion: We conclude that, among information theory-based methods, the most unassuming search methods perform, on average, better than any other alternatives, since heuristic corrections to these methods are prone to fail when working on real data. A reexamination of information content in binding sites reveals that information content is a compound measure of search and binding affinity requirements, a fact that has important repercussions for our understanding of binding site evolution.
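The contrast between plain information content and a skew-aware relative entropy can be illustrated with a short sketch. It assumes the textbook definition of relative entropy (Kullback-Leibler divergence of each alignment column against background base frequencies, in bits), which may differ in detail from the formulation evaluated in the paper. It applies no small-sample correction, and the sites and background frequencies are hypothetical.

```python
import math
from collections import Counter


def column_frequencies(sites: list[str]) -> list[dict[str, float]]:
    """Per-position base frequencies for equal-length aligned sites."""
    length = len(sites[0])
    columns = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        columns.append({base: counts[base] / len(sites) for base in "ACGT"})
    return columns


def relative_entropy(sites: list[str], background: dict[str, float]) -> float:
    """Sum over columns of sum_b f_b * log2(f_b / q_b), in bits.

    With a uniform background (q_b = 0.25) this equals the usual
    information content, 2 - H(column), summed over positions.
    """
    total = 0.0
    for column in column_frequencies(sites):
        for base, freq in column.items():
            if freq > 0:  # 0 * log(0) is taken as 0
                total += freq * math.log2(freq / background[base])
    return total


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Hypothetical AT-rich (skewed) genome background.
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}

sites = ["AT", "AT"]  # hypothetical, fully conserved AT-only sites
print(relative_entropy(sites, uniform))
print(relative_entropy(sites, at_rich))
```

In this toy case the same conserved site scores lower against the AT-rich background than against the uniform one. This is the sense in which a skew-aware score rewards sites that deviate from the genomic composition.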

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central