33 research outputs found

    PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes

    Background: As more and more genomes are being sequenced, an overview of their genomic features and annotation of their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings: Relative stability of DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of the genome GC-content the relative stability based promoter prediction method has already been proven to be robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase either by specifying the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest) based on the difference between its average free energy and the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome and various DNA sequence dependent structural properties (average free energy, curvature and bendability) in the vicinity of all annotated translation start sites (TLS). Conclusion: PromBase is a database, which contains predicted promoter regions and detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics study and help the experimentalist to rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible for academic and non-academic users via the worldwide web http://nucleix.mbu.iisc.ernet.in/prombase/.

    Promoter prediction in E. coli based on SIDD profiles and Artificial Neural Networks

    Background: One of the major challenges in biology is the correct identification of promoter regions. Computational methods based on motif searching have been the traditional approach. Recent studies have shown that DNA structural properties, such as curvature, stacking energy, and stress-induced duplex destabilization (SIDD), are also useful in promoter prediction. In this paper, the currently used SIDD energy threshold method is compared to the proposed artificial neural network (ANN) approach for finding promoters based on SIDD profile data. Results: When compared to the SIDD threshold prediction method, artificial neural networks showed noticeable improvements in precision, recall, and F-score over a range of values. The maximal F-score was 62.3 for the ANN classifier and 56.8 for the threshold-based classifier. Conclusions: Artificial neural networks were used to predict promoters based on SIDD profile data. Results using this technique were an improvement over the previous SIDD threshold approach. Over a wide range of precision-recall values, artificial neural networks were more capable of identifying distinctive characteristics of promoter regions than threshold-based methods.
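
    For readers unfamiliar with the metrics quoted in these abstracts, the following is a minimal sketch of how precision, recall, and the F-score are usually derived from prediction counts. The function name and the example counts are hypothetical and are not taken from any of the listed papers; the abstracts report these quantities as percentages, whereas the sketch returns fractions.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-score) from raw prediction counts.

    tp: predicted promoters that match a known promoter
    fp: predicted promoters with no known promoter
    fn: known promoters that were not predicted
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f(tp=60, fp=40, fn=20)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

    Because the F-score is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it is often used to compare classifiers at a single operating point.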

    High-quality annotation of promoter regions for 913 bacterial genomes

    Motivation: The number of bacterial genomes being sequenced is increasing very rapidly and hence, it is crucial to have procedures for rapid and reliable annotation of their functional elements such as promoter regions, which control the expression of each gene or each transcription unit of the genome. The present work addresses this requirement and presents a generic method applicable across organisms. Results: Relative stability of the DNA double helical sequences has been used to discriminate promoter regions from non-promoter regions. Based on the difference in stability between neighboring regions, an algorithm has been implemented to predict promoter regions on a large scale over 913 microbial genome sequences. The average free energy values for the promoter regions as well as their downstream regions are found to differ, depending on their GC content. Threshold values to identify promoter regions have been derived using sequences flanking a subset of translation start sites from all microbial genomes and then used to predict promoters over the complete genome sequences. An average recall value of 72% (which indicates the percentage of protein and RNA coding genes with predicted promoter regions assigned to them) and a precision of 56% are achieved over the 913 microbial genome dataset.

    Relative stability of DNA as a generic criterion for promoter prediction: whole genome annotation of microbial genomes with varying nucleotide base composition

    The rapid increase in genome sequence information has necessitated the annotation of functional elements, particularly those occurring in non-coding regions, in the genomic context. The promoter region is the key regulatory region that enables a gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence in silico identification of promoters is crucial in order to guide experimental work and to pinpoint the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house developed tool, 'PromPredict'. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC), sensitivities of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC), a sensitivity of 100% and a precision of 49% were obtained.
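
    The relative-stability idea described above can be sketched roughly as follows. This is not PromPredict: the function names, the dinucleotide energy table, and the cutoff values are all assumptions made for illustration, and the abstract states only that the real thresholds vary with GC content. The sketch averages a nearest-neighbor free energy over a window and flags an upstream window that is both less stable than a cutoff and less stable than its downstream neighbor by some margin.

```python
# Illustrative dinucleotide free energies in kcal/mol. These are assumed
# values for the sketch, not parameters taken from the abstract.
NN_ENERGY = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}


def average_free_energy(seq):
    """Mean dinucleotide free energy over all overlapping steps in seq."""
    seq = seq.upper()
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not steps:
        raise ValueError("sequence must contain at least two bases")
    return sum(NN_ENERGY[step] for step in steps) / len(steps)


def is_candidate_promoter(upstream, downstream, energy_cutoff, diff_cutoff):
    """Flag the upstream window if it is less stable (higher free energy)
    than energy_cutoff and exceeds the downstream window by diff_cutoff.
    Both cutoffs are hypothetical parameters of this sketch."""
    e_up = average_free_energy(upstream)
    diff = e_up - average_free_energy(downstream)
    return e_up > energy_cutoff and diff > diff_cutoff


# Toy windows: an AT-rich stretch followed by a GC-rich stretch.
print(is_candidate_promoter("TATAAT", "GCGCGC", energy_cutoff=-1.0, diff_cutoff=0.5))
```

    In practice the windows would be far longer than these six-base toy sequences, and the cutoffs would be chosen per GC-content range rather than fixed.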

    Carbohydrate-based drug design: Recognition fingerprints and their use in lead identification

    Carbohydrate-based therapeutics is a rapidly developing theme, resulting directly from the recent increase in interest in the biological roles of carbohydrates. A fundamental requirement for successful drug design is a detailed understanding of protein-carbohydrate interactions. This article reports a structural bioinformatics study of several carbohydrate-binding proteins to identify common minimum principles required for the recognition of mannose, glucose and galactose, which indeed form much of the basis for recognition of higher sugars. The study identifies an aspartic acid-O4 sugar hydroxyl interaction to be highly conserved, which appears to be crucial for recognition of all three sugars. Other interactions are specific to particular sugars, leading to individual fingerprints. These fingerprints have then been used in the identification of lead compounds, using fragment-based design approaches. The results obtained by such guided design protocols are found to be more focused than those obtained from comparable ab initio design protocols. These studies, apart from providing clues about the usable pharmacophore space for these structures, also show that the use of fingerprints in fragment-based ligand design leads to the design of a sugar-like ligand, mimicking the natural carbohydrate ligand, in each of the eight examples studied.

    Distinguishing between productive and abortive promoters using a random forest classifier in Mycoplasma pneumoniae

    Distinguishing between promoter-like sequences in bacteria that belong to true or abortive promoters, or to those that do not initiate transcription at all, is one of the important challenges in transcriptomics. To address this problem, we have studied the genome-reduced bacterium Mycoplasma pneumoniae, for which the RNAs associated with transcriptional start sites have been recently experimentally identified. We determined the contribution to transcription events of different genomic features: the -10, extended -10 and -35 boxes, the UP element, the bases surrounding the -10 box and the nearest-neighbor free energy of the promoter region. Using a random forest classifier and the aforementioned features transformed into scores, we could distinguish between true, abortive promoters and non-promoters with good -10 box sequences. The methods used in this characterization of promoters can be extended to other bacteria and have important applications for promoter design in bacterial genome engineering.
    Funding: European Union Seventh Framework Programme (FP7/2007–2013), through the European Research Council [232913]; Fundación Botín, the Spanish Ministry of Economy and Competitiveness [BIO2007-61762]; National Plan of R + D + i; ISCIII – Subdirección General de Evaluación y Fomento de la Investigación [PI10/01702]; European Regional Development Fund (ERDF) (to the ICREA Research Professor L.S.); Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013–2017’ [SEV-2012-0208].
    Funding for open access charge: European Union Seventh Framework Programme (FP7/2007–2013), through the European Research Council [232913]; Fundación Botín, the Spanish Ministry of Economy and Competitiveness [BIO2007-61762]; National Plan of R + D + i; ISCIII – Subdirección General de Evaluación y Fomento de la Investigación [PI10/01702]; European Regional Development Fund (ERDF) (to the ICREA Research Professor L.S.); Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013–2017’ [SEV-2012-0208].