7 research outputs found

    Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

    Background: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high-quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results: Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion: Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages.

    Genome-Wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli

    Despite almost 40 years of molecular genetics research in Escherichia coli, a major fraction of its Transcription Start Sites (TSSs) are still unknown, which limits our understanding of the regulatory circuits that control gene expression in this model organism. RegulonDB (http://regulondb.ccg.unam.mx/) aims to integrate the genetic regulatory network of E. coli K12 and has until now been an entirely bioinformatic project. In this work, we extended its aims by generating experimental data at a genome scale on TSSs, promoters and regulatory regions. We implemented a modified 5′ RACE protocol and an unbiased High Throughput Pyrosequencing Strategy (HTPS) that allowed us to map more than 1700 TSSs with high precision. From this collection, about 230 corresponded to previously reported TSSs, which helped us to benchmark both our methodologies and the accuracy of the previous mapping experiments. The other ca. 1500 TSSs mapped belong to about 1000 different genes, many of them with no assigned function. We identified promoter sequences and the type of σ factors that control the expression of about 80% of these genes. As expected, the housekeeping σ70 was the most common type of promoter, followed by σ38. The majority of the putative TSSs were located between 20 and 40 nucleotides from the translational start site. Putative regulatory binding sites for transcription factors were detected upstream of many TSSs. For a few transcripts, riboswitches and small RNAs were found. Several genes also had additional TSSs within the coding region. Unexpectedly, the HTPS experiments revealed extensive antisense transcription, probably for regulatory functions. The new information in RegulonDB, now with more than 2400 experimentally determined TSSs, strengthens the accuracy of promoter prediction, operon structure, and regulatory networks, and provides valuable new information that will facilitate a global understanding of the complex and intricate regulatory network that operates in E. coli.

    Time-Resolved Transcriptome Analysis of Bacillus subtilis Responding to Valine, Glutamate, and Glutamine

    Microorganisms can restructure their transcriptional output to adapt to environmental conditions by sensing endogenous metabolite pools. In this paper, an Agilent customized microarray representing 4,106 genes was used to study temporal transcript profiles of Bacillus subtilis in response to valine, glutamate and glutamine pulses over 24 h. A total of 673, 835, and 1,135 amino-acid-regulated genes were identified as having significantly changed expression at one or more time points in response to valine, glutamate, and glutamine, respectively, including genes involved in cell wall, cellular import, metabolism of amino acids and nucleotides, transcriptional regulation, flagellar motility, chemotaxis, phage proteins, sporulation, and many genes of unknown function. Different amino acid treatments were compared in terms of both the global temporal profiles and the rapid 5-minute regulation, and between-experiment differential genes were identified. The highlighted genes were analyzed based on diverse sources of gene functions using a variety of computational tools, including T-profiler analysis and hierarchical clustering. The results revealed the common and distinct modes of action of these three amino acids, and should help to elucidate the specific signaling mechanism of each amino acid as an effector.