153 research outputs found

    EcoCyc: A comprehensive view of Escherichia coli biology

    Get PDF
    EcoCyc (http://EcoCyc.org) provides a comprehensive encyclopedia of Escherichia coli biology. EcoCyc integrates information about the genome, genes and gene products; the metabolic network; and the regulatory network of E. coli. Recent EcoCyc developments include a new initiative to represent and curate all types of E. coli regulatory processes such as attenuation and regulation by small RNAs. EcoCyc has started to curate Gene Ontology (GO) terms for E. coli and has made a dataset of E. coli GO terms available through the GO Web site. The curation and visualization of electron transfer processes has been significantly improved. Other software and Web site enhancements include the addition of tracks to the EcoCyc genome browser, in particular a type of track designed for the display of ChIP-chip datasets, and the development of a comparative genome browser. A new Genome Omics Viewer enables users to paint omics datasets onto the full E. coli genome for analysis. A new advanced query page guides users in interactively constructing complex database queries against EcoCyc. A Macintosh version of EcoCyc is now available. A series of Webinars is available to instruct users in the use of EcoCyc

    Strong negative self regulation of Prokaryotic transcription factors increases the intrinsic noise of protein expression

    Get PDF
    Background Many prokaryotic transcription factors repress their own transcription. It is often asserted that such regulation enables a cell to homeostatically maintain protein abundance. We explore the role of negative self regulation of transcription in regulating the variability of protein abundance using a variety of stochastic modeling techniques. Results We undertake a novel analysis of a classic model for negative self regulation. We demonstrate that, with standard approximations, protein variance relative to its mean should be independent of repressor strength in a physiological range. Consequently, in that range, the coefficient of variation would increase with repressor strength. However, stochastic computer simulations demonstrate that there is a greater increase in noise associated with strong repressors than predicted by theory. The discrepancies between the mathematical analysis and computer simulations arise because with strong repressors the approximation that leads to Michaelis-Menten-like hyperbolic repression terms ceases to be valid. Because we observe that strong negative feedback increases variability and so is unlikely to be a mechanism for noise control, we suggest instead that negative feedback is evolutionarily favoured because it allows the cell to minimize mRNA usage. To test this, we used in silico evolution to demonstrate that while negative feedback can achieve only a modest improvement in protein noise reduction compared with the unregulated system, it can achieve good improvement in protein response times and very substantial improvement in reducing mRNA levels. Conclusions Strong negative self regulation of transcription may not always be a mechanism for homeostatic control of protein abundance, but instead might be evolutionarily favoured as a mechanism to limit the use of mRNA. The use of hyperbolic terms derived from quasi-steady-state approximation should also be avoided in the analysis of stochastic models with strong repressors

    Challenges in integrating Escherichia coli molecular biology data

    Get PDF
    One key challenge in Systems Biology is to provide mechanisms to collect and integrate the necessary data to be able to meet multiple analysis requirements. Typically, biological contents are scattered over multiple data sources and there is no easy way of comparing heterogeneous data contents. This work discusses ongoing standardisation and interoperability efforts and exposes integration challenges for the model organism Escherichia coli K-12. The goal is to analyse the major obstacles faced by integration processes, suggest ways to systematically identify them, and whenever possible, propose solutions or means to assistmanual curation. Integration of gene, protein and compound data was evaluated by performing comparisons over EcoCyc, KEGG, BRENDA, ChEBI, Entrez Gene and UniProt contents. Cross-links, a number of standard nomenclatures and name information supported the comparisons. Except for the gene integration scenario, in no other scenario an element of integration performed well enough to support the process by itself. Indeed, both the integration of enzyme and compound records imply considerable curation. Results evidenced that, even for a well-studied model organism, source contents are still far from being as standardized as it would be desired and metadata varies considerably from source to source. Before designing any data integration pipeline, researchers should decide on the sources that best fit the purpose of analysis and be aware of existing conflicts/inconsistencies to be able to intervene in their resolution. Moreover, they should be aware of the limits of automatic integration such that they can define the extent of necessary manual curation for each application.Portuguese FCT funded MIT-Portugal Program in Bioengineering (MIT-Pt/BS-BB/0082/2008); PhD grant from FCT (ref. SFRH/BD/22863/2005) to S.

    Multidimensional annotation of the Escherichia coli K-12 genome

    Get PDF
    The annotation of the Escherichia coli K-12 genome in the EcoCyc database is one of the most accurate, complete and multidimensional genome annotations. Of the 4460 E. coli genes, EcoCyc assigns biochemical functions to 76%, and 66% of all genes had their functions determined experimentally. EcoCyc assigns E. coli genes to Gene Ontology and to MultiFun. Seventy-five percent of gene products contain reviews authored by the EcoCyc project that summarize the experimental literature about the gene product. EcoCyc information was derived from 15 000 publications. The database contains extensive descriptions of E. coli cellular networks, describing its metabolic, transport and transcriptional regulatory processes. A comparison to genome annotations for other model organisms shows that the E. coli genome contains the most experimentally determined gene functions in both relative and absolute terms: 2941 (66%) for E. coli, 2319 (37%) for Saccharomyces cerevisiae, 1816 (5%) for Arabidopsis thaliana, 1456 (4%) for Mus musculus and 614 (4%) for Drosophila melanogaster. Database queries to EcoCyc survey the global properties of E. coli cellular networks and illuminate the extent of information gaps for E. coli, such as dead-end metabolites. EcoCyc provides a genome browser with novel properties, and a novel interactive display of transcriptional regulatory networks

    EcoCyc: a comprehensive database of Escherichia coli biology

    Get PDF
    EcoCyc (http://EcoCyc.org) is a comprehensive model organism database for Escherichia coli K-12 MG1655. From the scientific literature, EcoCyc captures the functions of individual E. coli gene products; their regulation at the transcriptional, post-transcriptional and protein level; and their organization into operons, complexes and pathways. EcoCyc users can search and browse the information in multiple ways. Recent improvements to the EcoCyc Web interface include combined gene/protein pages and a Regulation Summary Diagram displaying a graphical overview of all known regulatory inputs to gene expression and protein activity. The graphical representation of signal transduction pathways has been updated, and the cellular and regulatory overviews were enhanced with new functionality. A specialized undergraduate teaching resource using EcoCyc is being developed

    The Escherichia coli transcriptome mostly consists of independently regulated modules

    Get PDF
    Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome

    The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    Get PDF
    The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30 000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups

    Chemical Basis of Metabolic Network Organization

    Get PDF
    Although the metabolic networks of the three domains of life consist of different constituents and metabolic pathways, they exhibit the same scale-free organization. This phenomenon has been hypothetically explained by preferential attachment principle that the new-recruited metabolites attach preferentially to those that are already well connected. However, since metabolites are usually small molecules and metabolic processes are basically chemical reactions, we speculate that the metabolic network organization may have a chemical basis. In this paper, chemoinformatic analyses on metabolic networks of Kyoto Encyclopedia of Genes and Genomes (KEGG), Escherichia coli and Saccharomyces cerevisiae were performed. It was found that there exist qualitative and quantitative correlations between network topology and chemical properties of metabolites. The metabolites with larger degrees of connectivity (hubs) are of relatively stronger polarity. This suggests that metabolic networks are chemically organized to a certain extent, which was further elucidated in terms of high concentrations required by metabolic hubs to drive a variety of reactions. This finding not only provides a chemical explanation to the preferential attachment principle for metabolic network expansion, but also has important implications for metabolic network design and metabolite concentration prediction

    Burden-driven feedback control of gene expression

    Get PDF
    Cells use feedback regulation to ensure robust growth despite fluctuating demands for resources and differing environmental conditions. However, the expression of foreign proteins from engineered constructs is an unnatural burden that cells are not adapted for. Here we combined RNA-seq with an in vivo assay to identify the major transcriptional changes that occur in Escherichia coli when inducible synthetic constructs are expressed. We observed that native promoters related to the heat-shock response activated expression rapidly in response to synthetic expression, regardless of the construct. Using these promoters, we built a dCas9-based feedback-regulation system that automatically adjusts the expression of a synthetic construct in response to burden. Cells equipped with this general-use controller maintained their capacity for native gene expression to ensure robust growth and thus outperformed unregulated cells in terms of protein yield in batch production. This engineered feedback is to our knowledge the first example of a universal, burden-based biomolecular control system and is modular, tunable and portable

    The representation of protein complexes in the Protein Ontology (PRO)

    Get PDF
    BACKGROUND: Representing species-specific proteins and protein complexes in ontologies that are both human- and machine-readable facilitates the retrieval, analysis, and interpretation of genome-scale data sets. Although existing protin-centric informatics resources provide the biomedical research community with well-curated compendia of protein sequence and structure, these resources lack formal ontological representations of the relationships among the proteins themselves. The Protein Ontology (PRO) Consortium is filling this informatics resource gap by developing ontological representations and relationships among proteins and their variants and modified forms. Because proteins are often functional only as members of stable protein complexes, the PRO Consortium, in collaboration with existing protein and pathway databases, has launched a new initiative to implement logical and consistent representation of protein complexes. DESCRIPTION: We describe here how the PRO Consortium is meeting the challenge of representing species-specific protein complexes, how protein complex representation in PRO supports annotation of protein complexes and comparative biology, and how PRO is being integrated into existing community bioinformatics resources. The PRO resource is accessible at http://pir.georgetown.edu/pro/. CONCLUSION: PRO is a unique database resource for species-specific protein complexes. PRO facilitates robust annotation of variations in composition and function contexts for protein complexes within and between species
    corecore