17 research outputs found

    Engineering of increased L-Threonine production in bacteria by combinatorial cloning and machine learning

    The goal of this study is to develop a general strategy for bacterial engineering using an integrated synthetic biology and machine learning (ML) approach. This strategy was developed in the context of increasing L-threonine production in Escherichia coli ATCC 21277. A set of 16 genes was initially selected based on metabolic pathway relevance to threonine biosynthesis and used for combinatorial cloning to construct a set of 385 strains to generate training data (i.e., a range of L-threonine titers linked to each of the specific gene combinations). Hybrid (regression/classification) deep learning (DL) models were developed and used to predict additional gene combinations in subsequent rounds of combinatorial cloning for increased L-threonine production based on the training data. As a result, E. coli strains built after just three rounds of iterative combinatorial cloning and model prediction generated higher L-threonine titers (from 2.7 g/L to 8.4 g/L) than those of the patented L-threonine strains used as controls (4–5 g/L). Interesting combinations of genes in L-threonine production included deletions of the tdh, metL, dapA, and dhaM genes as well as overexpression of the pntAB, ppc, and aspC genes. Mechanistic analysis of the metabolic system constraints for the best-performing constructs offers ways to improve the models by adjusting weights for specific gene combinations. Graph theory analysis of pairwise gene modifications and corresponding levels of L-threonine production also suggests additional rules that can be incorporated into future ML models.
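The pairwise analysis mentioned in the abstract above can be pictured as a weighted graph whose nodes are gene modifications and whose edge weights summarize the titers of strains carrying both modifications. The following is a minimal sketch of that idea only, not the authors' method: the function name, the `del_`/`oe_` labels, and the titer values are placeholders invented for illustration, and the mean is just one of several plausible edge weights.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_mean_titers(strains):
    """Return {(mod_a, mod_b): mean titer} over strains carrying both modifications.

    `strains` is an iterable of (set_of_modifications, titer) records.
    Each pair is stored with its members in sorted order.
    """
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of titers, count]
    for mods, titer in strains:
        for pair in combinations(sorted(mods), 2):
            totals[pair][0] += titer
            totals[pair][1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


# Placeholder records: the titers below are made up and are NOT data from the study.
toy_strains = [
    ({"del_tdh", "oe_ppc"}, 3.0),
    ({"del_tdh", "oe_ppc", "oe_aspC"}, 5.0),
    ({"del_metL", "oe_aspC"}, 2.0),
]

edges = pairwise_mean_titers(toy_strains)
for pair, mean_titer in sorted(edges.items()):
    print(pair, mean_titer)
```

Edges with unusually high or low weights would be candidates for the kind of additional rules the abstract mentions, under the assumption that co-occurrence averages are a meaningful signal for the data at hand.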

    The RAST Server: Rapid Annotations using Subsystems Technology

    Background: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them.
    Description: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service.
    Conclusion: By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

    KBase: The United States Department of Energy Systems Biology Knowledgebase.

    SEED Servers: High-Performance Access to the SEED Genomes, Annotations, and Metabolic Models

    The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts.
    Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.

    Processing ids_to_sequences.

    (a) The ids_to_sequences function call accepts multiple IDs as an argument and uses the Sapling server to process the calls. These are returned as a single table. (b) A detailed description of each call (in this example, the ids_to_sequences) is provided online and is automatically generated from the entity-relationship models shown in Figure 2 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0048053#pone-0048053-g002).
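The batched call pattern described in this caption (many IDs in, one table out) can be illustrated with a small local stand-in. This is a sketch under stated assumptions, not the Sapling server's actual interface: `ids_to_sequences_sketch`, the in-memory `store`, the IDs, and the sequences are all hypothetical, and omitting unknown IDs is a choice made here for illustration rather than documented server behavior.

```python
def ids_to_sequences_sketch(ids, sequence_store):
    """Look up a batch of IDs and return one table mapping ID -> sequence.

    `sequence_store` is a plain dict standing in for the remote database.
    IDs that are not in the store are left out of the result.
    """
    return {seq_id: sequence_store[seq_id] for seq_id in ids if seq_id in sequence_store}


# Hypothetical IDs and sequences, used only to exercise the function.
store = {"id_A": "MKT", "id_B": "MSL"}
table = ids_to_sequences_sketch(["id_A", "id_B", "id_C"], store)
print(table)
```

Passing all IDs in a single call, rather than one call per ID, is what lets a remote service answer with a single round trip.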

    Architecture of the SEED servers.

    The client packages (currently available for Perl or Java) handle the HTTP requests and responses, and parse the data from the appropriate lightweight data exchange formats to data structures. The four servers access the SEED data.