Search CORE

17 research outputs found

Engineering of increased L-Threonine production in bacteria by combinatorial cloning and machine learning

Author: Akins Chase
Antonopoulos Dionysios
Babnigg Gyorgy
Brettin Thomas
Chlenski Philippe
Foflonker Fatima
Fonstein Michael
Hanke Paul
Henry Chris
Parrello Bruce
Stevens Rick
Vasieva Olga
Publication venue: Elsevier BV
Publication date: 09/07/2023

The goal of this study is to develop a general strategy for bacterial engineering using an integrated synthetic biology and machine learning (ML) approach. This strategy was developed in the context of increasing L-threonine production in Escherichia coli ATCC 21277. A set of 16 genes was initially selected based on metabolic pathway relevance to threonine biosynthesis and used for combinatorial cloning to construct a set of 385 strains to generate training data (i.e., a range of L-threonine titers linked to each of the specific gene combinations). Hybrid (regression/classification) deep learning (DL) models were developed and used to predict additional gene combinations in subsequent rounds of combinatorial cloning for increased L-threonine production based on the training data. As a result, E. coli strains built after just three rounds of iterative combinatorial cloning and model prediction generated higher L-threonine titers (from 2.7 g/L to 8.4 g/L) than those of patented L-threonine strains being used as controls (4–5 g/L). Interesting combinations of genes in L-threonine production included deletions of the tdh, metL, dapA, and dhaM genes as well as overexpression of the pntAB, ppc, and aspC genes. Mechanistic analysis of the metabolic system constraints for the best performing constructs offers ways to improve the models by adjusting weights for specific gene combinations. Graph theory analysis of pairwise gene modifications and corresponding levels of L-threonine production also suggests additional rules that can be incorporated into future ML models.

Directory of Open Access Journals

Knowledge UChicago

The RAST Server: Rapid Annotations using Subsystems Technology

Author: Aziz Ramy K
Bartels Daniela
Best Aaron A
DeJongh Matthew
Disz Terrence
Edwards Robert A
Formsma Kevin
Gerdes Svetlana
Glass Elizabeth M
Kubal Michael
McNeil Leslie K
Meyer Folker
Olsen Gary J
Olson Robert
Osterman Andrei L
Overbeek Ross A
Paarmann Daniel
Paczian Tobias
Parrello Bruce
Pusch Gordon D
Reich Claudia
Stevens Rick
Vassieva Olga
Vonstein Veronika
Wilke Andreas
Zagnitko Olga
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. Description: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. Conclusion: By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

Digital Commons@Hope College

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

KBase: The United States Department of Energy Systems Biology Knowledgebase.

Author: A Prlić
Aaron A Best
Adam P Arkin
Annette Greiner
Ben Bowen
Benjamin H Allen
BR Bochner
Brian H Davison
Bruce Parrello
Christopher C Bun
Christopher S Henry
CS Henry
Daifeng Wang
Dan Gunter
Dan Murphy-Olson
Dantong Yu
David J Weston
DM Goodstein
Doreen Ware
Dylan Chivian
Elizabeth M Glass
Emily Dietrich
Erik Pearson
F Perez
Fangfang Xia
Fei He
Fernando Perez
Folker Meyer
Gang Fang
Gary J Olsen
Gavin A Price
Holly L Haun
Hyunseung Yoo
Inna Dubchak
J Schellenberger
James Gurtowski
James J Davis
James Thomason
Janaka N Edirisinghe
Jason Baumohl
Jer-Ming Chia
JN Edirisinghe
John-Marc Chandonia
José P Faria
JP Faria
KD Pruitt
Kevin P Keegan
KJ Millman
M Kanehisa
Marcin P Joachimiak
Marissa Mills
Mark Gerstein
Matthew DeJongh
Matthew L Henderson
Maulik Shukla
Meghan M Drake
Michael C Schatz
Michael W Sneddon
Miriam L Land
Mustafa H Syed
Nathan L Tintle
Neal Conrad
Nomi L Harris
Pamela C Ronald
Paramvir Dehal
Paul M Frybarger
Pavel S Novichkov
Priya Ranjan
R Caspi
Rashmi Jain
Ric Colasanti
Rick L Stevens
Robert Olson
Robert W Cottingham
Roman A Sutormin
Roy T Kamimura
S Magnúsdóttir
Samuel M D Seaver
Sarah S Poon
Scott Devoid
Sergei Maslov
Shane Canon
Shinjae Yoo
Shinnosuke Kondo
Shiran Pasternak
Srividya Ramakrishnan
Stephen Y Chan
Steven E Brenner
Sunita Kumari
T Kluyver
Taeyun Oh
Thomas S Brettin
V Stodden
Vivek Kumar
William J Riehl
Wolfgang Gerlach
Publication venue: Springer Science and Business Media LLC
Publication date: 01/07/2018

Crossref

Cold Spring Harbor Laboratory Institutional Repository

eScholarship - University of California

Supervised extraction of near-complete genomes from metagenomic samples: A new service in PATRIC

Author: Butler Rory
Chlenski Philippe
Overbeek Ross
Parrello Bruce
Pusch Gordon D.
Publication venue: Public Library of Science (PLoS)
Publication date: 16/05/2023

Large amounts of metagenomically-derived data are submitted to PATRIC for analysis. In the future, we expect even more jobs submitted to PATRIC will use metagenomic data. One in-demand use case is the extraction of near-complete draft genomes from assembled contigs of metagenomic origin. The PATRIC metagenome binning service utilizes the PATRIC database to furnish a large, diverse set of reference genomes. We provide a new service for supervised extraction and annotation of high-quality, near-complete genomes from metagenomically-derived contigs. Reference genomes are assigned to putative draft genome bins based on the presence of single-copy universal marker roles in the sample, and contigs are sorted into these bins by their similarity to reference genomes in PATRIC. Each set of binned contigs represents a draft genome that will be annotated by RASTtk in PATRIC. A structured-language binning report is provided containing quality measurements and taxonomic information about the contig bins. The PATRIC metagenome binning service emphasizes extraction of high-quality genomes for downstream analysis using other PATRIC tools and services. Due to its supervised nature, the binning service is not appropriate for mining novel or extremely low-coverage genomes from metagenomic samples.
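The two-step idea in this abstract (seed bins with reference genomes, then sort contigs by similarity to those references) can be illustrated with a toy sketch. This is not the PATRIC implementation: the function names, the shared k-mer count used as the similarity measure, and the tiny sequences are all assumptions made for illustration, and the marker-role detection step is skipped by passing the seeded references in directly.

```python
from collections import defaultdict


def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_contigs(contigs, references, seeds, k=4, min_shared=1):
    """Assign each contig to the seeded reference it shares the most k-mers with.

    contigs:    {contig_id: sequence}
    references: {reference_id: sequence}
    seeds:      reference ids assumed to have been selected because one of
                their marker roles was found in the sample (hypothetical input)
    """
    ref_kmers = {ref: kmers(references[ref], k) for ref in seeds}
    bins = defaultdict(list)
    for contig_id, seq in contigs.items():
        contig_kmers = kmers(seq, k)
        best_ref, best_shared = None, 0
        # Iterate in sorted order so ties resolve deterministically.
        for ref in sorted(ref_kmers):
            shared = len(contig_kmers & ref_kmers[ref])
            if shared > best_shared:
                best_ref, best_shared = ref, shared
        if best_ref is not None and best_shared >= min_shared:
            bins[best_ref].append(contig_id)
        else:
            # No seeded reference is similar enough, so leave the contig out.
            bins["unbinned"].append(contig_id)
    return dict(bins)


# Toy data, invented for this example only.
references = {"refA": "AAAACCCCGGGG", "refB": "TTTTGGTTGGTT"}
contigs = {"c1": "AAAACCCC", "c2": "TTGGTTGG", "c3": "ACGTACGT"}
result = bin_contigs(contigs, references, seeds=["refA", "refB"])
print(result)
```

In this sketch the contig `c3` resembles neither reference and stays unbinned, which mirrors the abstract's caveat that a supervised, reference-driven approach does not recover genomes that lack a close reference.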

Knowledge UChicago

SEED Servers: High-Performance Access to the SEED Genomes, Annotations, and Metabolic Models

Author: Aziz Ramy K.
Devoid Scott
Disz Terrence
Edwards Robert A.
Henry Christopher S.
Olsen Gary J.
Olson Robert
Overbeek Ross
Parrello Bruce
Pusch Gordon D.
Stevens Rick L.
Vonstein Veronika
Xia Fangfang
Publication date: 12/01/2024

The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.

Knowledge UChicago

The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)

Author: Alice R. Wattam
Bruce Parrello
Fangfang Xia
Gary J. Olsen
Gordon D. Pusch
James J. Davis
Maulik Shukla
Overbeek
Rick Stevens
Robert A. Edwards
Robert Olson
Ross Overbeek
Svetlana Gerdes
Terry Disz
Veronika Vonstein
Publication venue: Oxford University Press (OUP)

Crossref

SEED Servers: High-Performance Access to the SEED Genomes, Annotations, and Metabolic Models

Author: Bruce Parrello (42723)
Christopher S. Henry (126951)
Fangfang Xia (126959)
Gary J. Olsen (126954)
Gordon D. Pusch (11370)
Ramy K. Aziz (79570)
Rick L. Stevens (126957)
Robert A. Edwards (41625)
Robert Olson (11367)
Ross Overbeek (11341)
Scott Devoid (126948)
Terrence Disz (42714)
Veronika Vonstein (11378)
Publication date: 24/10/2012

The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.

Directory of Open Access Journals

PubMed Central

FigShare

Processing ids_to_sequences.

Author: Bruce Parrello (42723)
Christopher S. Henry (126951)
Fangfang Xia (126959)
Gary J. Olsen (126954)
Gordon D. Pusch (11370)
Ramy K. Aziz (79570)
Rick L. Stevens (126957)
Robert A. Edwards (41625)
Robert Olson (11367)
Ross Overbeek (11341)
Scott Devoid (126948)
Terrence Disz (42714)
Veronika Vonstein (11378)

(a) The ids_to_sequences function call accepts multiple IDs as an argument and uses the Sapling server to process the calls. These are returned as a single table. (b) A detailed description of each call (in this example, the ids_to_sequences) is provided online and is automatically generated from the entity-relationship models shown in Figure 2 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0048053#pone-0048053-g002).

FigShare

Architecture of the SEED servers.

Author: Bruce Parrello (42723)
Christopher S. Henry (126951)
Fangfang Xia (126959)
Gary J. Olsen (126954)
Gordon D. Pusch (11370)
Ramy K. Aziz (79570)
Rick L. Stevens (126957)
Robert A. Edwards (41625)
Robert Olson (11367)
Ross Overbeek (11341)
Scott Devoid (126948)
Terrence Disz (42714)
Veronika Vonstein (11378)

The client packages (currently available for Perl or Java) handle the HTTP requests and responses, and parse the data from the appropriate lightweight data exchange formats to data structures. The four servers access the SEED data.

FigShare