17 research outputs found
A retrosynthetic biology approach to metabolic pathway design for therapeutic production
Background
Synthetic biology is used to develop cell factories for production of chemicals by constructively importing heterologous pathways into industrial microorganisms. In this work we present a retrosynthetic approach to the production of therapeutics with the goal of developing an in situ drug delivery device in host cells. Retrosynthesis, a concept originally proposed for synthetic chemistry, iteratively applies reversed chemical transformations (reversed enzyme-catalyzed reactions in the metabolic space) starting from a target product to reach precursors that are endogenous to the chassis. So far, a wider adoption of retrosynthesis into the manufacturing pipeline has been hindered by the complexity of enumerating all feasible biosynthetic pathways for a given compound.
Results
In our method, we efficiently address the complexity problem by coding substrates, products and reactions into molecular signatures. Metabolic maps are represented using hypergraphs and the complexity is controlled by varying the specificity of the molecular signature. Furthermore, our method enables candidate pathways to be ranked to determine which ones are best to engineer. The proposed ranking function can integrate data from different sources such as host compatibility for inserted genes, the estimation of steady-state fluxes from the genome-wide reconstruction of the organism's metabolism, or the estimation of metabolite toxicity from experimental assays. We use several machine-learning tools in order to estimate enzyme activity and reaction efficiency at each step of the identified pathways. Examples of production in bacteria and yeast for two antibiotics and for one antitumor agent, as well as for several essential metabolites, are outlined.
Conclusions
We present here a unified framework that integrates diverse techniques involved in the design of heterologous biosynthetic pathways through a retrosynthetic approach in the reaction signature space. Our engineering methodology enables the flexible design of industrial microorganisms for the efficient on-demand production of chemical compounds with therapeutic applications.
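The retrosynthetic idea described in this abstract (apply reactions in reverse from a target until every remaining precursor is endogenous to the chassis, then rank the candidates) can be illustrated with a minimal sketch. Everything here is a hypothetical toy: the compound names, the reaction set, the single per-reaction `score` and the product-of-scores ranking are assumptions for illustration, not the molecular-signature encoding or the ranking function of the paper.

```python
import math
from collections import namedtuple

# Hypothetical toy data: each reaction is a hyperedge (several substrates
# and products); `score` stands in for any per-step quality estimate.
Reaction = namedtuple("Reaction", ["name", "substrates", "products", "score"])

REACTIONS = [
    Reaction("R1", ("A", "B"), ("C",), 0.9),
    Reaction("R2", ("C",), ("T",), 0.8),
    Reaction("R3", ("D",), ("T",), 0.5),
    Reaction("R4", ("A",), ("D",), 0.7),
    Reaction("R5", ("X",), ("B",), 0.6),
]
NATIVE = {"A", "B"}  # metabolites assumed to be endogenous to the chassis


def retrosynthesis(targets, native, reactions, max_steps=4, used=()):
    """Yield tuples of reactions (target-first order) that make all targets
    from native metabolites, using each reaction at most once."""
    pending = [t for t in targets if t not in native]
    if not pending:
        yield used
        return
    if len(used) >= max_steps:
        return
    target, rest = pending[0], pending[1:]
    for rxn in reactions:
        if target in rxn.products and rxn not in used:
            # Reverse step: the products are explained, the substrates
            # become the new targets.
            remaining = [t for t in rest if t not in rxn.products]
            yield from retrosynthesis(
                remaining + list(rxn.substrates),
                native, reactions, max_steps, used + (rxn,),
            )


def rank(pathways):
    """Order pathways by the product of their step scores, best first."""
    return sorted(
        pathways,
        key=lambda p: math.prod(r.score for r in p),
        reverse=True,
    )


ranked = rank(retrosynthesis(["T"], NATIVE, REACTIONS))
for pathway in ranked:
    # Print in forward (precursor-to-target) order.
    print([r.name for r in reversed(pathway)])
```

The `max_steps` bound and the no-reuse rule are one simple way to keep the search finite; a real ranking would combine several data sources rather than a single number per reaction.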
Evaluating pathway enumeration algorithms in metabolic engineering case studies
The design of cell factories for the production of compounds involves the search for suitable heterologous pathways. Different strategies have been proposed to infer such pathways, but most are optimization approaches with specific objective functions, not suited to enumerate multiple pathways. In this work, we analyze two pathway enumeration algorithms based on graph representations: the Solution Structure Generation and the Find Path algorithms. Both are capable of enumerating exhaustively multiple pathways using network topology. We study their capabilities and limitations when designing novel heterologous pathways, by applying these methods on two case studies of synthetic metabolic engineering related to the production of butanol and vanillin
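As a rough illustration of what exhaustive enumeration from network topology means, the sketch below lists every simple path between two compounds in a small directed graph, up to a length limit. It is plain depth-first enumeration on a hypothetical compound graph (`S`, `M1`, `M2`, `M3`, `P` are made-up names), and it is not an implementation of either the Solution Structure Generation or the Find Path algorithm.

```python
def enumerate_paths(graph, source, target, max_len):
    """Yield every simple path (no repeated node) from source to target
    that uses at most max_len edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_len:
            continue  # length budget exhausted on this branch
        for nxt in graph.get(node, ()):
            if nxt not in path:  # keep the path simple
                stack.append((nxt, path + [nxt]))


# Hypothetical toy network: keys are compounds, values are the compounds
# reachable in one reaction step.
GRAPH = {
    "S": ["M1", "M2"],
    "M1": ["M3", "P"],
    "M2": ["M3"],
    "M3": ["P"],
}

for path in sorted(enumerate_paths(GRAPH, "S", "P", max_len=3)):
    print(" -> ".join(path))
```

The length limit matters in practice: the number of simple paths grows quickly with network size, which is the scalability issue that pathway enumeration methods have to manage.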
Identification of metabolic pathways using pathfinding approaches: A systematic review
Metabolic pathways have become increasingly available for various microorganisms. Such pathways have spurred the development of a wide array of computational tools, in particular, mathematical pathfinding approaches. This article can facilitate the understanding of computational analysis of metabolic pathways in genomics. Moreover, stoichiometric and pathfinding approaches in metabolic pathway analysis are discussed. Three major types of studies are elaborated: stoichiometric identification models, pathway-based graph analysis and pathfinding approaches in cellular metabolism. Furthermore, evaluation of the outcomes of the pathways with mathematical benchmarking metrics is provided. This review would lead to better comprehension of metabolism behaviors in living cells, in terms of computed pathfinding approaches. © The Author 2016
Discovery and implementation of a novel pathway for n-butanol production via 2-oxoglutarate
Background
One of the European Union directives indicates that 10% of all fuels must be bio-synthesized by 2020. In this regard, biobutanol, natively produced by clostridial strains, is a promising alternative biofuel. One possible approach to overcome the difficulties of the industrial exploration of the native producers is the expression of more suitable pathways in robust microorganisms such as Escherichia coli. The enumeration of novel pathways is a powerful tool, allowing the identification of non-obvious combinations of enzymes to produce a target compound.
Results
This work describes the in silico driven design of E. coli strains able to produce butanol via 2-oxoglutarate by a novel pathway. This butanol pathway was generated by a hypergraph algorithm and selected from an initial set of 105,954 different routes by successively applying different filters, such as stoichiometric feasibility, size and novelty. The implementation of this pathway involved seven catalytic steps and required the insertion of nine heterologous genes from various sources in E. coli distributed in three plasmids. Expressing butanol genes in E. coli K12 and cultivation in High-Density Medium formulation seem to favor butanol accumulation via the 2-oxoglutarate pathway. The maximum butanol titer obtained was 85 ± 1 mg L⁻¹ by cultivating the cells in bioreactors.
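The selection step described above, where a large set of enumerated routes is narrowed by applying filters one after another, can be sketched as a small pipeline. The candidate records, field names and thresholds below are hypothetical; in particular, the `feasible` flag is assumed to be precomputed, whereas a real stoichiometric feasibility check would come from a metabolic model.

```python
def apply_filters(candidates, filters):
    """Apply (name, predicate) filters in order. Return the survivors and
    the number of candidates remaining after each step."""
    counts = [("initial", len(candidates))]
    for name, keep in filters:
        candidates = [c for c in candidates if keep(c)]
        counts.append((name, len(candidates)))
    return candidates, counts


# Hypothetical candidate routes (illustrative values only).
CANDIDATES = [
    {"id": "p1", "steps": 7, "feasible": True, "known": False},
    {"id": "p2", "steps": 12, "feasible": True, "known": False},
    {"id": "p3", "steps": 5, "feasible": False, "known": False},
    {"id": "p4", "steps": 6, "feasible": True, "known": True},
]

FILTERS = [
    ("feasible", lambda c: c["feasible"]),  # assumed precomputed flag
    ("size", lambda c: c["steps"] <= 8),    # assumed size threshold
    ("novel", lambda c: not c["known"]),    # drop already-known routes
]

survivors, counts = apply_filters(CANDIDATES, FILTERS)
for name, n in counts:
    print(f"{name}: {n}")
```

Recording the count after each filter makes it easy to see which criterion removes the most candidates, which is useful when the starting set is very large.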
Conclusions
In this work, we were able to successfully translate the computational analysis into in vivo applications, designing novel strains of E. coli able to produce n-butanol via an innovative pathway. Our results demonstrate that enumeration algorithms can broaden the spectrum of butanol-producing pathways. This validation encourages further research on other target compounds.
This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of a Ph.D. Grant (PD/BD/52366/2013) from MIT Portugal Program and the strategic funding of UID/BIO/04469 unit. Additional support was received by COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte.
The authors also thank the project “Dynamics”, Ref. ERA-IB-2/0002/2014, funded by national funds through FCT/MCTES.
The genes thl, hbd, crt and adhE1 were kindly provided by Kristala L. Jones Prather from MIT.
The authors thank the project DD-DeCaF - Bioinformatics Services for Data-Driven Design of Cell Factories and Communities, Ref. H2020-LEIT-BIO-2015-1 686070–1, funded by the European Commission, and the Project LISBOA-01-0145-FEDER-007660 (Microbiologia Molecular, Estrutural e Celular) funded by FEDER funds through COMPETE2020 - Programa Operacional Competitividade e Internacionalização (POCI) and by national funds through FCT - Fundação para a Ciência e a Tecnologia.
Computational Studies on Cellular Metabolism: From Biochemical Pathways to Complex Metabolic Networks
Biotechnology promises the biologically and ecologically sustainable production of commodity chemicals, biofuels, pharmaceuticals and other high-value products using industrial platform microorganisms. Metabolic engineering plays a key role in this process, providing the tools for targeted modifications of microbial metabolism to create efficient microbial cell factories that convert low value substrates to value-added chemicals. Engineering microbes for the bioproduction of chemicals has been practiced through three different approaches: (i) optimization of native pathways of a host organism; (ii) incorporation of heterologous pathways in an amenable organism; and finally (iii) design and introduction of synthetic pathways in an organism. So far, the progress that has been made in the biosynthesis of chemicals was mostly achieved using the first two approaches. Nevertheless, many novel biosynthetic pathways for the production of native and non-native compounds that have potential to provide near-theoretical yields and high specific production rates of chemicals remain yet to be discovered. Therefore, the third approach is crucial for the advancement of bio-based production of value-added chemicals. We need to fully comprehend and analyze the existing knowledge of metabolism in order to generate new hypotheses and design de novo pathways. In this thesis, through development and application of efficient computational methods, we took the research path to expand our understanding of cell metabolism with the aim to discover novel knowledge about metabolic networks. We analyze different aspects of metabolism through five distinct studies. In the first study, we begin with a holistic view of the enzymatic reactions across all the species, and we propose a computational approach for identifying all the theoretically possible enzymatic reactions based on the known biochemistry. We organize our results in a web-based database called “Atlas of biochemistry”.
In the second study, we focus on one of the most structurally diverse and ubiquitous constituents of metabolism, the lipid metabolism. Here we propose a computational framework for integrating lipid species with unknown metabolic/catabolic pathways into metabolic networks. In our next study, we investigate the full metabolic capacity of E. coli. We explore computationally all enzymatic potentials of this organism, and we introduce the “Super E. coli”, a new and advanced chassis for metabolic engineering studies. Our next contribution concentrates on the development of a new method for the atom-level description of metabolic networks. We demonstrate the significance of our approach through the reconstruction of an atom-level map of the E. coli central metabolism. In the last study, we turn our focus to the thermodynamics of metabolism and we present our original approach for estimating the thermodynamic properties of an important class of metabolites. So far, the available thermodynamic properties, whether from experiments or from computational methods, are estimated with respect to the standard conditions, which are different from typical biological conditions. Our workflow paves the way for reliable computing of thermochemical properties of biomolecules at biological conditions of temperature and pressure. Finally, in the conclusion chapter, we discuss the outlook of this work and the potential further applications of the computational methods that were developed in this thesis
Low potency toxins reveal dense interaction networks in metabolism
Background
The chemicals of metabolism are constructed of a small set of atoms and bonds. This may be because chemical structures outside the chemical space in which life operates are incompatible with biochemistry, or because mechanisms to make or utilize such excluded structures have not evolved. In this paper I address the extent to which biochemistry is restricted to a small fraction of the chemical space of possible chemicals, a restricted subset that I call Biochemical Space. I explore evidence that this restriction is at least in part due to selection against specific structures, and suggest a mechanism by which this occurs.
Results
Chemicals that contain structures that are outside Biochemical Space (UnBiological groups) are more likely to be toxic to a wide range of organisms, even though they have no specifically toxic groups and no obvious mechanism of toxicity. This correlation of UnBiological with toxicity is stronger for low potency (millimolar) toxins. I relate this to the observation that most chemicals interact with many biological structures at low millimolar toxicity. I hypothesise that life has to select its components not only to have a specific set of functions but also to avoid interactions with all the other components of life that might degrade their function.
Conclusions
The chemistry of life has to form a dense, self-consistent network of chemical structures, and cannot easily be arbitrarily extended. The toxicity of arbitrary chemicals is a reflection of the disruption to that network occasioned by trying to insert a chemical into it without also selecting all the other components to tolerate that chemical. This suggests new ways to test for the toxicity of chemicals, and that engineering organisms to make high concentrations of materials such as chemical precursors or fuels may require more substantial engineering than just of the synthetic pathways involved
Application of machine learning in systems biology
Biological systems are composed of a large number of molecular components. Understanding their behavior as a result of the interactions between the individual components is one of the aims of systems biology. Computational modelling is a powerful tool commonly used in systems biology, which relies on mathematical models that capture the properties and interactions between molecular components to simulate the behavior of the whole system. However, in many biological systems, it becomes challenging to build reliable mathematical models due to the complexity and the poor understanding of the underlying mechanisms. With the breakthrough in big data technologies in biology, data-driven machine learning (ML) approaches offer a promising complement to traditional theory-based models in systems biology. Firstly, ML can be used to model the systems in which the relationships between the components and the system are too complex to be modelled with theory-based models. Two such examples of using ML to resolve the genotype-phenotype relationships are presented in this thesis: (i) predicting yeast phenotypes using genomic features and (ii) predicting the thermal niche of microorganisms based on the proteome features. Secondly, ML naturally complements theory-based models. By applying ML, I improved the performance of the genome-scale metabolic model in describing yeast thermotolerance. In this application, ML was used to estimate the thermal parameters by using a Bayesian statistical learning approach that trains regression models and performs uncertainty quantification and reduction. The predicted bottleneck genes were further validated by experiments in improving yeast thermotolerance. In such applications, regression models are frequently used, and their performance relies on many factors, including but not limited to feature engineering and quality of response values. 
Manually engineering sufficient relevant features is particularly challenging in biology due to the lack of knowledge in certain areas. With the increasing volume of big data, deep-transfer learning enables us to learn a statistical summary of the samples from a big dataset which can be used as input to train other ML models. In the present thesis, I applied this approach to first learn a deep representation of enzyme thermal adaptation and then use it for the development of regression models for predicting enzyme optimal and protein melting temperatures. It was demonstrated that the transfer learning-based regression models outperform the classical ones trained on rationally engineered features in both cases. On the other hand, noisy response values are very common in biological datasets due to the variation in experimental measurements and they fundamentally restrict the performance attainable with regression models. I therefore addressed this challenge by deriving a theoretical upper bound for the coefficient of determination (R²) for regression models. This theoretical upper bound depends on the noise associated with the response variable and variance for a given dataset. It can thus be used to test whether the maximal performance has been reached on a particular dataset, or whether further model improvement is possible
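To see why measurement noise caps the attainable R², consider a simple simulation under an additive-noise assumption: each observed response is a true value plus independent zero-mean noise with a known standard deviation. Even a predictor that returns the true value exactly leaves the noise unexplained, so its R² sits near 1 − (noise variance) / (variance of the observed responses). This is a sketch of that reasoning with made-up numbers; the function names are hypothetical and it is not claimed to be the derivation used in the thesis.

```python
import random
import statistics


def r_squared(y_obs, y_pred):
    """Coefficient of determination of predictions against observations."""
    mean_y = statistics.fmean(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_y) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def noise_ceiling(y_obs, noise_sd):
    """Approximate best attainable R^2, assuming additive noise with a
    known standard deviation (an assumption of this sketch)."""
    return 1.0 - noise_sd ** 2 / statistics.pvariance(y_obs)


rng = random.Random(0)  # fixed seed so the run is reproducible
noise_sd = 1.0
y_true = [rng.uniform(0.0, 10.0) for _ in range(20000)]
y_obs = [t + rng.gauss(0.0, noise_sd) for t in y_true]

# A "perfect" model predicts the noise-free value, yet cannot reach R^2 = 1.
perfect_r2 = r_squared(y_obs, y_true)
ceiling = noise_ceiling(y_obs, noise_sd)
print(f"R^2 of perfect predictor: {perfect_r2:.3f}")
print(f"noise ceiling estimate:   {ceiling:.3f}")
```

A model whose test-set R² is already close to such a ceiling has little room left to improve on that dataset, which is the practical use described in the abstract.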
Evaluation and development of algorithms and computational tools for metabolic pathway optimization
Doctoral Program in Informatics (MAP-i)
Metabolic engineering exploits microorganisms to build cell factories that produce valuable compounds using their enzymatic machinery. It involves the selection of an organism, along with a set of genetic modifications to optimize the process. Information regarding biological mechanisms is scattered throughout the literature. Metabolic databases provide a centralized platform compiling existing biological data to build a catalog of all known enzymatic transformations across all domains of life.
The development of genome-scale metabolic models exposes all the biochemical transformations that an organism can offer. Computer algorithms use these models to explore the capabilities and limitations of the organisms. Constraint-based modeling approaches predict phenotypes given modifications in the network. In recent years, there has been a significant increase in the number of available models, and for certain organisms several models have been built. The accuracy of these methods depends in many cases on the quality of these models, which is limited by the information available in the literature (or databases).
This thesis improves the existing methods by developing better data management strategies for the metabolic modeling community. Metabolic databases are usually the input data for many modeling tools, and the quality of solutions depends on the quality of the databases. Currently, several metabolic databases exist, most of them sharing a common set of information, and there is a need for a centralized system to take full advantage of their content. However, each database adopts its own naming system to catalog its instances, which in many cases makes it difficult to compare with others. An integration pipeline is designed here to fuse metabolic databases into a common namespace, allowing better analysis of the entire metabolic catalog across several databases and exploring different methods to reconcile the metabolites and reactions included in these databases.
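One simple way to picture fusing several databases into a common namespace is to treat cross-references between identifiers as links and group everything that is connected into a single entity. The sketch below does this with a union-find structure. The database prefixes and identifiers (`dbA:m1`, `dbB:cpd_9`, and so on) are hypothetical, and this is only an assumed illustration of the reconciliation idea, not the integration pipeline developed in the thesis.

```python
def build_namespace(cross_refs):
    """Group identifiers linked by cross-references into clusters, one
    cluster per reconciled entity. Returns a sorted list of sorted lists."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in cross_refs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical cross-references between three made-up databases.
XREFS = [
    ("dbA:m1", "dbB:cpd_9"),
    ("dbB:cpd_9", "dbC:77"),
    ("dbA:m2", "dbC:12"),
]

for cluster in build_namespace(XREFS):
    print(cluster)
```

Purely transitive merging like this can over-merge when a single cross-reference is wrong, so a practical pipeline would likely add checks (for example on chemical structure or formula) before accepting a merge.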
In the second part of this work, the Systems Biology Markup Language, which is the most common medium to store and represent genome-scale metabolic models, is analyzed. Like databases, models also adopt unique nomenclatures for reactions and compounds. Here, methods to annotate metabolites and reactions in models are developed, connecting models with database instances and thus allowing a single naming system to be adopted for their entities. The purpose of the methods is to standardize the entire model; therefore, other entities such as genes, compartments and simulation media are also considered to unify these models. The standardization methods were implemented in the KBase platform, which improves the compatibility of this system with models built from external tools.
In the last part of this thesis, the pathway enumeration problem is revisited. Synthetic biology explores cellular modifications to produce valuable products by inserting enzymatic capabilities of other organisms. The selection of a suitable set of genes is highly combinatorial, since in many cases there are several alternatives to reach the target product. A common limitation of most of the existing methods is the inability to fully explore this combinatorial space. In this work, the (hyper)graph methods are analyzed and improved to fully enumerate biological pathways. As a result, two existing algorithms were improved with regard to scalability, allowing larger solution sets to be fully enumerated.
One of the goals of metabolic engineering is the synthesis of value-added compounds using microorganisms. One step of this process involves selecting organisms together with genetic modifications that optimize the process. Metabolic databases centralize biological data, providing a catalog of all existing knowledge related to the enzymatic context.
The reconstruction of genome-scale metabolic models makes it possible to study the metabolic processes of different organisms. With computational methods, these models expose the capabilities and limitations of those organisms. Approaches such as constraint-based modeling predict phenotypes given changes in metabolic pathways. In recent decades, the number of published models has increased significantly, and for some organisms several versions are available. The predictive capacity of these models depends on the information available in databases and in the literature.
This thesis aims to improve on earlier methods by addressing data integration issues. Metabolic databases are generally the main source of information for existing methods, which directly affects how well these problems can be solved. Several biological databases currently exist, and there is a need to develop centralized systems. However, they commonly adopt their own identifiers, so a direct comparison is not possible. In this work, strategies were developed to reconcile databases in the metabolic context, allowing compounds and reactions to be integrated.
In the second part of this work, this integration process was extended to include genome-scale metabolic models. Like databases, models also adopt their own identifiers to represent compounds and reactions. To unify models, annotation methods were developed that relate model instances to the databases. Strategies were also implemented to identify genes, compartments and simulation constraints. The methods were implemented in the KBase platform, improving the compatibility of the system with external models.
Finally, several metabolic pathway enumeration methods were addressed. Synthetic biology aims to manipulate cellular metabolism to produce compounds by inserting genes. Selecting these genes is a combinatorial problem that, given a target compound, identifies several gene sets capable of realizing the synthetic pathway. This work seeks to improve the ability to enumerate all possible pathways, given a limited set of reactions and a pathway size. As a result, two existing hypergraph-based methods were improved, increasing their scalability and allowing larger problems or pathways to be enumerated.
Fundação para a Ciência e Tecnologia (FCT) - PhD grant SFRH/BD/111490/201