17 research outputs found

    A retrosynthetic biology approach to metabolic pathway design for therapeutic production

    Background: Synthetic biology is used to develop cell factories for production of chemicals by constructively importing heterologous pathways into industrial microorganisms. In this work we present a retrosynthetic approach to the production of therapeutics with the goal of developing an in situ drug delivery device in host cells. Retrosynthesis, a concept originally proposed for synthetic chemistry, iteratively applies reversed chemical transformations (reversed enzyme-catalyzed reactions in the metabolic space) starting from a target product to reach precursors that are endogenous to the chassis. So far, a wider adoption of retrosynthesis into the manufacturing pipeline has been hindered by the complexity of enumerating all feasible biosynthetic pathways for a given compound. Results: In our method, we efficiently address the complexity problem by coding substrates, products and reactions into molecular signatures. Metabolic maps are represented using hypergraphs and the complexity is controlled by varying the specificity of the molecular signature. Furthermore, our method enables candidate pathways to be ranked to determine which ones are best to engineer. The proposed ranking function can integrate data from different sources such as host compatibility for inserted genes, the estimation of steady-state fluxes from the genome-wide reconstruction of the organism's metabolism, or the estimation of metabolite toxicity from experimental assays. We use several machine-learning tools in order to estimate enzyme activity and reaction efficiency at each step of the identified pathways.
Examples of production in bacteria and yeast for two antibiotics and for one antitumor agent, as well as for several essential metabolites, are outlined. Conclusions: We present here a unified framework that integrates diverse techniques involved in the design of heterologous biosynthetic pathways through a retrosynthetic approach in the reaction signature space. Our engineering methodology enables the flexible design of industrial microorganisms for the efficient on-demand production of chemical compounds with therapeutic applications.
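The backward search described in this abstract can be illustrated with a small sketch. It is not the authors' method: it leaves out molecular signatures and pathway ranking entirely, and the `Reaction` type, the `retro_pathways` function, and the toy reactions R1 to R4 are all hypothetical names invented for illustration. It only shows the core idea of walking reversed hyperedges (reactions with several substrates) from a target compound until every open compound is endogenous to the chassis.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Reaction(NamedTuple):
    """A hyperedge: a set of substrates converted into a set of products."""
    name: str
    substrates: FrozenSet[str]
    products: FrozenSet[str]


def retro_pathways(target: str,
                   reactions: Iterable[Reaction],
                   endogenous: Set[str],
                   max_steps: int) -> List[List[str]]:
    """Enumerate reaction sets that reach `target` from endogenous compounds.

    Works backwards: a compound that is not endogenous must be the product
    of some reaction, whose substrates then become new requirements.
    """
    reactions = list(reactions)
    found = set()

    def expand(open_compounds, used, steps_left):
        # Compounds the chassis already makes need no further reactions.
        needed = {c for c in open_compounds if c not in endogenous}
        if not needed:
            found.add(frozenset(used))
            return
        if steps_left == 0:
            return
        # Every needed compound must be produced eventually, so resolving
        # them in a fixed order loses no solutions.
        compound = min(needed)
        for rxn in reactions:
            if compound in rxn.products and rxn.name not in used:
                new_open = (needed - rxn.products) | rxn.substrates
                expand(new_open, used | {rxn.name}, steps_left - 1)

    expand({target}, frozenset(), max_steps)
    return sorted(sorted(pathway) for pathway in found)


# Hypothetical toy network: T is the target; A, C and D are endogenous.
TOY_REACTIONS = [
    Reaction("R1", frozenset({"A", "B"}), frozenset({"T"})),
    Reaction("R2", frozenset({"C"}), frozenset({"T"})),
    Reaction("R3", frozenset({"D"}), frozenset({"B"})),
    Reaction("R4", frozenset({"E"}), frozenset({"C"})),
]
TOY_ENDOGENOUS = {"A", "C", "D"}
```

Calling `retro_pathways("T", TOY_REACTIONS, TOY_ENDOGENOUS, max_steps=2)` on the toy network returns two candidate routes, one using R2 alone and one combining R1 with R3. The `max_steps` cap is the only complexity control here, a much cruder device than the signature specificity the abstract describes.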

    Evaluating pathway enumeration algorithms in metabolic engineering case studies

    The design of cell factories for the production of compounds involves the search for suitable heterologous pathways. Different strategies have been proposed to infer such pathways, but most are optimization approaches with specific objective functions, not suited to enumerate multiple pathways. In this work, we analyze two pathway enumeration algorithms based on graph representations: the Solution Structure Generation and the Find Path algorithms. Both are capable of enumerating exhaustively multiple pathways using network topology. We study their capabilities and limitations when designing novel heterologous pathways, by applying these methods on two case studies of synthetic metabolic engineering related to the production of butanol and vanillin

    Identification of metabolic pathways using pathfinding approaches: A systematic review

    Metabolic pathways have become increasingly available for various microorganisms. Such pathways have spurred the development of a wide array of computational tools, in particular, mathematical pathfinding approaches. This article can facilitate the understanding of computational analysis of metabolic pathways in genomics. Moreover, stoichiometric and pathfinding approaches in metabolic pathway analysis are discussed. Three major types of studies are elaborated: stoichiometric identification models, pathway-based graph analysis and pathfinding approaches in cellular metabolism. Furthermore, evaluation of the outcomes of the pathways with mathematical benchmarking metrics is provided. This review would lead to better comprehension of metabolism behaviors in living cells, in terms of computed pathfinding approaches. © The Author 2016

    Discovery and implementation of a novel pathway for n-butanol production via 2-oxoglutarate

    Background: One of the European Union directives indicates that 10% of all fuels must be bio-synthesized by 2020. In this regard, biobutanol - natively produced by clostridial strains - poses as a promising alternative biofuel. One possible approach to overcome the difficulties of the industrial exploration of the native producers is the expression of more suitable pathways in robust microorganisms such as Escherichia coli. The enumeration of novel pathways is a powerful tool, making it possible to identify non-obvious combinations of enzymes to produce a target compound. Results: This work describes the in silico driven design of E. coli strains able to produce butanol via 2-oxoglutarate by a novel pathway. This butanol pathway was generated by a hypergraph algorithm and selected from an initial set of 105,954 different routes by successively applying different filters, such as stoichiometric feasibility, size and novelty. The implementation of this pathway involved seven catalytic steps and required the insertion of nine heterologous genes from various sources in E. coli distributed in three plasmids. Expressing butanol genes in E. coli K12 and cultivation in High-Density Medium formulation seem to favor butanol accumulation via the 2-oxoglutarate pathway. The maximum butanol titer obtained was 85 ± 1 mg L-1 by cultivating the cells in bioreactors. Conclusions: In this work, we were able to successfully translate the computational analysis into in vivo applications, designing novel strains of E. coli able to produce n-butanol via an innovative pathway. Our results demonstrate that enumeration algorithms can broaden the spectrum of butanol producing pathways. This validation encourages further research to other target compounds. This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of a Ph.D. Grant (PD/BD/52366/2013) from MIT Portugal Program and the strategic funding of UID/BIO/04469 unit.
Additional support was received by COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020-Programa Operacional Regional do Norte. The authors also thank the Times New Roman project “Dynamics”, Ref. ERA-IB-2/0002/2014, funded by national funds through FCT/MCTES. The genes thl, hbd, crt and adhE1 were kindly provided by Kristala L. Jones Prather from MIT. The authors thank the project DDDeCaF - Bioinformatics Services for Data-Driven Design of Cell Factories and Communities, Ref. H2020-LEIT-BIO-2015-1 686070–1, funded by the European Commission and the Project LISBOA010145 FEDER007660 (Microbiologia Molecular, Estrutural e Celular) funded by FEDER funds through COMPETE2020 Programa Operacional Competitividade e Internacionalização (POCI) and by national funds through FCT Fundação para a Ciência e a Tecnologia.

    Discovery and implementation of a novel pathway for n-butanol production via 2-oxoglutarate

    Background: One of the European Union directives indicates that 10% of all fuels must be bio-synthesized by 2020. In this regard, biobutanol - natively produced by clostridial strains - poses as a promising alternative biofuel. One possible approach to overcome the difficulties of the industrial exploration of the native producers is the expression of more suitable pathways in robust microorganisms such as Escherichia coli. The enumeration of novel pathways is a powerful tool, making it possible to identify non-obvious combinations of enzymes to produce a target compound. Results: This work describes the in silico driven design of E. coli strains able to produce butanol via 2-oxoglutarate by a novel pathway. This butanol pathway was generated by a hypergraph algorithm and selected from an initial set of 105,954 different routes by successively applying different filters, such as stoichiometric feasibility, size and novelty. The implementation of this pathway involved seven catalytic steps and required the insertion of nine heterologous genes from various sources in E. coli distributed in three plasmids. Expressing butanol genes in E. coli K12 and cultivation in High-Density Medium formulation seem to favor butanol accumulation via the 2-oxoglutarate pathway. The maximum butanol titer obtained was 85 ± 1 mg L-1 by cultivating the cells in bioreactors. Conclusions: In this work, we were able to successfully translate the computational analysis into in vivo applications, designing novel strains of E. coli able to produce n-butanol via an innovative pathway. Our results demonstrate that enumeration algorithms can broaden the spectrum of butanol producing pathways. This validation encourages further research to other target compounds

    Computational Studies on Cellular Metabolism: From Biochemical Pathways to Complex Metabolic Networks

    Biotechnology promises the biologically and ecologically sustainable production of commodity chemicals, biofuels, pharmaceuticals and other high-value products using industrial platform microorganisms. Metabolic engineering plays a key role in this process, providing the tools for targeted modifications of microbial metabolism to create efficient microbial cell factories that convert low value substrates to value-added chemicals. Engineering microbes for the bioproduction of chemicals has been practiced through three different approaches: (i) optimization of native pathways of a host organism; (ii) incorporation of heterologous pathways in an amenable organism; and finally (iii) design and introduction of synthetic pathways in an organism. So far, the progress that has been made in the biosynthesis of chemicals was mostly achieved using the first two approaches. Nevertheless, many novel biosynthetic pathways for the production of native and non-native compounds that have potential to provide near-theoretical yields and high specific production rates of chemicals remain yet to be discovered. Therefore, the third approach is crucial for the advancement of bio-based production of value-added chemicals. We need to fully comprehend and analyze the existing knowledge of metabolism in order to generate new hypotheses and design de novo pathways. In this thesis, through development and application of efficient computational methods, we took the research path to expand our understanding of cell metabolism with the aim to discover novel knowledge about metabolic networks. We analyze different aspects of metabolism through five distinct studies. In the first study, we begin with a holistic view of the enzymatic reactions across all the species, and we propose a computational approach for identifying all the theoretically possible enzymatic reactions based on the known biochemistry. We organize our results in a web-based database called "Atlas of biochemistry".
In the second study, we focus on one of the most structurally diverse and ubiquitous constituents of metabolism, the lipid metabolism. Here we propose a computational framework for integrating lipid species with unknown metabolic/catabolic pathways into metabolic networks. In our next study, we investigate the full metabolic capacity of E. coli. We explore computationally all enzymatic potentials of this organism, and we introduce the "Super E. coli", a new and advanced chassis for metabolic engineering studies. Our next contribution concentrates on the development of a new method for the atom-level description of metabolic networks. We demonstrate the significance of our approach through the reconstruction of atom-level map of the E. coli central metabolism. In the last study, we turn our focus on studying the thermodynamics of metabolism and we present our original approach for estimating the thermodynamic properties of an important class of metabolites. So far, the available thermodynamic properties either from experiments or the computational methods are estimated with respect to the standard conditions, which are different from typical biological conditions. Our workflow paves the way for reliable computing of thermochemical properties of biomolecules at biological conditions of temperature and pressure. Finally, in the conclusion chapter, we discuss the outlook of this work and the potential further applications of the computational methods that were developed in this thesis

    Low potency toxins reveal dense interaction networks in metabolism

    Background: The chemicals of metabolism are constructed of a small set of atoms and bonds. This may be because chemical structures outside the chemical space in which life operates are incompatible with biochemistry, or because mechanisms to make or utilize such excluded structures have not evolved. In this paper I address the extent to which biochemistry is restricted to a small fraction of the chemical space of possible chemicals, a restricted subset that I call Biochemical Space. I explore evidence that this restriction is at least in part due to selection against specific structures, and suggest a mechanism by which this occurs. Results: Chemicals that contain structures that are outside Biochemical Space (UnBiological groups) are more likely to be toxic to a wide range of organisms, even though they have no specifically toxic groups and no obvious mechanism of toxicity. This correlation of UnBiological with toxicity is stronger for low potency (millimolar) toxins. I relate this to the observation that most chemicals interact with many biological structures at low millimolar toxicity. I hypothesise that life has to select its components not only to have a specific set of functions but also to avoid interactions with all the other components of life that might degrade their function. Conclusions: The chemistry of life has to form a dense, self-consistent network of chemical structures, and cannot easily be arbitrarily extended. The toxicity of arbitrary chemicals is a reflection of the disruption to that network occasioned by trying to insert a chemical into it without also selecting all the other components to tolerate that chemical. This suggests new ways to test for the toxicity of chemicals, and that engineering organisms to make high concentrations of materials such as chemical precursors or fuels may require more substantial engineering than just of the synthetic pathways involved

    Application of machine learning in systems biology

    Biological systems are composed of a large number of molecular components. Understanding their behavior as a result of the interactions between the individual components is one of the aims of systems biology. Computational modelling is a powerful tool commonly used in systems biology, which relies on mathematical models that capture the properties and interactions between molecular components to simulate the behavior of the whole system. However, in many biological systems, it becomes challenging to build reliable mathematical models due to the complexity and the poor understanding of the underlying mechanisms. With the breakthrough in big data technologies in biology, data-driven machine learning (ML) approaches offer a promising complement to traditional theory-based models in systems biology. Firstly, ML can be used to model the systems in which the relationships between the components and the system are too complex to be modelled with theory-based models. Two such examples of using ML to resolve the genotype-phenotype relationships are presented in this thesis: (i) predicting yeast phenotypes using genomic features and (ii) predicting the thermal niche of microorganisms based on the proteome features. Secondly, ML naturally complements theory-based models. By applying ML, I improved the performance of the genome-scale metabolic model in describing yeast thermotolerance. In this application, ML was used to estimate the thermal parameters by using a Bayesian statistical learning approach that trains regression models and performs uncertainty quantification and reduction. The predicted bottleneck genes were further validated by experiments in improving yeast thermotolerance. In such applications, regression models are frequently used, and their performance relies on many factors, including but not limited to feature engineering and quality of response values. 
Manually engineering sufficient relevant features is particularly challenging in biology due to the lack of knowledge in certain areas. With the increasing volume of big data, deep-transfer learning enables us to learn a statistical summary of the samples from a big dataset which can be used as input to train other ML models. In the present thesis, I applied this approach to first learn a deep representation of enzyme thermal adaptation and then use it for the development of regression models for predicting enzyme optimal and protein melting temperatures. It was demonstrated that the transfer learning-based regression models outperform the classical ones trained on rationally engineered features in both cases. On the other hand, noisy response values are very common in biological datasets due to the variation in experimental measurements and they fundamentally restrict the performance attainable with regression models. I thereby addressed this challenge by deriving a theoretical upper bound for the coefficient of determination (R²) for regression models. This theoretical upper bound depends on the noise associated with the response variable and variance for a given dataset. It can thus be used to test whether the maximal performance has been reached on a particular dataset, or whether further model improvement is possible
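The abstract does not state the bound itself, so the following is only a sketch under a common textbook assumption, not necessarily the thesis's exact expression. If the measured response is y = f(x) + e, with noise e independent of f(x) and of variance s_e^2, then even a model that predicts f(x) perfectly leaves a residual variance of s_e^2, so its expected R² cannot exceed 1 - s_e^2 / s_y^2, where s_y^2 is the variance of the measured responses. The function name `r2_upper_bound` and its parameters are hypothetical.

```python
def r2_upper_bound(noise_variance: float, response_variance: float) -> float:
    """Expected-R^2 ceiling under additive, independent measurement noise.

    Assumes y = f(x) + e, so a perfect model still leaves residual variance
    equal to `noise_variance`. This is an illustrative assumption and not
    a formula quoted from the thesis.
    """
    if response_variance <= 0:
        raise ValueError("response_variance must be positive")
    if noise_variance < 0:
        raise ValueError("noise_variance must be non-negative")
    return 1.0 - noise_variance / response_variance
```

For instance, with a noise variance of 1.0 and a response variance of 4.0 the ceiling is 0.75, so a model already scoring close to 0.75 on such a dataset would have little room left to improve.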

    Evaluation and development of algorithms and computational tools for metabolic pathway optimization

    Doctoral Programme in Informatics (MAP-i). Metabolic engineering exploits microorganisms to build cell factories that produce valuable compounds from their enzymatic machinery. It involves the selection of an organism, along with a set of genetic modifications to optimize the process. Information regarding biological mechanisms is scattered throughout the literature. Metabolic databases provide a centralized platform compiling existing biological data to build a catalog of all known enzymatic transformations across all domains of life. The development of genome-scale metabolic models exposes all possible biochemical transformations that an organism can offer. Computer algorithms use these models to explore the capabilities and limitations of the organisms. Constraint-based modeling approaches predict the phenotype given modifications in the network. In recent years, there has been a significant increase in the number of available models, and for certain organisms several models were built. The accuracy of these methods is in many cases dependent on the quality of these models, which is limited to the available information in the literature (or databases). This thesis improves the existing methods by developing better data management strategies for the metabolic modeling community. Metabolic databases are usually the input data for many modeling tools, and the quality of solutions depends on the quality of the databases. Currently, several metabolic databases exist, most of them sharing a common set of information, and there is a need for a centralized system to take the most advantage of their content. However, each database adopts its own naming system to catalog its instances, which in many cases makes it difficult to compare with others.
An integration pipeline is designed here to fuse metabolic databases into a common namespace, allowing better analysis of the entire metabolic catalog across several databases and exploring different methods to reconcile the metabolites and reactions included in these databases. In the second part of this work, the Systems Biology Markup Language, which is the most common medium to store and represent genome-scale metabolic models, is analyzed. Like databases, models also adopt unique nomenclatures for reactions and compounds. Here, methods to annotate metabolites and reactions in models are developed, connecting models with database instances and thus allowing a single naming system to be adopted for their entities. The purpose of the methods is to standardize the entire model; therefore, other entities such as genes, compartments and simulation media are also considered to unify these models. The standardization methods were implemented in the KBase platform, which improves the compatibility of this system with models built from external tools. In the last part of this thesis, the pathway enumeration problem is revisited. Synthetic biology explores cellular modifications to produce valuable products by inserting enzymatic capabilities of other organisms. The selection of a suitable set of genes is highly combinatorial, since in many cases there are several alternatives to reach the target product. A common limitation of most of the existing methods is the inability to fully explore this combinatorial space. In this work, the (hyper)graph methods are analyzed and improved to fully enumerate biological pathways. As a result, two existing algorithms were improved with respect to scalability, allowing larger solution sets to be fully enumerated. One of the goals of metabolic engineering is the synthesis of value-added compounds by microorganisms.
One step of this process involves selecting organisms in combination with genetic modifications that optimize it. Metabolic databases centralize biological data, providing a catalog of all existing knowledge related to the enzymatic context. The reconstruction of genome-scale metabolic models makes it possible to study the metabolic processes of different organisms. With computational methods, these models expose the capabilities and limitations of those organisms. Approaches such as constraint-based modeling predict phenotypes given changes in metabolic pathways. In recent decades, the number of published models has increased significantly, and for some organisms several versions are available. The predictive capacity of these models depends on the information available in databases and in the literature. This thesis aims to improve on earlier methods by addressing issues related to data integration. Metabolic databases are generally the main source of information for existing methods, which directly affects the ability to solve these problems. Several biological databases currently exist, and there is a need to develop centralized systems. However, they commonly adopt their own identifiers, so a direct comparison is not possible. In this work, strategies were developed to reconcile databases in the metabolic context, allowing compounds and reactions to be integrated. In the second part of this work, this integration process was extended to include genome-scale metabolic models. Like databases, models also adopt their own identifiers to represent compounds and reactions. To unify models, annotation methods were developed that relate model instances to the databases.
Strategies were also implemented to identify genes, compartments and simulation constraints. In this work, the methods were implemented in the KBase platform, improving the system's compatibility with external models. Finally, several methods for enumerating metabolic pathways were addressed. Synthetic biology aims to manipulate cellular metabolism to produce compounds through the insertion of genes. Selecting these genes is a combinatorial problem that, given a target compound, identifies several sets of genes capable of realizing the synthetic pathway. This work aims to improve the ability to enumerate all possible pathways, given a limited set of reactions and a pathway size. As a result, two existing hypergraph-based methods were improved, increasing their scalability and allowing larger problems or pathways to be enumerated. Fundação para a Ciência e Tecnologia (FCT) - PhD grant SFRH/BD/111490/201