894 research outputs found

    Development and application of efficient pathway enumeration algorithms for metabolic engineering applications

    Metabolic Engineering (ME) aims to design microbial cell factories for the production of valuable compounds. In this endeavor, one important task is the search for the most suitable heterologous pathway(s) to add to the selected host. Different algorithms have been developed towards this goal, following distinct approaches spanning constraint-based modelling, graph-based methods and knowledge-based systems based on chemical rules. While some of these methods search for pathways optimizing specific objective functions, the focus here is on methods that enumerate pathways able to convert a set of source compounds into desired targets, and on the subsequent evaluation of those pathways according to different criteria. Two pathway enumeration algorithms based on (hyper)graph representations are selected as the most promising ones and are analyzed in more detail: the Solution Structure Generation and the Find Path algorithms. Their capabilities and limitations in designing novel heterologous pathways are evaluated by applying them to three case studies of synthetic ME related to the production of non-native compounds in E. coli and S. cerevisiae: 1-butanol, curcumin and vanillin. Some targeted improvements are implemented, extending both methods to address identified limitations that impair their scalability and improving their ability to extract potential pathways from large-scale databases. In all case studies, the algorithms were able to find already described pathways for the production of the target compounds, as well as alternative pathways that may represent novel ME solutions after further evaluation. The work is partially funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT (Portuguese Foundation for Science and Technology) within projects ref. 
COMPETE FCOMP-01-0124-FEDER-015079 and Strategic Project PEst-OE/EQB/LA0023/2013, and also by Project 23060, PEM - Technological Support Platform for Metabolic Engineering, co-funded by FEDER through Portuguese QREN under the scope of the Technological Research and Development Incentive system, North Operational
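
The enumeration task described in this abstract (find reaction sets that convert source compounds into a target) can be illustrated with a minimal sketch. This is a brute-force enumeration over a hypothetical toy network, not the Solution Structure Generation or Find Path algorithm; the reaction names, compounds and the `max_size` bound are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy network: reaction id -> (substrates, products).
# Each reaction is a hyperedge: it fires only when ALL substrates are present.
REACTIONS = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B"}, {"T"}),
    "r3": ({"A"}, {"C"}),
    "r4": ({"C"}, {"T"}),
    "r5": ({"B", "C"}, {"T"}),
}


def producible(reactions, reaction_ids, sources):
    """Forward-propagate from the sources and return all reachable compounds."""
    available = set(sources)
    changed = True
    while changed:
        changed = False
        for rid in reaction_ids:
            substrates, products = reactions[rid]
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def enumerate_pathways(reactions, sources, target, max_size):
    """Enumerate minimal reaction sets (up to max_size) that reach the target.

    Subsets are visited by increasing size, so any candidate containing an
    already accepted pathway is a non-minimal superset and is skipped.
    """
    found = []
    for size in range(1, max_size + 1):
        for subset in combinations(sorted(reactions), size):
            candidate = frozenset(subset)
            if any(prev <= candidate for prev in found):
                continue
            if target in producible(reactions, candidate, sources):
                found.append(candidate)
    return [sorted(pathway) for pathway in found]


if __name__ == "__main__":
    for pathway in enumerate_pathways(REACTIONS, {"A"}, "T", max_size=3):
        print(pathway)
```

The number of subsets grows exponentially with the size of the reaction set, which is the scalability problem the abstract refers to; the sketch is only practical for toy inputs.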

    Revising lipid chemical structures in genome-wide metabolic models with BOIMMG

    An important step in the reconstruction of Genome-Scale Metabolic (GSM) models is the integration of biochemical data. Such information is often incomplete or generic, lacking completely defined chemical structures for several molecules, including lipids. The numerous combinations of fatty acids in the side chains of lipids hinder their storage in databases and their integration into GSM models. Generic representations are commonly used to circumvent this limitation. However, lipid specificity is likely lost, and data integration problems arise, as some models contain lipids with completely defined structures and others contain their generic versions. This clash of versions is addressed by the Biochemical cOmplex data Integration in Metabolic Models at Genome-scale (BOIMMG). BOIMMG is an open-source framework that accelerates the swapping of different molecular versions (mainly lipids, structurally defined or not) in GSM models. Upon integration into a Neo4j graph database (http://neo4j.com/), lipid-specific data from LIPID MAPS Structure Database (LMSD), Swiss Lipids (SLM) and Model SEED were processed for biosynthetic contextualization within the curated pathways of MetaCyc. Several algorithms were then developed to integrate this information into GSM models. Over 30 generic reactions were fully expanded and 27 partially expanded, resulting in 557392 new reactions, of which 557252 were not integrated, nor listed, in Model SEED. These reactions were inferred from the previously contextualized biosynthetic relationships between structurally defined compounds. BOIMMG's information was applied to GSM models, tackling the conflict between molecule versions. The whole glycerolipid and phospholipid metabolic network within the E. coli iJR904 model was expanded by our approach. The comparison between the altered model and one of its manually expanded published iterations (iAF1260b) showed that 53 more matching lipids and 38 more matching reactions were found. 
Besides the new biochemical set, BioISO's analysis demonstrated that biomass lipids were correctly produced, corroborating the correct expansion of the whole biosynthetic network. In conclusion, BOIMMG (available at https://boimmg.bio.di.uminho.pt/) can establish relevant relationships between complex macromolecules, within their biosynthetic context, and provide automated procedures for their integration into GSM models.

    Discovery and implementation of a novel pathway for n-butanol production via 2-oxoglutarate

    Background: One of the European Union directives indicates that 10% of all fuels must be bio-synthesized by 2020. In this regard, biobutanol - natively produced by clostridial strains - is a promising alternative biofuel. One possible approach to overcome the difficulties of the industrial exploitation of the native producers is the expression of more suitable pathways in robust microorganisms such as Escherichia coli. The enumeration of novel pathways is a powerful tool, allowing the identification of non-obvious combinations of enzymes to produce a target compound. Results: This work describes the in silico driven design of E. coli strains able to produce butanol via 2-oxoglutarate by a novel pathway. This butanol pathway was generated by a hypergraph algorithm and selected from an initial set of 105,954 different routes by successively applying different filters, such as stoichiometric feasibility, size and novelty. The implementation of this pathway involved seven catalytic steps and required the insertion of nine heterologous genes from various sources in E. coli, distributed in three plasmids. Expressing butanol genes in E. coli K12 and cultivation in High-Density Medium formulation seem to favor butanol accumulation via the 2-oxoglutarate pathway. The maximum butanol titer obtained was 85 ± 1 mg L⁻¹ by cultivating the cells in bioreactors. Conclusions: In this work, we were able to successfully translate the computational analysis into in vivo applications, designing novel strains of E. coli able to produce n-butanol via an innovative pathway. Our results demonstrate that enumeration algorithms can broaden the spectrum of butanol-producing pathways. This validation encourages further research on other target compounds.

    Prediction of protein subunits using KEGG BRITE

    The increased importance of genome-scale metabolic models (GSMMs) within systems biology and metabolic engineering has led to the development of several computational frameworks dedicated to their reconstruction. One of the toughest challenges when reconstructing a model is the identification of gene-protein-reaction (GPR) associations, a step usually performed by manually searching the literature. In this work, we present a new approach for automatically predicting, at the genome level, protein subunits using the KEGG BRITE database. This database contains information on hundreds of protein complexes, which can be automatically retrieved using the KEGG representational state transfer (REST) application programming interface (API). Afterwards, the gene association rule related to each protein complex is individually processed by running it through a grammar specially developed to parse these data. The parsed rule is then fitted to the genome annotation, to determine whether the complex is encoded in the case study genome. Finally, the GP rule can be integrated into a metabolic model to formulate a GPR association. This methodology is implemented and can be automatically performed in merlin, a user-friendly Java application for the reconstruction of genome-scale metabolic models previously developed by the authors.
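
The parse-then-fit step described in this abstract can be sketched as follows. This is a minimal illustration, not merlin's grammar or the actual KEGG BRITE rule format: the `and`/`or` rule syntax, the identifiers `subA`, `subB`, `subC`, and the function names are all assumptions.

```python
import re

# Assumed rule syntax: identifiers combined with 'and', 'or' and parentheses.
TOKEN = re.compile(r"\(|\)|\band\b|\bor\b|[A-Za-z0-9_]+")


def tokenize(rule):
    """Split a rule string into parentheses, operators and identifiers."""
    return TOKEN.findall(rule)


def parse(tokens):
    """Recursive-descent parser for the assumed grammar:

    expr   := term ('or' term)*
    term   := factor ('and' factor)*
    factor := IDENTIFIER | '(' expr ')'
    """
    pos = 0

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            node = ("or", node, term())
        return node

    def term():
        nonlocal pos
        node = factor()
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            node = ("and", node, factor())
        return node

    def factor():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of rule")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        if tok in (")", "and", "or"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    tree = expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in rule")
    return tree


def is_encoded(tree, annotated):
    """Return True if the annotated identifiers satisfy the parsed rule."""
    if isinstance(tree, str):
        return tree in annotated
    op, left, right = tree
    if op == "and":
        return is_encoded(left, annotated) and is_encoded(right, annotated)
    return is_encoded(left, annotated) or is_encoded(right, annotated)


if __name__ == "__main__":
    rule = parse(tokenize("subA and (subB or subC)"))
    print(is_encoded(rule, {"subA", "subC"}))
```

Here `and` stands for subunits that must all be present and `or` for interchangeable alternatives; a genome annotation is reduced to a plain set of identifiers to keep the sketch self-contained.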

    Computational tools for pathway optimization towards metabolic engineering applications

    Master's dissertation in Informatics Engineering. Metabolic Engineering targets the cellular metabolism of microorganisms to design new strains with an industrial purpose. Applications of these metabolic manipulations in Biotechnology derive from the need for enhanced production of valuable compounds. The development of in silico metabolic models offers a quantifiable approach to the manipulation of these microorganisms. In this context, constraint-based modelling is one of the major approaches to predict cellular behaviour. It allows pruning the feasible space of possibilities that describes possible phenotype outcomes in terms of metabolic fluxes. Under these conditions, cellular metabolism can be represented as an algebraic system constrained by the laws of mass balance and thermodynamics. These systems lend themselves to representation as networks, taking advantage of different graph-based paradigms, including bipartite graphs, hypergraphs and process graphs. This thesis explores these representations and the underlying algorithms for topological analysis of metabolic networks. The main aim is to identify potential pathways towards the optimized biochemical production of selected compounds. Related to this task, algorithms are also designed to complement the networks of specific organisms, taking larger metabolic databases as input and inserting new reactions that make them able to produce a new compound of interest. To address these problems, as well as the related tasks of data pre-processing and evaluation of the solutions, a complete computational framework was developed. It integrates a number of previously proposed algorithms from distinct authors, together with a number of improvements that were necessary to cope with large-scale metabolic networks. These improvements result from problems identified in the previous algorithms regarding their scalability. 
A case study in synthetic metabolic engineering was selected from the literature to validate the algorithms and test the capabilities of the implemented framework. It made it possible to compare the performance of the implemented algorithms and to validate the proposed improvements.

    Evaluation and development of algorithms and computational tools for metabolic pathway optimization

    Doctoral Programme in Informatics (MAP-i). Metabolic engineering exploits microorganisms to build cell factories, making it possible to produce valuable compounds with their enzymatic machinery. It involves the selection of an organism, along with a set of genetic modifications to optimize the process. Information regarding biological mechanisms is scattered across the literature. Metabolic databases provide a centralized platform that compiles existing biological data to build a catalog of all known enzymatic transformations across all domains of life. The development of genome-scale metabolic models exposes all possible biochemical transformations that an organism can offer. Computer algorithms use these models to explore the capabilities and limitations of the organisms. Constraint-based modeling approaches make it possible to predict phenotypes given modifications in the network. In recent years, there has been a significant increase in the number of available models, and for certain organisms several models have been built. The accuracy of these methods in many cases depends on the quality of these models, which is limited by the information available in the literature (or databases). This thesis improves the existing methods by developing better data management strategies for the metabolic modeling community. Metabolic databases are usually the input data for many modeling tools, and the quality of solutions depends on the quality of the databases. Currently, several metabolic databases exist, most of them sharing a common set of information, and there is a need for a centralized system to take full advantage of their content. However, each database adopts its own naming system to catalog its instances, which in many cases makes it difficult to compare with others. 
An integration pipeline is designed here to fuse metabolic databases into a common namespace, allowing better analysis of the entire metabolic catalog across several databases and exploring different methods to reconcile the metabolites and reactions included in these databases. In the second part of this work, the Systems Biology Markup Language, the most common medium to store and represent genome-scale metabolic models, is analyzed. Like databases, models also adopt unique nomenclatures for reactions and compounds. Here, methods to annotate metabolites and reactions in models are developed, making it possible to connect models with database instances and thus to adopt a single naming system for their entities. The purpose of the methods is to standardize the entire model; therefore, other entities, such as genes, compartments and simulation media, are also considered in order to unify these models. The standardization methods were implemented in the KBase platform, improving the compatibility of this system with models built with external tools. In the last part of this thesis, the pathway enumeration problem is revisited. Synthetic biology explores cellular modifications to produce valuable products by inserting enzymatic capabilities of other organisms. The selection of a suitable set of genes is highly combinatorial, since in many cases there are several alternatives to reach the target product. A common limitation of most existing methods is the inability to fully explore this combinatorial space. In this work, the (hyper)graph methods are analyzed and improved to fully enumerate biological pathways. As a result, two existing algorithms were improved with regard to scalability, allowing larger solution sets to be fully enumerated.
Fundação para a Ciência e Tecnologia (FCT) - PhD grant SFRH/BD/111490/201

    On the Impact of Frequency Variation on Nonlinearity Mitigation using Frequency Combs

    We investigated the impact of linewidth and dithering-induced frequency variation on the performance of nonlinearity mitigation using frequency combs. Compared to independent laser arrays, an SNR gain of more than 2 dB can be achieved using comb sources.

    Discovery and implementation of a novel pathway for n-butanol production via 2-oxoglutarate

    Background: One of the European Union directives indicates that 10% of all fuels must be bio-synthesized by 2020. In this regard, biobutanol, natively produced by clostridial strains, is a promising alternative biofuel. One possible approach to overcome the difficulties of the industrial exploitation of the native producers is the expression of more suitable pathways in robust microorganisms such as Escherichia coli. The enumeration of novel pathways is a powerful tool, allowing the identification of non-obvious combinations of enzymes to produce a target compound. Results: This work describes the in silico driven design of E. coli strains able to produce butanol via 2-oxoglutarate by a novel pathway. This butanol pathway was generated by a hypergraph algorithm and selected from an initial set of 105,954 different routes by successively applying different filters, such as stoichiometric feasibility, size and novelty. The implementation of this pathway involved seven catalytic steps and required the insertion of nine heterologous genes from various sources in E. coli, distributed in three plasmids. Expressing butanol genes in E. coli K12 and cultivation in High-Density Medium formulation seem to favor butanol accumulation via the 2-oxoglutarate pathway. The maximum butanol titer obtained was 85 ± 1 mg L⁻¹ by cultivating the cells in bioreactors. Conclusions: In this work, we were able to successfully translate the computational analysis into in vivo applications, designing novel strains of E. coli able to produce n-butanol via an innovative pathway. Our results demonstrate that enumeration algorithms can broaden the spectrum of butanol-producing pathways. This validation encourages further research on other target compounds. This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of a Ph.D. Grant (PD/BD/52366/2013) from MIT Portugal Program and the strategic funding of UID/BIO/04469 unit. 
Additional support was received from COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte. The authors also thank the Times New Roman project “Dynamics”, Ref. ERA-IB-2/0002/2014, funded by national funds through FCT/MCTES. The genes thl, hbd, crt and adhE1 were kindly provided by Kristala L. Jones Prather from MIT. The authors thank the project DDDeCaF - Bioinformatics Services for Data-Driven Design of Cell Factories and Communities, Ref. H2020-LEIT-BIO-2015-1 686070–1, funded by the European Commission, and the Project LISBOA010145 FEDER007660 (Microbiologia Molecular, Estrutural e Celular) funded by FEDER funds through COMPETE2020 - Programa Operacional Competitividade e Internacionalização (POCI) and by national funds through FCT - Fundação para a Ciência e a Tecnologia.