894 research outputs found
Development and application of efficient pathway enumeration algorithms for metabolic engineering applications
Metabolic Engineering (ME) aims to design microbial cell factories for the production of valuable compounds. In this endeavor, one important task is the search for the most suitable heterologous pathway(s) to add to the selected host. Different algorithms have been developed towards this goal, following distinct approaches spanning constraint-based modelling, graph-based methods and knowledge-based systems based on chemical rules. While some of these methods search for pathways optimizing specific objective functions, the focus here is on methods that enumerate the pathways able to convert a set of source compounds into desired targets, and on the subsequent evaluation of those pathways according to different criteria. Two pathway enumeration algorithms based on (hyper)graph representations are selected as the most promising ones and are analyzed in more detail: the Solution Structure Generation and the Find Path algorithms. Their capabilities and limitations when designing novel heterologous pathways are evaluated by applying these methods to three case studies of synthetic ME related to the production of non-native compounds in E. coli and S. cerevisiae: 1-butanol, curcumin and vanillin. Some targeted improvements are implemented, extending both methods to address identified limitations that impair their scalability and improving their ability to extract potential pathways from large-scale databases. In all case studies, the algorithms were able to find already described pathways for the production of the target compounds, but also alternative pathways that can represent novel ME solutions after further evaluation.
The work is partially funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT (Portuguese Foundation for Science and Technology) within projects ref. COMPETE FCOMP-01-0124-FEDER-015079 and Strategic Project PEst-OE/EQB/LA0023/2013, and also by Project 23060, PEM - Technological Support Platform for Metabolic Engineering, co-funded by FEDER through Portuguese QREN under the scope of the Technological Research and Development Incentive system, North Operational
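To make the enumeration task described in the abstract above concrete, the following is a minimal sketch of pathway enumeration over a hypergraph-like network, where each reaction is a hyperedge that needs all of its substrates. It is a plain backtracking search written for illustration only; it is not the Solution Structure Generation or Find Path algorithm, and the toy network, the function name `enumerate_pathways` and the `max_size` cutoff are all assumptions.

```python
from itertools import product

# Hypothetical toy network: reaction id -> (substrates, products).
REACTIONS = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B"}, {"T"}),
    "r3": ({"A", "C"}, {"T"}),
    "r4": ({"A"}, {"C"}),
}


def enumerate_pathways(target, sources, reactions, max_size=4):
    """Return every reaction set (up to max_size) that makes `target` from `sources`."""

    def solve(compound, visiting):
        # Yield frozensets of reaction ids that make `compound` available.
        if compound in sources:
            yield frozenset()
            return
        if compound in visiting:
            # The compound is already being resolved higher up: skip the cycle.
            return
        for rid, (substrates, products) in reactions.items():
            if compound not in products:
                continue
            # A hyperedge fires only if every one of its substrates is producible.
            options = [
                list(solve(s, visiting | {compound})) for s in sorted(substrates)
            ]
            for combo in product(*options):
                pathway = frozenset({rid}).union(*combo)
                if len(pathway) <= max_size:
                    yield pathway

    return set(solve(target, frozenset()))


pathways = enumerate_pathways("T", {"A"}, REACTIONS)
for p in sorted(sorted(p) for p in pathways):
    print(p)
```

On this toy network the search returns two candidate pathways, {r1, r2} and {r3, r4}. The size cutoff stands in for the kind of filtering that the abstracts apply after enumeration; real networks need far more careful pruning to stay tractable.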
Revising lipid chemical structures in genome-wide metabolic models with BOIMMG
An important step in the reconstruction of Genome-Scale Metabolic (GSM) models is the integration of biochemical data. Such information is often incomplete or generic, lacking completely defined chemical structures for several molecules, including lipids. The innumerable combinations of fatty acids in the side chains of lipids hinder their storage in databases and their integration into GSM models. Generic representations are commonly used to circumvent this limitation. However, lipid specificity is likely lost, and data integration problems arise, as some models contain lipids with completely defined structures and others contain their generic versions.
This clash of versions is addressed by the Biochemical cOmplex data Integration in Metabolic Models at Genome-scale (BOIMMG). BOIMMG is an open-source framework that accelerates the swapping of different molecular versions (mainly lipids, structurally defined or not) in GSM models. Upon integration into a Neo4j graph database (http://neo4j.com/), lipid-specific data from LIPID MAPS Structure Database (LMSD), Swiss Lipids (SLM) and Model SEED were processed for biosynthetic contextualization within the curated pathways of MetaCyc. Several algorithms were then developed to integrate this information into GSM models.
Over 30 generic reactions were fully expanded and 27 were partially expanded, resulting in 557392 new reactions, of which 557252 were neither integrated nor listed in Model SEED. These reactions were inferred from the previously contextualized biosynthetic relationships between structurally defined compounds.
BOIMMG's information was applied to GSM models, tackling the conflict between molecule versions.
The whole glycerolipid and phospholipid metabolic network within the E. coli iJR904 model was expanded by our approach. The comparison between the altered model and one of its manually expanded published iterations (iAF1260b) showed that 53 more matching lipids and 38 more matching reactions were found. Besides the new biochemical set, BioISO's analysis demonstrated that biomass lipids were correctly produced, corroborating the correct expansion of the whole biosynthetic network. In conclusion, BOIMMG (available at https://boimmg.bio.di.uminho.pt/) can establish relevant relationships between complex macromolecules, within their biosynthetic context, and provide automated procedures for their integration into GSM models.
Discovery and implementation of a novel pathway for n-butanol production via 2-oxoglutarate
Background: One of the European Union directives indicates that 10% of all fuels must be bio-synthesized by 2020. In this regard, biobutanol, natively produced by clostridial strains, stands out as a promising alternative biofuel. One possible approach to overcome the difficulties of the industrial exploitation of the native producers is the expression of more suitable pathways in robust microorganisms such as Escherichia coli. The enumeration of novel pathways is a powerful tool, allowing the identification of non-obvious combinations of enzymes to produce a target compound. Results: This work describes the in silico driven design of E. coli strains able to produce butanol via 2-oxoglutarate by a novel pathway. This butanol pathway was generated by a hypergraph algorithm and selected from an initial set of 105,954 different routes by successively applying different filters, such as stoichiometric feasibility, size and novelty. The implementation of this pathway involved seven catalytic steps and required the insertion of nine heterologous genes from various sources in E. coli, distributed in three plasmids. Expressing butanol genes in E. coli K12 and cultivation in High-Density Medium formulation seem to favor butanol accumulation via the 2-oxoglutarate pathway. The maximum butanol titer obtained was 85 ± 1 mg L⁻¹ by cultivating the cells in bioreactors. Conclusions: In this work, we were able to successfully translate the computational analysis into in vivo applications, designing novel strains of E. coli able to produce n-butanol via an innovative pathway. Our results demonstrate that enumeration algorithms can broaden the spectrum of butanol-producing pathways. This validation encourages further research on other target compounds.
Prediction of protein subunits using KEGG BRITE
The increased importance of genome-scale metabolic models (GSMMs) within systems biology and metabolic engineering has led to the development of several computational frameworks dedicated to their reconstruction. One of the toughest challenges when reconstructing a model is the identification of gene-protein-reaction (GPR) associations, a step usually performed by manually searching the literature. In this work, we present a new approach for automatically predicting, at the genome level, protein subunits using the KEGG BRITE database. This database contains information on hundreds of protein complexes, which can be automatically retrieved using the KEGG representational state transfer (REST) application programming interface (API). Afterwards, the gene association rule related to each protein complex is individually processed by running it through a grammar specially developed to parse these data. The parsed rule is then fitted to the genome annotation, to determine whether the complex is encoded in the case study genome. Finally, the GP rule can be integrated into a metabolic model to formulate a GPR association. This methodology is implemented and can be automatically performed in merlin, a user-friendly Java application, previously developed by the authors, that performs the reconstruction of genome-scale metabolic models.
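The parse-then-fit step described in the abstract above can be illustrated with a small sketch. The rule syntax used here (gene identifiers combined with `and`, `or` and parentheses) is an assumption chosen for illustration; it is not the KEGG BRITE format, and the functions `tokenize`, `parse` and `is_encoded` are hypothetical names, not part of merlin.

```python
import re


def tokenize(rule):
    """Split a rule such as '(g1 and g2) or g3' into parentheses and words."""
    return re.findall(r"\(|\)|[\w.\-]+", rule)


def parse(tokens):
    """Recursive-descent parser; 'and' binds tighter than 'or'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = peek()
        if tok is None or tok in ("and", "or", ")"):
            raise ValueError(f"unexpected token: {tok!r}")
        advance()
        if tok == "(":
            node = or_expr()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        return tok  # a gene identifier

    def and_expr():
        node = atom()
        while peek() == "and":
            advance()
            node = ("and", node, atom())
        return node

    def or_expr():
        node = and_expr()
        while peek() == "or":
            advance()
            node = ("or", node, and_expr())
        return node

    tree = or_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens after rule")
    return tree


def is_encoded(tree, annotated_genes):
    """Check whether the annotated genes satisfy the parsed rule."""
    if isinstance(tree, str):
        return tree in annotated_genes
    op, left, right = tree
    if op == "and":
        return is_encoded(left, annotated_genes) and is_encoded(right, annotated_genes)
    return is_encoded(left, annotated_genes) or is_encoded(right, annotated_genes)


rule = parse(tokenize("(g1 and g2) or g3"))
print(is_encoded(rule, {"g1", "g2"}))  # both subunits of the complex are annotated
print(is_encoded(rule, {"g1"}))        # one subunit is missing
```

Here an `and` node models subunits that must all be present for a complex, and an `or` node models alternatives. A real pipeline would also have to map database gene identifiers onto the annotation of the genome under study, which this sketch leaves out.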
Computational tools for pathway optimization towards metabolic engineering applications
Master's dissertation in Informatics Engineering
Metabolic Engineering targets the cellular metabolism of microorganisms to design
new strains with an industrial purpose. Applications of these metabolic manipulations
in Biotechnology derive from the need for enhanced production of valuable
compounds. The development of in silico metabolic models offers a quantifiable
approach for the manipulation of these microorganisms. In this context, constraint-based
modelling is one of the major approaches to predict cellular behaviour. It
prunes the feasible space of possibilities, describing possible phenotype
outcomes in terms of metabolic fluxes. Under these conditions, cellular metabolism
can be represented as an algebraic system constrained by the laws of mass
balance and thermodynamics.
These systems are prone to be represented as networks, taking advantage of different
graph-based paradigms, including bipartite graphs, hypergraphs and process
graphs. This thesis explores these representations and underlying algorithms for
metabolic network topological analysis. The main aim will be to identify potential
pathways towards the optimized biochemical production of selected compounds.
Related to this task, algorithms will also be designed aiming to complement networks
of specific organisms, taking as input larger metabolic databases, inserting
new reactions making them able to produce a new compound of interest.
To address these problems, and also related tasks of data pre-processing and evaluation
of the solutions, a complete computational framework was developed. It
integrates a number of previously proposed algorithms from distinct authors, together
with a number of improvements that were necessary to cope with large-scale
metabolic networks. These are the result of problems identified in the previous
algorithms regarding their scalability.
A case study in synthetic metabolic engineering was selected from the literature to
validate the algorithms and test the capabilities of the implemented framework. It
made it possible to compare the performance of the implemented algorithms and validate
the proposed improvements.
Metabolic Engineering aims to alter the cellular metabolism of microorganisms in order
to design new strains for industrial purposes. The applications of these genetic
modifications in Biotechnology derive from the need to produce high-value compounds in
an optimized way. The development of computational models offers a quantitative approach
to the manipulation of these organisms. In this context, constraint-based modelling is
one of the most widely used approaches for predicting cellular behaviour. It reduces the
space of feasible solutions, describing the cellular phenotype in terms of metabolic
fluxes. Under these conditions, cellular metabolism can be represented as an algebraic
system constrained by the laws of mass conservation and thermodynamics.
These systems can be represented as networks, taking advantage of different graph-based
paradigms, including bipartite graphs, hypergraphs and process graphs. This thesis
explores these representations and the corresponding algorithms for the topological
analysis of metabolic networks. The main goal will be to identify potential metabolic
pathways for optimizing the production of selected compounds. Related to this task,
algorithms will be designed to complement the networks of specific organisms, taking
larger metabolic databases as input and inserting new reactions so as to make them
capable of producing new compounds of interest.
To address these problems, as well as related tasks of pre-processing and evaluation of
the solutions, a complete computational platform was developed. It integrates a set of
algorithms previously proposed by various authors, together with significant improvements
that were necessary for them to handle large-scale metabolic networks. These improvements
result from the identification of scalability problems in the algorithms.
A case study in synthetic Metabolic Engineering was selected from the literature to
validate the algorithms and test the capabilities of the implemented platform. It made it
possible to compare the performance of the implemented algorithms and validate the
proposed improvements.
Evaluation and development of algorithms and computational tools for metabolic pathway optimization
Doctoral Programme in Informatics (MAP-i)
Metabolic engineering exploits microorganisms to build cell factories, making it possible to produce
valuable compounds with their enzymatic machinery. It involves the selection of an organism,
along with a set of genetic modifications to optimize the process. Information
regarding biological mechanisms is scattered across the literature. Metabolic databases
provide a centralized platform compiling existing biological data to build a catalog of all
known enzymatic transformations across all domains of life.
The development of genome-scale metabolic models exposes all possible biochemical
transformations that an organism can offer. Computer algorithms use these models
to explore the capabilities and limitations of the organisms. Constraint-based modeling
approaches make it possible to predict phenotypes given modifications in the network. In recent years,
there has been a significant increase in the number of available models, and for certain
organisms several models were built. The accuracy of these methods is in many cases
dependent on the quality of these models, which is limited by the information available in
the literature (or databases).
This thesis improves the existing methods by developing better data management
strategies for the metabolic modeling community. Metabolic databases are usually the
input data for many modeling tools, and the quality of solutions depends on the quality
of the databases. Currently, several metabolic databases exist, most of them sharing a
common set of information, and there is a need for a centralized system to take full
advantage of their content. However, each database adopts its own naming system to
catalog its instances, which in many cases makes it difficult to compare with others.
An integration pipeline is designed here to fuse metabolic databases into a common
namespace, allowing better analysis of the entire metabolic catalog across several databases
and exploring different methods to reconcile the metabolites and reactions included in these
databases.
In the second part of this work, the Systems Biology Markup Language, which is the most
common medium to store and represent genome-scale metabolic models, is analyzed. Like
databases, models also adopt unique nomenclatures for reactions and compounds. Here,
methods to annotate metabolites and reactions in models are developed, making it possible to connect
models with database instances and thus to adopt a single naming system for their
entities. The purpose of the methods is to standardize the entire model; therefore, other
entities such as genes, compartments and simulation media are also considered to unify these
models. The standardization methods were implemented in the KBase platform, which
improves the compatibility of this system with models built from external tools.
In the last part of this thesis, the pathway enumeration problem is revisited. Synthetic
biology explores cellular modifications to produce valuable products by inserting enzymatic
capabilities of other organisms. The selection of a suitable set of genes is highly combinatorial,
since in many cases there are several alternatives to reach the target product. A
common limitation of most of the existing methods is the inability to fully explore this
combinatorial space. In this work, the (hyper)graph methods are analyzed and improved
to fully enumerate biological pathways. As a result, two existing algorithms were improved
with regard to scalability, making it possible to fully enumerate larger solution sets.
One of the goals of Metabolic Engineering is the synthesis of value-added compounds
using microorganisms. One of the steps in this process involves the selection of organisms,
in combination with genetic modifications that optimize the process. Metabolic databases
centralize biological data, providing a catalog of all existing knowledge related to the
enzymatic context.
The reconstruction of genome-scale metabolic models makes it possible to study the
metabolic processes of different organisms. Using computational methods, these models
expose the capabilities and limitations of those organisms. Approaches such as
constraint-based modelling make it possible to predict phenotypes given alterations in the
metabolic pathways. In recent decades, there has been a significant increase in the number
of published models, and for some organisms several versions are available. The predictive
capability of these models depends on the information available in the databases and in
the literature.
This thesis aims to improve previous methods by addressing issues related to data
integration. Metabolic databases are generally the main source of information for existing
methods, which directly affects the ability to solve these problems. Currently, several
biological databases exist, and there is a need to develop centralized systems. However,
these databases commonly adopt their own identifiers, so a direct comparison is not
possible. In this work, strategies were developed to reconcile databases in the metabolic
context, making it possible to integrate compounds and reactions.
In the second part of this work, this integration process was expanded to include
genome-scale metabolic models. As with databases, models also adopt their own identifiers
to represent compounds and reactions. To unify models, annotation methods were developed
that relate model instances to the databases. Strategies were also implemented to identify
genes, compartments and simulation constraints. In this work, the methods were implemented
in the KBase platform, improving the compatibility of the system with external models.
Finally, several pathway enumeration methods were addressed. Synthetic biology aims to
manipulate cellular metabolism to produce compounds through the insertion of genes. The
selection of these genes is a combinatorial problem which, given a target compound,
identifies several sets of genes capable of realizing the synthetic pathway. This work
seeks to improve the ability to enumerate all possible pathways, given a limited set of
reactions and the pathway size. As a result, two existing hypergraph-based methods were
improved, increasing their scalability and making it possible to enumerate larger problems
or pathways.
Fundação para a Ciência e Tecnologia (FCT) - PhD grant SFRH/BD/111490/201
On the Impact of Frequency Variation on Nonlinearity Mitigation using Frequency Combs
We investigated the impact of linewidth and dithering-induced frequency variation on the performance of nonlinearity mitigation using frequency combs. Compared to independent laser arrays, >2 dB SNR gain can be achieved using comb sources.
Automated reconstruction and comparison of metabolic models for diverse fungal genomes
Discovery and implementation of a novel pathway for n-butanol production via 2-oxoglutarate
Background
One of the European Union directives indicates that 10% of all fuels must be bio-synthesized by 2020. In this regard, biobutanol, natively produced by clostridial strains, stands out as a promising alternative biofuel. One possible approach to overcome the difficulties of the industrial exploitation of the native producers is the expression of more suitable pathways in robust microorganisms such as Escherichia coli. The enumeration of novel pathways is a powerful tool, allowing the identification of non-obvious combinations of enzymes to produce a target compound.
Results
This work describes the in silico driven design of E. coli strains able to produce butanol via 2-oxoglutarate by a novel pathway. This butanol pathway was generated by a hypergraph algorithm and selected from an initial set of 105,954 different routes by successively applying different filters, such as stoichiometric feasibility, size and novelty. The implementation of this pathway involved seven catalytic steps and required the insertion of nine heterologous genes from various sources in E. coli, distributed in three plasmids. Expressing butanol genes in E. coli K12 and cultivation in High-Density Medium formulation seem to favor butanol accumulation via the 2-oxoglutarate pathway. The maximum butanol titer obtained was 85 ± 1 mg L⁻¹ by cultivating the cells in bioreactors.
Conclusions
In this work, we were able to successfully translate the computational analysis into in vivo applications, designing novel strains of E. coli able to produce n-butanol via an innovative pathway. Our results demonstrate that enumeration algorithms can broaden the spectrum of butanol-producing pathways. This validation encourages further research on other target compounds.
This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of a Ph.D. Grant (PD/BD/52366/2013) from MIT Portugal Program and the strategic funding of UID/BIO/04469 unit. Additional support was received by COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte.
The authors also thank the project “Dynamics”, Ref. ERA-IB-2/0002/2014, funded by national funds through FCT/MCTES.
The genes thl, hbd, crt and adhE1 were kindly provided by Kristala L. Jones Prather from MIT.
The authors thank the project DDDeCaF - Bioinformatics Services for Data-Driven Design of Cell Factories and Communities, Ref. H2020-LEIT-BIO-2015-1 686070–1, funded by the European Commission and the Project LISBOA010145 FEDER007660 (Microbiologia Molecular, Estrutural e Celular) funded by FEDER funds through COMPETE2020 - Programa Operacional Competitividade e Internacionalização (POCI) and by national funds through FCT - Fundação para a Ciência e a Tecnologia.