17 research outputs found

    IdentiCS – Identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence

    BACKGROUND: A necessary step for a genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods are mainly based on the annotation of genome sequences, which involves two successive steps: the prediction of coding sequences (CDSs) and their function assignment. The annotation process takes time, and the available methods often encounter difficulties when dealing with unfinished, error-containing genomic sequences. RESULTS: In this work a fast method is proposed that uses unannotated genome sequence for predicting CDSs and for an in silico reconstruction of metabolic networks. Instead of using predicted genes or CDSs to query public databases, entries from public DNA or protein databases are used as queries to search a local database of the unannotated genome sequence to predict CDSs. Functions are assigned to the predicted CDSs simultaneously. The well-annotated genome of Salmonella typhimurium LT2 is used as an example to demonstrate the applicability of the method. 97.7% of the CDSs in the original annotation are correctly identified. The use of the SWISS-PROT-TrEMBL databases resulted in the identification of 98.9% of the CDSs that have EC numbers in the published annotation. Furthermore, two versions of the sequence of the bacterium Klebsiella pneumoniae with different genome coverage (3.9-fold and 7.9-fold, respectively) are examined. The results suggest that a 3.9-fold coverage of the bacterial genome could be sufficient for the in silico reconstruction of the metabolic network. Compared to other gene-finding methods such as CRITICA, our method is more suitable for exploiting sequences of low genome coverage. Based on the new method, a program called IdentiCS (Identification of Coding Sequences from Unfinished Genome Sequences) is delivered that combines the identification of CDSs with the reconstruction, comparison and visualization of metabolic networks (free to download at ). CONCLUSIONS: The reversed querying process and the program IdentiCS allow a fast and adequate prediction of protein-coding sequences and reconstruction of the potential metabolic network from low-coverage genome sequences of bacteria. The new method can accelerate the use of genomic data for studying cellular metabolism.

    Accelerating the reconstruction of genome-scale metabolic networks

    BACKGROUND: The genomic information of a species allows for the genome-scale reconstruction of its metabolic capacity. Such a metabolic reconstruction supports metabolic engineering as well as integrative bioinformatics and visualization. Sequence-based automatic reconstructions require extensive manual curation, which can be very time-consuming. Therefore, we present a method to accelerate network reconstruction for a query species. The method exploits the availability of well-curated metabolic networks and uses high-resolution predictions of gene equivalency between species, allowing the transfer of gene-reaction associations from curated networks. RESULTS: We have evaluated the method using Lactococcus lactis IL1403, for which a genome-scale metabolic network was published recently. We recovered most of the gene-reaction associations (74–85%) that are incorporated in the published network. Moreover, we predicted over 200 additional genes to be associated with reactions, including genes with unknown function, genes for transporters and genes with specific metabolic reactions, which are good candidates for an extension of the previously published network. In a comparison of our method with the well-established approach PathoLogic, we predicted 186 additional genes to be associated with reactions. We also predicted a relatively high number of complete conserved protein complexes, which are derived from curated metabolic networks, illustrating the potential predictive power of our method for protein complexes. CONCLUSION: We show that our methodology can be applied to accelerate the reconstruction of genome-scale metabolic networks by taking optimal advantage of existing, manually curated networks. As orthology detection is the first step in the method, only the translated open reading frames (ORFs) of a newly sequenced genome are necessary to reconstruct a metabolic network. As more manually curated metabolic networks become available in the near future, the usefulness of our method in network prediction is likely to increase.
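    The transfer of gene-reaction associations described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the reaction and gene identifiers are hypothetical, the ortholog table is hand-written rather than produced by an orthology detection step, and the function `transfer_associations` is not part of the published method. It also ignores the logic of protein complexes, which the abstract treats separately.

```python
# Hypothetical curated network: reaction -> genes of the reference species.
curated = {
    "R_pgi": {"ref_0001"},
    "R_pfk": {"ref_0002", "ref_0003"},
    "R_xyz": {"ref_0004"},
}

# Hypothetical ortholog table: reference gene -> equivalent gene in the query species.
orthologs = {
    "ref_0001": "qry_101",
    "ref_0002": "qry_205",
}


def transfer_associations(curated, orthologs):
    """Map each curated reaction onto the query genes that have an ortholog."""
    transferred = {}
    for reaction, ref_genes in curated.items():
        query_genes = {orthologs[g] for g in ref_genes if g in orthologs}
        if query_genes:  # keep only reactions supported by at least one ortholog
            transferred[reaction] = query_genes
    return transferred


draft = transfer_associations(curated, orthologs)
```

    In this toy input, `R_xyz` is dropped because its only gene has no ortholog in the query species; in a real reconstruction such reactions would be candidates for manual curation.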

    Flux Design: In silico design of cell factories based on correlation of pathway fluxes to desired properties

    BACKGROUND: The identification of genetic targets is a key step for rational engineering of production strains towards bio-based chemicals, fuels or therapeutics. This is often a difficult task, because superior production performance typically requires a combination of multiple targets, whereby the complex metabolic networks complicate straightforward identification. Recent attempts towards target prediction mainly focus on the prediction of gene deletion targets and therefore can cover only a part of the genetic modifications proven valuable in metabolic engineering. Efficient in silico methods for simultaneous genome-scale identification of targets to be amplified or deleted are still lacking. RESULTS: Here we propose the identification of targets via flux correlation to a chosen objective flux as an approach towards improved biotechnological production strains with optimally designed fluxes. The approach, which we name Flux Design, computes elementary modes and, by searching through the modes, identifies targets to be amplified (positive correlation) or down-regulated (negative correlation). Supported by statistical evaluation, a target potential is attributed to the identified reactions in a quantitative manner. Based on systems-wide models of the industrial microorganisms Corynebacterium glutamicum and Aspergillus niger, up to more than 20,000 modes were obtained for each case, differing strongly in production performance and intracellular fluxes. For lysine production in C. glutamicum the identified targets matched reported successful metabolic engineering strategies well. In addition, simulations revealed insights, e.g. into the flexibility of energy metabolism. For enzyme production in A. niger, flux correlation analysis suggested a number of targets, including non-obvious ones. The relevance of most targets depended on the metabolic state of the cell and also on the carbon source. CONCLUSIONS: Objective flux correlation analysis provided a detailed insight into the metabolic networks of industrially relevant prokaryotic and eukaryotic microorganisms. It was shown that capacity, pathway usage, and relevant genetic targets for optimal production partly depend on the network structure and the metabolic state of the cell, which should be considered in future metabolic engineering strategies. The presented strategy can be generally used to identify priority-sorted amplification and deletion targets for metabolic engineering purposes under various conditions and is thus a useful strategy to incorporate into efficient strain and bioprocess optimization.
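    The idea of ranking reactions by how their flux correlates with an objective flux across elementary modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the Flux Design implementation: the three modes and the reaction names are hypothetical toy data, the modes are taken as given rather than computed, and a plain Pearson coefficient stands in for the statistical evaluation that the abstract mentions but does not specify.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant flux carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one dict per elementary mode, values are relative fluxes.
modes = [
    {"r_supply": 1.0, "r_drain": 3.0, "r_const": 1.0, "objective": 1.0},
    {"r_supply": 2.0, "r_drain": 2.0, "r_const": 1.0, "objective": 2.0},
    {"r_supply": 3.0, "r_drain": 1.0, "r_const": 1.0, "objective": 3.0},
]


def rank_targets(modes, objective="objective"):
    """Sort reactions from most positively to most negatively correlated."""
    objective_flux = [m[objective] for m in modes]
    reactions = [r for r in modes[0] if r != objective]
    scores = {r: pearson([m[r] for m in modes], objective_flux) for r in reactions}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_targets(modes)
```

    In this toy ranking, `r_supply` comes first (a candidate for amplification) and `r_drain` comes last (a candidate for down-regulation), mirroring the positive and negative correlation cases named in the abstract.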

    Machine learning methods for metabolic pathway prediction

    BACKGROUND: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. RESULTS: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. CONCLUSIONS: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
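    The two evaluation figures quoted in this abstract, accuracy and F-measure, and the filtering of pathways by predicted probability can be illustrated with a short Python sketch. The labels, probabilities, and the 0.5 threshold below are hypothetical toy values chosen for illustration, not data from the study, and the F-measure is assumed to be the usual harmonic mean of precision and recall.

```python
def accuracy_and_f_measure(y_true, y_pred):
    """Return (accuracy, F-measure) for binary present/absent labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Hypothetical gold standard: 1 = pathway present, 0 = pathway absent.
gold = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical per-pathway probabilities from some classifier.
probabilities = [0.9, 0.8, 0.3, 0.7, 0.1, 0.2, 0.05, 0.4, 0.3, 0.1]

threshold = 0.5  # assumed cut-off; raising it keeps only high-confidence pathways
predicted = [1 if p >= threshold else 0 for p in probabilities]

accuracy, f_measure = accuracy_and_f_measure(gold, predicted)
```

    Because absent pathways dominate the toy gold standard, the accuracy (0.8) is higher than the F-measure (about 0.67), which is one reason both figures are worth reporting side by side.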

    Toward the automated generation of genome-scale metabolic networks in the SEED

    BACKGROUND: Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, and verifying that the network is suitable for systems-level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network is still largely a manual, labor-intensive process. RESULTS: We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks, suitable for systems-level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual efforts on that portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus, by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative genome annotation and analysis. CONCLUSION: Our method sets the stage for the automated generation of substantially complete metabolic networks for over 400 complete genome sequences currently in the SEED. With each genome that is processed using our tools, the database of common components grows to cover more of the diversity of metabolic pathways. This increases the likelihood that components of reaction networks for subsequently processed genomes can be retrieved from the database, rather than assembled and verified manually.

    Metabolic peculiarities of Aspergillus niger disclosed by comparative metabolic genomics

    A genome-scale metabolic network and an in-depth genomic comparison of Aspergillus niger with seven other fungi are presented, revealing more than 1,100 enzyme-coding genes that are unique to A. niger.

    The RAVEN Toolbox and Its Use for Generating a Genome-scale Metabolic Model for Penicillium chrysogenum

    We present the RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) Toolbox: a software suite that allows for semi-automated reconstruction of genome-scale models. It makes use of published models and/or the KEGG database, coupled with extensive gap-filling and quality control features. The software suite also contains methods for visualizing simulation results and omics data, as well as a range of methods for performing simulations and analyzing the results. The software is a useful tool for system-wide data analysis in a metabolic context and for streamlined reconstruction of metabolic networks based on protein homology. The RAVEN Toolbox workflow was applied to reconstruct a genome-scale metabolic model for the important microbial cell factory Penicillium chrysogenum Wisconsin54-1255. The model was validated in a bibliomic study of a total of 440 references, and it comprises 1471 unique biochemical reactions and 1006 ORFs. It was then used to study the roles of ATP and NADPH in the biosynthesis of penicillin, and to identify potential metabolic engineering targets for maximization of penicillin production.

    Desarrollo y análisis de algoritmos probabilísticos para la reconstrucción de modelos metabólicos a escala genómica

    This doctoral project focuses on the development and analysis of algorithms for the reconstruction of genome-scale metabolic models; these algorithms include decision-making based on probabilistic criteria. As a fundamental result of the doctoral research, the web application Computational Platform to Access Biological Information (COPABI), which can reconstruct genome-scale metabolic models of biological systems, has been developed. Its computational implementation followed the methodology used for the reconstruction of the first genome-scale metabolic model of a photosynthetic microorganism, Synechocystis sp. PCC6803. Different mathematical algorithms were applied to compare the models that were automatically generated by COPABI with those published in the literature for different species.
    Reyes Chirino, R. (2013). Desarrollo y análisis de algoritmos probabilísticos para la reconstrucción de modelos metabólicos a escala genómica [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/34344

    Desarrollo de métodos de simulación aplicados a la optimización de funciones objetivo biológicas

    Systems Biology is a field of research in which several disciplines converge, such as Physics, Mathematics, Chemistry and Biology, and in which the interactions between the internal elements of a microorganism and its environment drive processes that are represented by a mathematical model. This approach helps explain how biological systems work and how their interactions give rise to new properties and processes. In the study of biological processes, a theory is confirmed or refuted by confronting it with experimental results. Systems Biology works from hypotheses grounded in the mathematical modeling of those processes. One of the main elements of analysis in Systems Biology is the reconstruction of metabolic models, which is decisive when it comes to modifying the functioning of a given organism. This work addresses the automation of this activity, as well as the essential fundamentals of the COPABI tool, as a fundamental step toward a good reconstruction before different optimization methods are applied to a genome-scale metabolic model. The research is based on non-traditional methods that improve simulation results and bring them closer to reality in the context of metabolic engineering. PyNetMet, a Python library, is introduced as a tool for working with metabolic networks and models. To illustrate its most important features and some of its uses, results obtained with the tool are shown, such as the average clustering of the networks representing each metabolic model, the number of disconnected metabolites in each model, and the mean distance between any two metabolites in the network. Analyzing metabolic models through single-objective optimization does not always come as close to reality as desired, since objectives may conflict: they share the need to choose between different alternatives that must be evaluated against several criteria. To this end, a multi-objective optimization algorithm based on evolutionary algorithms was presented. It is an adaptation of the sp-MODE algorithm, implemented in the bioinformatics tool BioMOE, which simultaneously considers the optimization of two or more objectives, often in conflict, and yields as solutions different flux distributions, none of which is better than the others. For the comparison of metabolic models, a bioinformatics tool called CompNet is shown. It is based on graph-theory concepts such as Petri nets and compares metabolic models by determining, through the edit distance metric, what changes would be necessary to modify certain functions in one model with respect to the other. The Baláž and Bunke metrics give a quantitative value for the degree of similarity between two models.
    Jaime Infante, RA. (2020). Desarrollo de métodos de simulación aplicados a la optimización de funciones objetivo biológicas [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/147112
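    Two of the network statistics named in this abstract, the number of disconnected metabolites and the mean distance between metabolites, can be computed with a few lines of plain Python. The sketch below does not use PyNetMet and makes no claim about its API; the reactions and metabolite names are hypothetical, metabolites are assumed to be linked whenever they take part in the same reaction, and the mean distance is taken over reachable pairs only.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy model: reaction -> metabolites taking part in it.
reactions = {
    "R1": ["A", "B"],
    "R2": ["B", "C"],
    "R3": ["C", "D"],
}
metabolites = ["A", "B", "C", "D", "E"]  # "E" appears in no reaction


def build_graph(reactions, metabolites):
    """Undirected graph linking metabolites that share a reaction."""
    graph = {m: set() for m in metabolites}
    for members in reactions.values():
        for u, v in combinations(members, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def bfs_distances(graph, source):
    """Shortest path length from source to every reachable metabolite."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def mean_distance(graph):
    """Average shortest path length over all reachable ordered pairs."""
    total, pairs = 0, 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


graph = build_graph(reactions, metabolites)
disconnected = [m for m, neighbours in graph.items() if not neighbours]
average = mean_distance(graph)
```

    For this toy chain A-B-C-D the six reachable pairs have distances 1, 2, 3, 1, 2 and 1, so the mean distance is 10/6, and `E` is reported as the only disconnected metabolite.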