4,374 research outputs found

    Reconstruction of an in silico metabolic model of _Arabidopsis thaliana_ through database integration

    The number of genome-scale metabolic models has been rising quickly in recent years, and the scope of their utilization encompasses a broad range of applications from metabolic engineering to biological discovery. However, the reconstruction of such models remains an arduous process requiring a high level of human intervention. Their utilization is further hampered by the absence of standardized data and annotation formats and the lack of recognized quality and validation standards.

Plants provide a particularly rich range of perspectives for applications of metabolic modeling. We here report the first effort toward the reconstruction of a genome-scale model of the metabolic network of the plant _Arabidopsis thaliana_, including over 2300 reactions and compounds. Our reconstruction was performed using a semi-automatic methodology based on the integration of two public genome-wide databases, significantly accelerating the process. Database entries were compared and integrated with each other, allowing us to resolve discrepancies and enhance the quality of the reconstruction. This process led to the construction of three models based on different quality and validation standards, allowing users to choose the standard that is most appropriate for a given application. First, a _core metabolic model_ containing only consistent data provides a high-quality model that was shown to be stoichiometrically consistent. Second, an _intermediate metabolic model_ attempts to fill gaps and provides better continuity. Third, a _complete metabolic model_ contains the full set of known metabolic reactions and compounds in _Arabidopsis thaliana_.

We provide an annotated SBML file of our core model to enable the maximum level of compatibility with existing tools and databases. Finally, we discuss a series of principles to raise awareness of the need to develop coordinated efforts and common standards for the reconstruction of genome-scale metabolic models, with the aim of enabling their widespread diffusion, frequent update, maximum compatibility and convenience of use by the wider research community and industry.

    Improvement of sample classification and metabolite profiling in 1H-NMR by a machine learning-based modelling of signal parameters

    NMR is an analytical platform used to quantify the metabolites present in metabolomics samples. 1H-NMR spectra show multiple metabolite signals, each with three parameters (chemical shift, half bandwidth, intensity) that can react to the sample conditions. This reactivity is a challenge for optimizing the lineshape fitting of spectra needed for the automatic metabolite profiling of samples. The aim of this PhD thesis was to explore the use of trending machine learning (ML)-based techniques and of robust ML-based workflows to model and then exploit the information present in the different parameters collected for each signal during the metabolite profiling of 1H-NMR datasets. In particular, the applications considered were the enhanced classification of samples in metabolomics studies and the improved quality of automatic profiling in 1H-NMR datasets. In addition to these goals, further results were achieved (e.g., the generation of a new open-source tool able to solve challenges in the profiling of complex matrices).

    Genome-scale metabolic network reconstruction of model animals as a platform for translational research

    Genome-scale metabolic models (GEMs) are used extensively for analysis of mechanisms underlying human diseases and metabolic malfunctions. However, the lack of comprehensive and high-quality GEMs for model organisms restricts translational utilization of omics data accumulating from the use of various disease models. Here we present a unified platform of GEMs that covers five major model animals, including Mouse1 (Mus musculus), Rat1 (Rattus norvegicus), Zebrafish1 (Danio rerio), Fruitfly1 (Drosophila melanogaster), and Worm1 (Caenorhabditis elegans). These GEMs represent the most comprehensive coverage of the metabolic network by considering both orthology-based pathways and species-specific reactions. All GEMs can be interactively queried via the accompanying web portal Metabolic Atlas. Specifically, through integrative analysis of Mouse1 with RNA-sequencing data from brain tissues of transgenic mice we identified a coordinated up-regulation of lysosomal GM2 ganglioside and peptide degradation pathways which appears to be a signature metabolic alteration in Alzheimer's disease (AD) mouse models with a phenotype of amyloid precursor protein overexpression. This metabolic shift was further validated with proteomics data from transgenic mice and cerebrospinal fluid samples from human patients. The elevated lysosomal enzymes thus hold potential to be used as a biomarker for early diagnosis of AD. Taken together, we foresee that this evolving open-source platform will serve as an important resource to facilitate the development of systems medicines and translational biomedical applications

    A model validation pipeline for healthy tissue genome-scale metabolic models

    Master's dissertation in Bioinformatics. In the past few years, high-throughput experimental methods have made omics data available for several layers of biological organization, enabling the integration of knowledge from individual components into complex models such as genome-scale metabolic models (GSMMs). These can be analysed by constraint-based modelling (CBM) methods, which facilitate in silico predictive approaches. Human metabolic models have been used to study healthy human tissues and their associated metabolic diseases, such as obesity, diabetes, and cancer. Generic human models can be integrated with contextual data through reconstruction algorithms to produce context-specific models (CSMs), which are typically better at capturing the variation between different tissues and cell types. As the human body contains a multitude of tissues and cell types, CSMs are frequently adopted as a means to obtain accurate metabolic models of healthy human tissues. However, unlike microorganism or cancer models, which allow several methods of validation such as the comparison of in silico fluxes or gene essentiality predictions to experimental data, the validation methods easily applicable to CSMs of healthy human tissue are more limited. Consequently, despite continued efforts to update generic human models and reconstruction algorithms to extract high-quality CSMs, their validation remains a concern.
This work presents a pipeline for the extraction and basic validation of CSMs of normal human tissues derived from the integration of transcriptomics data with a generic human model. All CSMs were extracted from the Human-GEM generic model recently published by Robinson et al. (2020), using the open-source Troppo Python package and the fastCORE and tINIT reconstruction algorithms implemented therein. CSMs were extracted for 11 healthy tissues available in the GTEx v8 dataset. Prior to extraction, machine learning methods were applied to select a threshold for the conversion of omics data into gene scores. The highest-quality models were obtained with a global threshold applied to the omics data directly. The CSM validation strategy focused on the total number of metabolic tasks passed as a performance indicator. Lastly, this work is accompanied by Jupyter Notebooks, which include a beginner-friendly model extraction guide.

    Dynamic bayesian networks for integrating multi-omics time series microbiome data

    A key challenge in the analysis of longitudinal microbiome data is the inference of temporal interactions between microbial taxa, their genes, the metabolites that they consume and produce, and host genes. To address these challenges, we developed a computational pipeline, a pipeline for the analysis of longitudinal multi-omics data (PALM), that first aligns multi-omics data and then uses dynamic Bayesian networks (DBNs) to reconstruct a unified model. Our approach overcomes differences in sampling and progression rates, utilizes a biologically inspired multiomic framework, reduces the large number of entities and parameters in the DBNs, and validates the learned network. Applying PALM to data collected from inflammatory bowel disease patients, we show that it accurately identifies known and novel interactions. Targeted experimental validations further support a number of the predicted novel metabolite-taxon interactions.
    Affiliations: Ruiz Perez, Daniel (Florida International University, United States); Lugo Martinez, Jose (University of Carnegie Mellon, United States); Bourguignon, Natalia (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Florida International University, United States; Universidad Tecnológica Nacional, Argentina); Mathee, Kalai (Florida International University, United States); Lerner, Betiana (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; Universidad Tecnológica Nacional, Argentina); Bar Joseph, Ziv (University of Carnegie Mellon, United States); Narasimhan, Giri (Florida International University, United States)

    Decoding Complexity in Metabolic Networks using Integrated Mechanistic and Machine Learning Approaches

    How can we get living cells to do what we want? What do they actually ‘want’? What ‘rules’ do they observe? How can we better understand and manipulate them? Answers to fundamental research questions like these are critical to overcoming bottlenecks in metabolic engineering and optimizing heterologous pathways for synthetic biology applications. Unfortunately, biological systems are too complex to be completely described by physicochemical modeling alone. In this research, I developed and applied integrated mechanistic and data-driven frameworks to help uncover the mysteries of cellular regulation and control. These tools provide a computational framework for seeking answers to pertinent biological questions. Four major tasks were accomplished. First, I developed innovative tools for key areas in the genome-to-phenome mapping pipeline. An efficient gap filling algorithm (called BoostGAPFILL) that integrates mechanistic and machine learning techniques was developed for the refinement of genome-scale metabolic network reconstructions. Genome-scale metabolic network reconstructions are finding ever increasing applications in metabolic engineering for industrial, medical and environmental purposes. Second, I designed a thermodynamics-based framework (called REMEP) for mutant phenotype prediction (integrating metabolomics, fluxomics and thermodynamics data). These tools will go a long way in improving the fidelity of model predictions of microbial cell factories. Third, I designed a data-driven framework for characterizing and predicting the effectiveness of metabolic engineering strategies. This involved building a knowledgebase of historical microbial cell factory performance from published literature. Advanced machine learning concepts, such as ensemble learning and data augmentation, were employed in combination with standard mechanistic models to develop a predictive platform for important industrial biotechnology metrics such as yield, titer, and productivity. 
Fourth, my modeling tools and skills have been used for case studies on fungal lipid metabolism analyses, E. coli resource allocation balances, reconstruction of the genome-scale metabolic network for a non-model species, R. opacus, as well as the rapid prediction of bacterial heterotrophic fluxomics. In the long run, this integrated modeling approach will significantly shorten the “design-build-test-learn” cycle of metabolic engineering, as well as provide a platform for biological discovery

    A pipeline for the reconstruction and evaluation of context-specific human metabolic models at a large-scale

    Constraint-based (CB) metabolic models provide a mathematical framework and scaffold for in silico cell metabolism analysis and manipulation. In the past decade, significant efforts have been made to model human metabolism, enabled by the increased availability of multi-omics datasets and curated genome-scale reconstructions, as well as the development of several algorithms for context-specific model (CSM) reconstruction. Although CSM reconstruction has revealed insights into the deregulated metabolism of several pathologies, the process of reconstructing representative models of human tissues still lacks benchmarks and appropriate integrated software frameworks, since many tools required for this process are still dispersed across various software platforms, some of which are proprietary. In this work, we address this challenge by assembling a scalable CSM reconstruction pipeline capable of integrating transcriptomics data in CB models. We combined omics preprocessing methods inspired by previous efforts with in-house implementations of existing CSM algorithms and new model refinement and validation routines, all implemented in the Troppo Python-based open-source framework. The pipeline was validated with multi-omics datasets from the Cancer Cell Line Encyclopedia (CCLE), also including reference fluxomics measurements for the MCF7 cell line. We reconstructed over 6000 models based on the Human-GEM template model for 733 cell lines featured in the CCLE, using MCF7 models as reference to find the best parameter combinations. In comparisons against gene essentiality and fluxomics experiments, these reference models outperform earlier studies using the same template. We also analysed the heterogeneity of breast cancer cell lines, identifying key changes in metabolism related to cancer aggressiveness. Despite the many challenges in CB modelling, we demonstrate using our pipeline that integrating transcriptomics data into metabolic models can be used to investigate key metabolic shifts. Significant limitations were found in these models' ability to make reliable quantitative flux predictions, thus motivating further work in genome-wide phenotype prediction.
Author summary: Genome-scale models of human metabolism are promising tools capable of contextualising large omics datasets within a framework that enables analysis and manipulation of metabolic phenotypes. Despite various successes in applying these methods to provide mechanistic hypotheses for deregulated metabolism in disease, there is no standardized workflow to extract these models using existing methods, and the tools required to do so are mostly implemented using proprietary software. We have assembled a generic pipeline to extract and validate context-specific metabolic models using multi-omics datasets and implemented it using the troppo framework. We first validate our pipeline using MCF7 cell line models and assess their ability to predict lethal gene knockouts as well as flux activity using multi-omics data. We also demonstrate how this approach can be generalized for large-scale transcriptomics datasets and used to generate insights on the metabolic heterogeneity of cancer and relevant features for other data mining approaches. The pipeline is available as part of an open-source framework that is generic for a variety of applications.
Competing Interest Statement: The authors have declared no competing interest.
The authors acknowledge the PhD scholarships co-funded by national funds and the European Social Fund through the Portuguese Foundation for Science and Technology (FCT), with references SFRH/BD/118657/2016 (V.V.) and SFRH/BD/133248/2017 (J.F.). This study was also supported by the FCT under the scope of the strategic funding of the UIDB/04469/2020 unit.