    A computational solution to automatically map metabolite libraries in the context of genome scale metabolic networks

    This article describes a generic programmatic method for mapping chemical compound libraries onto organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat-file formats (SBML and Matlab files). We show how this pipeline was successfully applied to determine the coverage of the chemical libraries set up by two metabolomics facilities, MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics (GP), on the metabolic networks available in the MetExplore web server. The generic protocol presented here is designed to formalize, and reduce the volume of, the information transferred between the library and the network database. Metabolites are matched between libraries and metabolic networks on InChIs or InChIKeys, so these identifiers must be specified in both the libraries and the networks. In addition to providing coverage statistics, the pipeline allows the mapping results to be visualized in the context of metabolic networks. To achieve this, we addressed issues of programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks, and automatic loading of a mapping into the genome-scale metabolic network analysis tool MetExplore. Note that this mapping can also be performed on a single organism or a selection of organisms of interest, and is thus not limited to large facilities.
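The core matching step can be illustrated with a minimal Python sketch, assuming each library compound and each network metabolite carries an optional InChIKey and that matching is exact key equality. The names `Metabolite`, `map_by_inchikey` and `coverage_stats` are hypothetical, and the toy keys are placeholders rather than real InChIKeys; this is not the pipeline's actual code or a MetExplore API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Metabolite:
    name: str
    inchikey: Optional[str]  # may be missing on either side


def map_by_inchikey(library: List[Metabolite],
                    network: List[Metabolite]) -> Dict[str, List[str]]:
    """Map each library compound to the network metabolites sharing its InChIKey."""
    # Index the network by InChIKey; metabolites without one cannot be matched.
    index: Dict[str, List[str]] = {}
    for met in network:
        if met.inchikey:
            index.setdefault(met.inchikey, []).append(met.name)
    return {cpd.name: index[cpd.inchikey]
            for cpd in library
            if cpd.inchikey and cpd.inchikey in index}


def coverage_stats(library: List[Metabolite],
                   network: List[Metabolite],
                   mapping: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of the library found in the network, and of the network covered."""
    mapped_network = {name for names in mapping.values() for name in names}
    return {
        "library_mapped": len(mapping) / len(library) if library else 0.0,
        "network_covered": len(mapped_network) / len(network) if network else 0.0,
    }


# Toy data with placeholder keys (not real InChIKeys).
library = [Metabolite("cpdA", "KEY-1"), Metabolite("cpdB", "KEY-2"),
           Metabolite("cpdC", None)]
network = [Metabolite("m1", "KEY-1"), Metabolite("m2", "KEY-3"),
           Metabolite("m3", None), Metabolite("m4", "KEY-2")]

mapping = map_by_inchikey(library, network)
stats = coverage_stats(library, network, mapping)
print(mapping)  # {'cpdA': ['m1'], 'cpdB': ['m4']}
print(stats)
```

Entries without an InChIKey are simply left unmatched, which mirrors the requirement that identifiers be present in both libraries and networks.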

    Challenges and perspectives for naming lipids in the context of lipidomics

    Introduction: Lipids are key compounds in the study of metabolism and are increasingly studied in biology projects. They form a very broad family encompassing many compounds, and the name of a given compound may vary depending on the community studying it. Objectives: In addition, their structures are varied and complex, which complicates their analysis. Indeed, structural resolution does not always allow a complete level of annotation, so the compound actually analysed will vary from study to study and should be clearly stated. For all these reasons, the identification and naming of lipids are complicated and vary greatly from one study to another; they need to be harmonized. Methods & Results: In this position paper, we present and discuss the different ways to name lipids (with chemoinformatic and semantic identifiers) and their importance for sharing lipidomic results. Conclusion: Harmonizing this identification and adopting common rules is essential for sharing data within the community and for mapping data onto functional networks.
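As an illustration of why the annotation level must be stated, the following hypothetical Python helper collapses a chain-level name written in the style of the LIPID MAPS shorthand notation (for example `PC 16:0_18:1`) to its species-level sum composition (`PC 34:1`). It is a sketch that ignores ether, oxidation and other modifiers, and it is not taken from the position paper.

```python
import re

# A chain written as "<carbons>:<double bonds>", e.g. "16:0" or "18:1".
_CHAIN = re.compile(r"(\d+):(\d+)")


def sum_composition(name: str) -> str:
    """Collapse a chain-level lipid name (e.g. 'PC 16:0_18:1') to 'PC 34:1'.

    Illustrative only: ether/oxidation prefixes and suffixes are ignored.
    """
    head, _, chains = name.partition(" ")
    pairs = _CHAIN.findall(chains)
    if not pairs:
        raise ValueError(f"no acyl chains found in {name!r}")
    carbons = sum(int(c) for c, _ in pairs)
    double_bonds = sum(int(db) for _, db in pairs)
    return f"{head} {carbons}:{double_bonds}"


for reported in ["PC 16:0_18:1", "PC 16:0/18:1", "PC 34:1"]:
    print(reported, "->", sum_composition(reported))  # each line ends "-> PC 34:1"
```

Two studies reporting `PC 16:0_18:1` and `PC 34:1` may therefore describe the same molecule at different resolutions, or different molecules that share a sum composition; only the stated annotation level tells them apart.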

    Optimization of metabolome extraction procedures and implementation of a PeakForest database for the identification of microorganisms compounds by LC-HRMS

    International audience. The investigation of microbial communities and their interactions with their environments is a field of research that has engaged the scientific community for several decades. In our laboratory, we study microorganisms isolated from freshwater and atmospheric ecosystems under different conditions. The aim of this internship was to optimize the metabolomic workflows of four microorganisms (Pseudomonas syringae, Rhodococcus rhodococcus, Microcystis aeruginosa, Chlorella spp.), from extraction to compound annotation. Objectives: i) optimize the metabolome extraction protocol for each microorganism, ii) implement a PeakForest database with the MetaCyc and NPATLAS databases, iii) annotate the microorganisms' metabolomes.

    Metavir: a web server dedicated to virome analysis.

    International audience. SUMMARY: Metavir is a web server dedicated to the analysis of viral metagenomes (viromes). In addition to classical approaches for analyzing metagenomes (general sequence characteristics, taxonomic composition), new tools developed specifically for viral sequence analysis make it possible to: (i) explore viral diversity through automatically constructed phylogenies for selected marker genes, (ii) estimate gene richness through rarefaction curves and (iii) perform cross-comparisons against other viromes using sequence similarities. Metavir is thus unique as a platform that allows a comprehensive virome analysis. AVAILABILITY: Metavir is freely available online at: http://metavir-meb.univ-bpclermont.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
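Rarefaction curves of this kind are often computed with a Hurlbert-style expected-richness formula: for N reads spread over gene clusters of sizes N_i, the expected number of distinct clusters in a random subsample of n reads is E[S_n] = sum over i of (1 - C(N - N_i, n) / C(N, n)). The Python sketch below implements that formula; the summary does not state which estimator Metavir uses, so treat `rarefaction_curve` and its cluster-size input as assumptions.

```python
from math import comb
from typing import Dict, List


def rarefaction_curve(cluster_sizes: List[int], steps: List[int]) -> Dict[int, float]:
    """Expected number of distinct clusters in a random subsample of n reads.

    E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), with N the total read count
    and N_i the size of cluster i (sampling without replacement).
    """
    total = sum(cluster_sizes)
    curve: Dict[int, float] = {}
    for n in steps:
        if not 0 <= n <= total:
            raise ValueError(f"subsample size {n} is outside [0, {total}]")
        denom = comb(total, n)
        # comb(a, n) is 0 when n > a, i.e. the cluster is certain to be seen.
        curve[n] = sum(1 - comb(total - size, n) / denom for size in cluster_sizes)
    return curve


# Toy data: four hypothetical gene clusters holding 5, 3, 1 and 1 reads.
for n, richness in rarefaction_curve([5, 3, 1, 1], [1, 2, 5, 10]).items():
    print(n, round(richness, 3))
```

A curve that is still rising at the full sample size suggests that more sequencing would reveal additional gene clusters.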

    Spectral Database: from data model to web interface

    MetaboHUB is a metabolomics and fluxomics infrastructure that provides tools to research teams and partners. Its Bioinformatics and Biostatistics service specializes in NMR, GC-MS and LC-MS data processing and analysis, from raw data to metabolite identification. To tackle the annotation of these data and centralize knowledge, a dedicated team is building software to assist in identification, including a compound and spectra database. The core of the “MetaboHUB Spectral Database”, called the "data-model", is a computational representation of each entity involved in spectra analysis and chemical compound identification. One of the strengths of the project is the joint work of chemical experts and bioinformaticians on the data-model design, which ensures that the logic and constraints used in metabolomics are respected during data manipulation and storage. The software architecture allows parts of the project to be used as standalone software, available to the community. The data-model appears able to manage several types of chemical compounds (such as standards or sub-structures) and different types of spectra (MS, MS/MS and NMR, simple, JRES and multidimensional). We will be able to validate the data-model with data from the chemical libraries provided by MetaboHUB members. One of the final goals of the spectral database is to provide a computer-aided spectra identification tool, using all these data, through a web portal. Two milestones are coming up: first, providing a mechanism to import spectral data into the data-model (and therefore into the database); second, defining metadata around spectral analysis.
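To make the idea of a "data-model" more concrete, here is a minimal Python sketch of how compound and spectrum entities might be represented, reading JRES as J-resolved NMR. All class, field and enum names are hypothetical; this is not the actual MetaboHUB Spectral Database schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class CompoundKind(Enum):
    STANDARD = "standard"
    SUBSTRUCTURE = "sub-structure"


class SpectrumKind(Enum):
    MS = "MS"
    MSMS = "MS/MS"
    NMR_1D = "NMR, simple"
    NMR_JRES = "NMR, J-resolved"
    NMR_MULTIDIM = "NMR, multidimensional"


@dataclass
class Compound:
    name: str
    kind: CompoundKind
    inchikey: Optional[str] = None


@dataclass
class Spectrum:
    kind: SpectrumKind
    compound: Compound
    # Simplified (position, intensity) peak list; multidimensional NMR
    # would need a richer type.
    peaks: List[Tuple[float, float]] = field(default_factory=list)
    # Free-form acquisition metadata (instrument, method, ...).
    metadata: Dict[str, str] = field(default_factory=dict)


def spectra_for(compound: Compound, spectra: List[Spectrum],
                kind: Optional[SpectrumKind] = None) -> List[Spectrum]:
    """Spectra recorded for one compound, optionally restricted to one kind."""
    return [s for s in spectra
            if s.compound is compound and (kind is None or s.kind is kind)]


# Hypothetical standard with one MS spectrum and one J-resolved NMR spectrum.
std = Compound("hypothetical standard", CompoundKind.STANDARD)
library = [Spectrum(SpectrumKind.MS, std, [(181.07, 100.0)]),
           Spectrum(SpectrumKind.NMR_JRES, std)]
print(len(spectra_for(std, library, SpectrumKind.MS)))  # 1
```

Keeping the spectrum type as an explicit enum lets a single query layer serve MS and NMR data while still enforcing type-specific constraints elsewhere.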

    Untargeted metabolomic approach by GC-QTOF : From low to high resolution

    Untargeted GC-MS metabolomic approaches have evolved thanks to the development of new, higher-resolution instruments such as the GC-QToF. In our laboratory, this approach is used in scientific projects searching for biomarkers that characterize metabolic phenotypes. An untargeted method for determining the metabolic profiles of biofluids by GC-QToF was adapted from a low-resolution method (Gao et al. (1)) based on a two-step oximation/silylation derivatization. This more sensitive, higher-resolution technique requires dedicated data-processing tools. We therefore had to adapt tools developed in our laboratory for metabolomic data processing to this type of data. These tools include data extraction with xcms on the Galaxy platform (W4M, (2)), as well as the entire workflow leading to the annotation of the extracted ions after filtering and batch-effect correction. In parallel, we are deploying a deconvolution strategy based on vendor tools to complement the results obtained in Galaxy. To date, GC-MS libraries (NIST, Golm, MassBank) remain widely used for metabolite identification but contain no spectra with accurate masses, although some of their spectra come from GC-EI-ToF instruments. We are therefore building an in-house high-resolution library from pure standards and biological matrices, which will feed the PeakForest database of the French MetaboHUB infrastructure. Accurate mass measurement, together with the development of new tools to automate data processing, should help remove some of the bottlenecks in biomarker research related to metabolite identification.

    Metabolite reporting in large-scale studies within different metabolomics communities: DO WE SPEAK THE SAME LANGUAGE?

    Since the emergence of high-throughput metabolomics, a growing number of scientific communities have been performing metabolomic studies. It has therefore become crucial to standardize the reporting and sharing of metabolites. Although minimum reporting standards for analytical practices and data processing are available, there are no established standards for metabolite reporting. Our objective was therefore to review existing metabolite-reporting practices in different scientific communities, both in published results and across databases. To this end, we considered plasma metabolites reported in large-scale human studies from different communities, namely analytical chemistry, medicine and epidemiology. We focused only on metabolites reported at identification level 1 according to the Metabolomics Standards Initiative. We applied a data curation workflow to the lists of annotated metabolites given by the authors. First, we performed a manual curation that included adding missing identifiers and editing some incoherent metadata. Second, we applied an automatic query algorithm to obtain additional information from available databases, such as the compact hash code of the IUPAC International Chemical Identifier, the “InChIKey”. Identified metabolites were then compared between the selected studies using either the names given by the authors or the InChIKeys added after data curation. Recurrent inconsistencies were observed in metabolite reporting, both in published results and across databases. In published results, the metabolite information was incoherent (identifiers not referring to the same isomer, metabolite names not corresponding to the molecular formula). Moreover, isomers were listed with their corresponding retention times, yet without any indication of the isomers’ identity.
On the other hand, the cross-links provided across databases presented incoherent information regarding nomenclature, optical isomerism, stereochemistry of asymmetric carbons and molecular structure (acid/base, zwitterionic or canonical forms, molecules with a permanent charge), in addition to a mismatch between two structurally different compounds. The evaluation of metabolite reporting across databases such as HMDB, PubChem and ChEBI was performed with the help of the Metabolomics Semantic DataLake (MSD) team. Information was computed from the latest public versions of these databases using a Big Data infrastructure (Apache Spark) and the Scala programming language. Based on the InChIKey, we were able to identify all incorrect metabolite matches in HMDB, PubChem and ChEBI and to categorize them as “structurally different compounds”, “optical isomerism” or “structural isomerism”. Although not yet required, the InChIKey was found to be the most suitable identifier for comparing reported metabolites between studies and across databases. We therefore recommend either using this identifier or performing deep data curation when reporting identified metabolites. This work will help provide guidelines for more effective and reproducible sharing of metabolomics data.
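The InChIKey-based categorization can be sketched in Python. A standard InChIKey has three hyphen-separated blocks: 14 characters hashing the connectivity (skeleton), 10 characters made of an 8-character hash of the remaining layers (stereochemistry, isotopes) plus a standard flag and a version letter, and a final protonation character. The sketch assumes that keys sharing the first block but differing in the second indicate “optical isomerism”, that different skeletons with the same molecular formula indicate “structural isomerism”, and that anything else is “structurally different compounds”; the extra “protonation state” label for keys differing only in the last character is an added assumption. The function names and rules are illustrative, not the MSD team's Spark/Scala implementation.

```python
import re
from typing import NamedTuple

# Standard InChIKey layout: 14 chars - 8 chars + flag + version - 1 char.
_INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


class KeyParts(NamedTuple):
    skeleton: str     # first block: hash of the connectivity layers
    layers: str       # hash of the remaining layers (stereochemistry, isotopes)
    flag: str         # 'S' for a standard InChIKey
    version: str      # InChI version letter
    protonation: str  # last character: protonation indicator


def split_inchikey(key: str) -> KeyParts:
    match = _INCHIKEY.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return KeyParts(*match.groups())


def classify_mismatch(key_a: str, formula_a: str,
                      key_b: str, formula_b: str) -> str:
    """Categorize two records that were cross-linked as the same metabolite.

    Formulas are compared as plain strings, so normalize them (e.g. Hill order) first.
    """
    if key_a == key_b:
        return "identical"
    a, b = split_inchikey(key_a), split_inchikey(key_b)
    if a[:4] == b[:4]:
        return "protonation state"  # assumption: only the last character differs
    if a.skeleton == b.skeleton:
        return "optical isomerism"  # same connectivity, different second block
    if formula_a == formula_b:
        return "structural isomerism"
    return "structurally different compounds"


# Placeholder keys with the right shape (not real compounds).
base = "A" * 14 + "-" + "B" * 8 + "SA-N"
stereo = "A" * 14 + "-" + "C" * 8 + "SA-N"
print(classify_mismatch(base, "C3H7NO2", stereo, "C3H7NO2"))  # optical isomerism
```

Because the InChIKey is a hash, equal keys are strong but not absolute evidence of identity; the formula check is what separates structural isomers from unrelated compounds.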

    Enrichment of the PeakForest spectral database with LC-MS data

    Untargeted metabolomic analysis is a powerful approach for characterizing the metabolic phenotype associated with the development of chronic diseases. Identifying the associated biomarkers has become a major challenge for this type of approach. A very wide range of metabolomics databases is available today to help with this identification step, such as MassBank, HMDB or Lipidmaps, but these databases do not include chromatographic data, even though retention time can contribute substantially to the identification of molecules (Sumner et al., 2014). PeakForest is a reference spectral database dedicated to the annotation of metabolomic data. More than 1,000 standard compounds (endogenous metabolites already described in biofluids) were analyzed by LC-HRMS (Orbitrap, QTof) using complementary chromatographic methods at the four platforms of the MetaboHUB consortium. PeakForest is populated through 'template' files that contain the metadata and the annotated peak lists. What sets this database apart is that it also stores the chromatographic conditions and the retention time of each molecule, so that this parameter can be used in queries. The goal is to use this resource for the automatic annotation of datasets. This is why the PeakForest database can be used through Galaxy tools, which will soon be integrated into the Galaxy W4M web platform (Workflow4Metabolomics; Giacomoni et al., 2015). A proof of concept of this tool was carried out by annotating a NIST reference plasma sample. In total, more than 70 metabolites were confirmed with a score of 5 (Sumner et al., 2014).
These results may eventually be integrated into PeakForest after curation by experts to validate them. MS/MS data will also complement the database. This tool dedicated to high-throughput metabolite annotation will thus ultimately help enrich the characterization of the metabolomes of various biological systems.
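Retention-time-aware matching can be sketched in Python as a two-window filter: a reference entry is kept only if the observed m/z is within a ppm tolerance and the observed retention time is within a time tolerance. The `Reference` class, the default tolerances (5 ppm, 0.2 min) and the reference values are hypothetical; PeakForest's actual query and scoring logic is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reference:
    name: str
    mz: float  # expected m/z of the ion
    rt: float  # retention time (min) for one chromatographic method


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(mz: float, rt: float, refs: List[Reference],
             ppm_tol: float = 5.0, rt_tol: float = 0.2) -> List[str]:
    """Names of references matching both the m/z and the retention-time window."""
    return [r.name for r in refs
            if abs(ppm_error(mz, r.mz)) <= ppm_tol and abs(rt - r.rt) <= rt_tol]


# Hypothetical isomers: same m/z, different retention times.
refs = [Reference("isomer A", 180.0634, 3.1), Reference("isomer B", 180.0634, 4.5)]
print(annotate(180.0640, 3.15, refs))                       # ['isomer A']
print(annotate(180.0640, 3.15, refs, rt_tol=float("inf")))  # ['isomer A', 'isomer B']
```

With `rt_tol=float("inf")` the filter degenerates to a mass-only search and returns both isomers, which illustrates why storing retention times alongside spectra helps disambiguate candidates.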