25 research outputs found
Metabolomic Semantic Datalake : A Scalable Approach to Managing Metabolomics Semantic Resources
International audience
Spectral Database: from data model to web interface
MetaboHUB is a metabolomics and fluxomics infrastructure that provides tools to research teams and partners. The Bioinformatics and Biostatistics service specializes in NMR, GC- and LC-MS data processing and analysis, from raw data to metabolite identification. To address the challenge of annotating these data and to centralize knowledge, a dedicated team is building software to assist in identification, including a compound and spectra database. The core of the "MetaboHUB Spectral Database", called "data-model", is a computational representation of each entity involved in spectral analysis and chemical compound identification. One of the strengths of the project is the joint work between chemistry experts and bioinformaticians on the data model design, which ensures that the logic and constraints used in metabolomics are respected during data manipulation and storage. The software architecture allows parts of the project to be used as standalone software, available to the community. The data-model appears able to manage several types of chemical compounds (such as standards or sub-structures) and different types of spectra (MS, MS/MS and NMR; simple, JRES and multidimensional). We will be able to validate the data-model with data from the chemical libraries provided by MetaboHUB members. One of the final goals of the spectral database is to provide a computer-aided spectra identification tool, using all these data, through a web portal. Two milestones are coming: the first is to provide a mechanism to import spectral data into the data-model (and therefore into the database), and the second is to define metadata around spectral analysis.
PeakForest: a spectral database and its toolbox, dedicated to the Metabolomics' community
International audience
Trapping the tiger: efficacy of the novel BG-Sentinel 2 with several attractants and carbon dioxide for collecting Aedes albopictus (Diptera: Culicidae) in Southern France
Targeted trapping of mosquito disease vectors plays an important role in the surveillance and control of mosquito-borne diseases. The Asian tiger mosquito, Aedes albopictus (Skuse), is an invasive species that is spreading throughout the world. It is a potential vector of 24 arboviruses and is particularly efficient in the transmission of chikungunya, dengue, and Zika viruses. Using a 4 × 4 Latin square design, we assessed the efficacy of the new BG-Sentinel 2 mosquito trap using the attractants BG-lure and (R)-1-octen-3-ol cartridge, alone or in combination, and with and without carbon dioxide, for the field collection of Ae. albopictus mosquitoes. We found a synergistic effect of attractant and carbon dioxide that significantly increased the capture rate of Ae. albopictus, by twofold to fivefold. In combination with carbon dioxide, the BG-lure cartridge is more effective than (R)-1-octen-3-ol in attracting females, while a combination of both attractants and carbon dioxide is the most effective for capturing males. In the absence of carbon dioxide, the BG-lure cartridge alone did not increase the capture of males or females when compared with an unbaited trap. However, the synergistic effect of carbon dioxide and BG-lure makes this the most efficient combination for attracting Ae. albopictus.
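As an illustration of the Latin square design mentioned in this abstract, the following minimal sketch builds a cyclic 4 × 4 square in which each treatment appears exactly once per row and once per column. Reading rows as trapping days and columns as trap sites is an assumption; the abstract does not say how the square was laid out. The labels A to D are placeholders, not the bait combinations used in the study.

```python
def latin_square(treatments):
    """Build a cyclic Latin square: row = day, column = site (assumed layout)."""
    n = len(treatments)
    # Shifting the treatment list by one position per row guarantees that
    # no treatment repeats within a row or within a column.
    return [[treatments[(day + site) % n] for site in range(n)] for day in range(n)]


def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


if __name__ == "__main__":
    # Placeholder treatment labels (hypothetical, for illustration only).
    schedule = latin_square(["A", "B", "C", "D"])
    for day, row in enumerate(schedule, start=1):
        print(f"day {day}: " + " ".join(row))
```

Rotating treatments in this way balances out site and day effects, so that differences in capture rate can be attributed to the treatment itself.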
Metabolite reporting in large-scale studies within different metabolomics communities: DO WE SPEAK THE SAME LANGUAGE?
Since the emergence of high-throughput metabolomics, a growing number of scientific communities have been performing metabolomic studies. It has therefore become crucial to standardize the reporting and sharing of metabolites. Although minimum reporting standards for analytical practices and data processing are available, there are no established standards for metabolite reporting. In this context, our objective was to review existing metabolite reporting practices in different scientific communities, both in published results and across databases. To this end, we considered plasma metabolites reported in large-scale human studies from different communities, namely analytical chemistry, medicine and epidemiology. We focused only on metabolites reported as level 1 identifications according to the Metabolomics Standards Initiative. We applied a data curation workflow to the list of annotated metabolites given by the authors. First, we performed a manual curation that included adding missing identifiers and editing some inconsistent metadata. Second, we applied an automatic query algorithm to obtain additional information from available databases, such as the "InChIKey", the compact hash code of the IUPAC International Chemical Identifier. Identified metabolites were then compared between the selected studies using either the names given by the authors or the InChIKeys added after data curation. Recurrent inconsistencies were observed in metabolite reporting, both in published results and across different databases. In the former, inconsistencies were observed in the metabolite information (identifiers not referring to the same isomer, metabolite names not corresponding to the molecular formula). In addition, isomers were listed with their corresponding retention times, yet without any indication of the isomers' identity. In the latter, the cross-links provided across databases contained inconsistent information regarding nomenclature, optical isomerism, stereochemistry of asymmetric carbons, and molecular structure (acid/base forms, zwitterionic or canonical forms, molecules with a permanent charge), in addition to a mismatch between two structurally different compounds. The evaluation of metabolite reporting across databases such as HMDB, PubChem and ChEBI was performed with the help of the Metabolomics Semantic DataLake (MSD) team. The information was computed from the latest public versions of these databases on a Big Data infrastructure (Apache Spark) using the Scala programming language. Based on the InChIKey, we were able to identify all incorrect metabolite matches in HMDB, PubChem and ChEBI and to categorize them as "structurally different compounds", "optical isomerism" or "structural isomerism". Although not yet required, the InChIKey was found to be the most suitable identifier for comparing reported metabolites between studies and across databases. It is therefore recommended either to use this identifier or to perform a deep data curation when reporting identified metabolites. This work will help provide guidelines for more effective and reproducible metabolomics data sharing.
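To illustrate why the InChIKey lends itself to this kind of comparison, here is a minimal Python sketch (not the Spark/Scala pipeline used by the authors, whose code is not shown in the abstract). It relies on the standard InChIKey layout: a 14-letter block hashing the molecular skeleton, a 10-letter block whose first 8 letters hash the stereochemistry and isotopic layers, and a final protonation letter. The function names and the returned category labels are hypothetical. The example keys are the commonly listed InChIKeys for L-alanine, D-alanine and ethanol, included for illustration only.

```python
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    skeleton: str     # 14 letters: hash of the connectivity (skeleton)
    stereo: str       # 8 letters: hash of stereochemistry/isotope layers
    protonation: str  # final letter: protonation indicator


def split_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks, rejecting malformed keys."""
    blocks = key.strip().upper().split("-")
    if (
        len(blocks) != 3
        or [len(b) for b in blocks] != [14, 10, 1]
        or not all(b.isalpha() for b in blocks)
    ):
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    # The last two letters of the second block are a flag and a version
    # character, so only its first eight letters are kept for comparison.
    return InChIKeyParts(blocks[0], blocks[1][:8], blocks[2])


def compare_inchikeys(a: str, b: str) -> str:
    """Return a coarse, hypothetical label for how two keys differ."""
    ka, kb = split_inchikey(a), split_inchikey(b)
    if ka.skeleton != kb.skeleton:
        return "different connectivity"
    if ka.stereo != kb.stereo:
        return "same connectivity, different stereo/isotope layer"
    if ka.protonation != kb.protonation:
        return "same structure, different protonation"
    return "identical"


if __name__ == "__main__":
    l_alanine = "QNAYBMKLOCPYGJ-REOHZCJHSA-N"
    d_alanine = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"
    ethanol = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
    print(compare_inchikeys(l_alanine, d_alanine))
    print(compare_inchikeys(l_alanine, ethanol))
```

Note that a mismatch in the first block alone cannot separate structural isomers from unrelated compounds; that distinction would additionally require comparing molecular formulas, which this sketch does not attempt.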
A small step into Galaxy, a faster pace for metabolomics. Galaxy and the metabolomics analysis Universe
National audience
Faced with the emergence of new technologies in the field of metabolomics, the processing solutions adopted so far (XCMS, R scripts, etc.) clearly show their limits. Bottlenecks affect unified access to core applications as well as computing and storage infrastructure. In the context of a collaboration between metabolomics and bioinformatics platforms (INRA/PFEM and CNRS/ABiMS-METABOMER), we have developed a complete data analysis pipeline using the Galaxy framework. This modular and extensible workflow includes existing components (XCMS functions, etc.) as well as a whole suite of complementary statistical tools. The implementation is accessible through a web interface, which guarantees the completeness of the parameters. The advanced features of Galaxy have made it possible to integrate components from different sources and of different types. Finally, a first extensible environment is offered to the metabolomics community, enabling preconfigured workflows to be shared with novice users as well as experts in the field.
A project-scale map of metadata to improve future data management
Today, the intra-lab application of best practices in the metabolomics field usually guarantees adequate data exploitation within a single lab. However, the growing interest in multi-analysis designs (e.g. complementary analytical platforms, variety of matrices, multi-omics), as well as the need for data sharing and reuse, increases the difficulty of data management. Indeed, managing the multiplicity and heterogeneity of the information involved is required to achieve relevant knowledge extraction from metabolomics data. Within the MetaboHUB national infrastructure, one objective is to optimize data handling, especially of metadata, to facilitate large-scale analyses, multi-platform studies, and data FAIRisation (Findability, Accessibility, Interoperability, Reusability). In particular, this fits into the MetaboHUB scientific roadmap, which promotes the development of open science in the field of metabolomics. In metabolomic and lipidomic studies, data production and analysis come with a large diversity of metadata (data about the data). To identify clearly defined bottlenecks and targets for future improvement in data management, the objective of this work was to build a metadata map at the scale of a scientific project. Aiming for completeness, this map was constructed in a collaborative and multidisciplinary way involving chemists, biologists, data stewards and computer scientists, combining their respective experience and knowledge. Based on the resulting metadata map, targets (areas and topics) to be further investigated were identified, enabling the creation of transversal working groups at the consortium scale. In particular, this work makes it possible to focus efforts on clearly defined issues in order to improve the standardisation of practices regarding data management and metadata documentation. In conclusion, this collaborative map construction has been shown to be an efficient tool for drawing a clear "where do we stand / where do we go" picture of project-scale metadata inside a national infrastructure like MetaboHUB. This facilitates the precise definition of data management. Such an approach could be transferred to other infrastructures, consortia and/or communities.
Development of a distributed French infrastructure for metabolomics dedicated to innovation