
    The metaRbolomics Toolbox in Bioconductor and beyond

    Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics-specific packages primarily available on CRAN, Bioconductor and GitHub.
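The workflow-chaining idea the review highlights can be sketched as follows. This is a minimal illustration in Python (the review itself concerns R packages); the step functions are invented stand-ins, not real package functions:

```python
from functools import reduce

# Toy stand-in steps for a metabolomics pipeline (illustrative only; these
# are not real package functions): peak filtering, normalisation, annotation.
def pick_peaks(intensities):
    # Keep only signals above a minimal intensity threshold.
    return [x for x in intensities if x > 0.1]

def normalise(intensities):
    # Total-ion-current style scaling so intensities sum to 1.
    total = sum(intensities)
    return [x / total for x in intensities]

def annotate(intensities):
    # Attach placeholder feature labels to the processed values.
    return {f"feature_{i}": x for i, x in enumerate(intensities)}

def run_pipeline(data, steps):
    # Chain the steps so the whole analysis is a single reusable script,
    # which is what makes the workflow reproducible and shareable.
    return reduce(lambda d, step: step(d), steps, data)

result = run_pipeline([0.05, 0.4, 0.2, 0.01, 0.6],
                      [pick_peaks, normalise, annotate])
```

Because each step is an ordinary function, the same script can be rerun unchanged on new experimental data, which is the reproducibility benefit the review describes.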

    "MS-Ready" structures for non-targeted high-resolution mass spectrometry screening studies.

    Chemical database searching has become a fixture in many non-targeted identification workflows based on high-resolution mass spectrometry (HRMS). However, the form of a chemical structure observed in HRMS does not always match the form stored in a database (e.g., the neutral form versus a salt; one component of a mixture rather than the mixture form used in a consumer product). Linking the form of a structure observed via HRMS to its related form(s) within a database will enable the return of all relevant variants of a structure, as well as the related metadata, in a single query. A Konstanz Information Miner (KNIME) workflow has been developed to produce the structural representations observed using HRMS ("MS-Ready structures") and to link them to those stored in a database. These MS-Ready structures, and associated mappings to the full chemical representations, are surfaced via the US EPA's Chemistry Dashboard ( https://comptox.epa.gov/dashboard/ ). This article describes the workflow for the generation and linking of ~ 700,000 MS-Ready structures (derived from ~ 760,000 original structures) as well as download, search and export capabilities to serve structure identification using HRMS. The importance of this form of structural representation for HRMS is demonstrated with several examples, including integration with the in silico fragmentation software application MetFrag. The structures, search, download and export functionality are all available through the CompTox Chemistry Dashboard, while the MetFrag implementation can be viewed at https://msbi.ipb-halle.de/MetFragBeta/.
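The linking idea can be sketched with a toy lookup: the form observed in HRMS (neutral, desalted) is connected to every related stored form through a shared key. Here the key is the first block of an InChIKey, which encodes the 2D skeleton; this keying choice and all identifiers below are invented for illustration, not the article's actual implementation:

```python
from collections import defaultdict

# All database records and InChIKeys below are invented for illustration.
database = [
    {"id": "X0001", "name": "drug (free base)",
     "inchikey": "AAAABBBBCCCCDD-UHFFFAOYSA-N"},
    {"id": "X0002", "name": "drug potassium salt",
     "inchikey": "AAAABBBBCCCCDD-UHFFFAOYSA-M"},
    {"id": "X0003", "name": "unrelated compound",
     "inchikey": "EEEEFFFFGGGGHH-UHFFFAOYSA-N"},
]

# Index every stored form under its shared "MS-Ready" key
# (the first InChIKey block, shared by salt and free-base forms).
index = defaultdict(list)
for entry in database:
    index[entry["inchikey"].split("-")[0]].append(entry)

def lookup_ms_ready(observed_inchikey):
    # Return all database variants linked to the observed structure.
    return index[observed_inchikey.split("-")[0]]

# One query for the observed neutral form returns both stored variants.
hits = lookup_ms_ready("AAAABBBBCCCCDD-UHFFFAOYSA-N")
```

A single query thus retrieves the salt and the free base together with their metadata, which is the "all relevant variants in a single query" behaviour the abstract describes.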

    The future of metabolomics in ELIXIR.

    Metabolomics, the youngest of the major omics technologies, is supported by an active community of researchers and infrastructure developers across Europe. To coordinate and focus efforts around infrastructure building for metabolomics within Europe, a workshop on the "Future of metabolomics in ELIXIR" was organised at Frankfurt Airport in Germany. This one-day strategic workshop involved representatives of ELIXIR Nodes, members of the PhenoMeNal consortium developing an e-infrastructure that supports workflow-based metabolomics analysis pipelines, and experts from the international metabolomics community. The workshop established metabolite identification as the critical area where a maximal impact of computational metabolomics and data management on other fields could be achieved. In particular, the participants discussed the four existing ELIXIR Use Cases from which the metabolomics community - both industry and academia - would benefit most, and how these could be exhaustively mapped onto the current five ELIXIR Platforms. This opinion article is a call for support for a new ELIXIR metabolomics Use Case, which aligns with and complements the existing and planned ELIXIR Platforms and Use Cases.

    Software platform virtualization in chemistry research and university teaching

    Background: Modern chemistry laboratories operate with a wide range of software applications under different operating systems, such as Windows, Linux or Mac OS X. Instead of installing software on different computers, it is possible to install those applications on a single computer using virtual machine software. Software platform virtualization allows a single host operating system to execute multiple guest operating systems on the same computer. We apply and discuss the use of virtual machines in chemistry research and teaching laboratories.
    Results: Virtual machines are commonly used for cheminformatics software development and testing. Benchmarking multiple chemistry software packages, we have confirmed that the computational speed penalty for using virtual machines is low, at around 5% to 10%. Software virtualization in a teaching environment allows faster deployment and easy use of commercial and open source software in hands-on computer teaching labs.
    Conclusion: Software virtualization in chemistry, mass spectrometry and cheminformatics is needed for software testing and for developing software for different operating systems. To obtain maximum performance, the virtualization software should be multi-core enabled and allow the use of multiprocessor configurations in the virtual machine environment. Server consolidation, by running multiple tasks and operating systems on a single physical machine, can lead to lower maintenance and hardware costs, especially in small research labs. The use of virtual machines can prevent software virus infections and security breaches when used as a sandbox system for internet access and software testing. Complex software setups can be created with virtual machines and are easily deployed later to multiple computers for hands-on teaching classes. We also discuss the popularity of bioinformatics compared to cheminformatics, as well as the missing cheminformatics education at universities worldwide.

    Advancing catchment hydrology to deal with predictions under change

    Throughout its historical development, hydrology as an earth science, but especially as a problem-centred engineering discipline has largely relied (quite successfully) on the assumption of stationarity. This includes assuming time invariance of boundary conditions such as climate, system configurations such as land use, topography and morphology, and dynamics such as flow regimes and flood recurrence at different spatio-temporal aggregation scales. The justification for this assumption was often that when compared with the temporal, spatial, or topical extent of the questions posed to hydrology, such conditions could indeed be considered stationary, and therefore the neglect of certain long-term non-stationarities or feedback effects (even if they were known) would not introduce a large error. However, over time two closely related phenomena emerged that have increasingly reduced the general applicability of the stationarity concept: the first is the rapid and extensive global changes in many parts of the hydrological cycle, changing formerly stationary systems to transient ones. The second is that the questions posed to hydrology have become increasingly more complex, requiring the joint consideration of increasingly more (sub-) systems and their interactions across more and longer timescales, which limits the applicability of stationarity assumptions. Therefore, the applicability of hydrological concepts based on stationarity has diminished at the same rate as the complexity of the hydrological problems we are confronted with and the transient nature of the hydrological systems we are dealing with has increased. The aim of this paper is to present and discuss potentially helpful paradigms and theories that should be considered as we seek to better understand complex hydrological systems under change. For the sake of brevity we focus on catchment hydrology. 
We begin with a discussion of the general nature of explanation in hydrology and briefly review the history of catchment hydrology. We then propose and discuss several perspectives on catchments: as complex dynamical systems, self-organizing systems, co-evolving systems and open dissipative thermodynamic systems. We discuss the benefits of comparative hydrology and of taking an information-theoretic view of catchments, including the flow of information from data to models to predictions. In summary, we suggest that these perspectives deserve closer attention and that their synergistic combination can advance catchment hydrology to address questions of change.

    The NORMAN Association and the European Partnership for Chemicals Risk Assessment (PARC): let’s cooperate! [Commentary]

    The Partnership for Chemicals Risk Assessment (PARC) is currently under development as a joint research and innovation programme to strengthen the scientific basis for chemical risk assessment in the EU. The plan is to bring chemical risk assessors and managers together with scientists to accelerate method development and the production of necessary data and knowledge, and to facilitate the transition to next-generation evidence-based risk assessment, a non-toxic environment and the European Green Deal. The NORMAN Network is an independent, well-established and competent network of more than 80 organisations in the field of emerging substances and has enormous potential to contribute to the implementation of the PARC partnership. NORMAN stands ready to provide expert advice to PARC, drawing on its long experience in the development, harmonisation and testing of advanced tools in relation to chemicals of emerging concern and in support of a European Early Warning System to unravel the risks of contaminants of emerging concern (CECs) and close the gap between research and innovation and regulatory processes. In this commentary we highlight the tools developed by NORMAN that we consider most relevant to supporting the PARC initiative: (i) a joint data space and cutting-edge research tools for risk assessment of contaminants of emerging concern; (ii) a collaborative European framework to improve data quality and comparability; (iii) advanced data analysis tools for a European early warning system; and (iv) support for national and European chemical risk assessment by harnessing, combining and sharing evidence and expertise on CECs. By combining the extensive knowledge and experience of the NORMAN network with the financial and policy-related strengths of the PARC initiative, a large step towards the goal of a non-toxic environment can be taken.

    How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry

    Calculating the metabolome size of a species by genome-guided reconstruction of metabolic pathways misses all products of orphan genes and of enzymes lacking annotated genes. Hence, metabolomes need to be determined experimentally. Annotation by mass spectrometry would greatly benefit if peer-reviewed public databases could be queried to compile target lists of structures that have already been reported for a given species. We detail current obstacles to compiling such a knowledge base of metabolites. As an example, results are presented for rice. Two rice (Oryza sativa) subspecies have been fully sequenced, O. sativa ssp. japonica and O. sativa ssp. indica. Several major small-molecule databases were compared for their coverage of known rice metabolites, comprising PubChem, Chemical Abstracts, Beilstein, patent databases, the Dictionary of Natural Products, SetupX/BinBase, the KNApSAcK DB, and finally databases obtained by computational approaches, i.e. RiceCyc, KEGG, and Reactome. More than 5,000 small molecules were retrieved when searching these databases. Unfortunately, genuine rice metabolites were most often retrieved together with non-metabolite database entries such as pesticides. Overlaps between database compound lists were very difficult to compare because structures were either not encoded in machine-readable format or because compound identifiers were not cross-referenced between databases. We conclude that present databases are not capable of comprehensively retrieving all known metabolites. Metabolome lists are as yet mostly restricted to genome-reconstructed pathways. We suggest that providers of (bio)chemical databases enrich their database identifiers with PubChem IDs and InChIKeys to enable cross-database queries. In addition, peer-reviewed journal repositories need to mandate submission of structures and spectra in machine-readable format to allow automated semantic annotation of articles containing chemical structures.
    Such changes in publication standards and database architectures will enable researchers to compile current knowledge about the metabolome of a species, which may extend to derived information such as spectral libraries, organ-specific metabolites, and cross-study comparisons.
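The cross-database query the authors call for can be sketched with two toy compound lists that can only be joined reliably through a shared machine-readable identifier such as an InChIKey. All records and identifiers below are invented for illustration:

```python
# Two toy databases with different naming conventions; only the shared
# InChIKey field (all values invented) allows an unambiguous join.
pathway_db = [
    {"name": "metabolite A", "inchikey": "AAAABBBBCCCCDD-UHFFFAOYSA-N"},
    {"name": "metabolite B", "inchikey": "EEEEFFFFGGGGHH-UHFFFAOYSA-N"},
]
natural_products_db = [
    {"compound": "Metabolite A", "pubchem_cid": 123456,
     "inchikey": "AAAABBBBCCCCDD-UHFFFAOYSA-N"},
]

def cross_query(db_a, db_b):
    # Join two compound lists on InChIKey instead of ambiguous trivial names.
    b_index = {rec["inchikey"]: rec for rec in db_b}
    return [(rec, b_index.get(rec["inchikey"])) for rec in db_a]

# Records present in both databases, paired via their shared identifier.
matched = [(a, b) for a, b in cross_query(pathway_db, natural_products_db)
           if b is not None]
```

Without the shared identifier field, the same comparison would have to fall back on compound names, which is exactly the failure mode the abstract reports when overlaps between databases were computed.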