21 research outputs found

    Bio- and chemoinformatics approaches for metabolomics data analysis.

    Metabolomics data analysis involves several repetitive tasks, such as data sorting, calculation of exact masses or other physicochemical properties, and searching for identifiers in different databases. Several of these tasks can be automated using command-line tools or short scripts in scripting languages such as Perl, Python, or R. This chapter presents simple solutions and short scripts written in R that can be used to interact with specific web services or to calculate physicochemical properties or molecular formulae.
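    As an illustration of the kind of task the abstract describes, the exact (monoisotopic) mass of a molecular formula can be computed in a few lines. The chapter itself uses R; the sketch below is written in Python, and the function names, the small element table, and the simple formula syntax (no brackets, charges, or isotope labels) are assumptions made for this example, not the chapter's code.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
# Only a handful of elements common in metabolites are included here.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376163,
    "S": 31.97207100,
}


def parse_formula(formula):
    """Return element counts for a flat formula such as 'C6H12O6'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


if __name__ == "__main__":
    # Glucose as a familiar test case.
    print(round(monoisotopic_mass("C6H12O6"), 4))
```

    An element missing from the table raises a `KeyError`, which is deliberate here: silently ignoring an unknown element would give a wrong mass.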

    OMG: Open Molecule Generator

    Computer Assisted Structure Elucidation has been used for decades to discover the chemical structure of unknown compounds. In this work we introduce the first open source structure generator, Open Molecule Generator (OMG), which for a given elemental composition produces all non-isomorphic chemical structures that match that elemental composition. Furthermore, this structure generator can accept as additional input one or multiple non-overlapping prescribed substructures to drastically reduce the number of possible chemical structures. Being open source allows for customization and future extension of its functionality. OMG relies on a modified version of the Canonical Augmentation Path, which grows intermediate chemical structures by adding bonds and checks that at each step only unique molecules are produced. In order to benchmark the tool, we generated chemical structures for the elemental formulas and substructures of different metabolites and compared the results with a commercially available structure generator. The results obtained, i.e. the number of molecules generated, were identical for elemental compositions having only C, O and H. For elemental compositions containing C, O, H, N, P and S, OMG produces all the chemically valid molecules while the other generator produces more, yet chemically impossible, molecules. The chemical completeness of the OMG results comes at the expense of being slower than the commercial generator. In addition to being open source, OMG clearly showed the added value of constraining the solution space by using multiple prescribed substructures as input. We expect this structure generator to be useful in many fields, but to be especially of great importance for metabolomics, where identifying unknown metabolites is still a major bottleneck.

    Metabolite Identification Using Automated Comparison of High-Resolution Multistage Mass Spectral Trees

    Multistage mass spectrometry (MS^n) generating so-called spectral trees is a powerful tool in the annotation and structural elucidation of metabolites and is increasingly used in the area of accurate mass LC/MS-based metabolomics to identify unknown, but biologically relevant, compounds. As a consequence, there is a growing need for computational tools specifically designed for the processing and interpretation of MS^n data. Here, we present a novel approach to represent and calculate the similarity between high-resolution mass spectral fragmentation trees. This approach can be used to query multiple-stage mass spectra in MS spectral libraries. Additionally, the method can be used to calculate structure–spectrum correlations and potentially deduce substructures from spectra of unknown compounds. The approach was tested using two different spectral libraries composed of either human or plant metabolites, which currently contain 872 MS^n spectra acquired from 549 metabolites using Orbitrap FTMS^n. For validation purposes, for 282 of these 549 metabolites, 765 additional replicate MS^n spectra acquired with the same instrument were used. Both the dereplication and de novo identification functionalities of the comparison approach are discussed. This novel MS^n spectral processing and comparison approach increases the probability of assigning the correct identity to an experimentally obtained fragmentation tree. Ultimately, this tool may pave the way for constructing and populating large MS^n spectral libraries that can be used for searching and matching experimental MS^n spectra for annotation and structural elucidation of unknown metabolites detected in untargeted metabolomics studies.

    Integrated quantification and identification of aldehydes and ketones in biological samples

    The identification of unknown compounds remains a bottleneck of mass spectrometry (MS)-based metabolomics screening experiments. Here, we present a novel approach which facilitates the identification and quantification of analytes containing aldehyde and ketone groups in biological samples by adding chemical information to MS data. Our strategy is based on rapid autosampler-in-needle derivatization with p-toluenesulfonylhydrazine (TSH). The resulting TSH-hydrazones are separated by ultrahigh-performance liquid chromatography (UHPLC) and detected by electrospray ionization-quadrupole-time-of-flight (ESI-QqTOF) mass spectrometry using a SWATH (Sequential Window Acquisition of all Theoretical Fragment-Ion Spectra) data-independent high-resolution mass spectrometry (HR-MS) approach. Derivatization makes small, poorly ionizable or poorly retained analytes amenable to reversed-phase chromatography and electrospray ionization in both polarities. Negatively charged TSH-hydrazone ions furthermore show a simple and predictable fragmentation pattern upon collision-induced dissociation, which enables the chemo-selective screening for unknown aldehydes and ketones via a signature fragment ion (m/z 155.0172). By means of SWATH, targeted and nontargeted application scenarios of the suggested derivatization route are enabled within a single UHPLC-ESI-QqTOF-HR-MS workflow. The method's ability to simultaneously quantify and identify molecules containing aldehyde and ketone groups is demonstrated using 61 target analytes from various compound classes and a 13C-labeled yeast matrix. The identification of unknowns in biological samples is detailed using the example of indole-3-acetaldehyde.
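    The signature-ion screening described in the abstract amounts to checking each fragment spectrum for a peak near m/z 155.0172. The sketch below shows that check in Python under stated assumptions: the peak-list representation, the function names, and the 10 ppm tolerance are illustrative choices for this example, not values or code from the paper.

```python
SIGNATURE_MZ = 155.0172  # signature fragment ion reported in the abstract


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def has_signature_ion(peaks, target_mz=SIGNATURE_MZ, tol_ppm=10.0, min_intensity=0.0):
    """Return True if any (m/z, intensity) peak matches the target ion.

    tol_ppm and min_intensity are assumed defaults; suitable values
    depend on the instrument and acquisition settings.
    """
    return any(
        abs(ppm_error(mz, target_mz)) <= tol_ppm and intensity >= min_intensity
        for mz, intensity in peaks
    )


if __name__ == "__main__":
    # Hypothetical fragment spectrum: a list of (m/z, intensity) pairs.
    spectrum = [(91.0548, 1200.0), (155.0175, 8400.0), (171.0120, 300.0)]
    print(has_signature_ion(spectrum))
```

    Spectra that pass this filter would be candidates for aldehyde or ketone derivatives; confirming an identity still requires the precursor mass and the remaining fragments.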