6 research outputs found

    Metabolomic database annotations via query of elemental compositions: Mass accuracy is insufficient even at less than 1 ppm

    BACKGROUND: Metabolomic studies aim to identify and quantify all metabolites in a given biological context, and mass spectrometry is among the most powerful tools for this purpose. However, metabolomics by mass spectrometry always reveals a large number of unknown compounds, which complicates in-depth mechanistic or biochemical understanding. In principle, mass spectrometry can be used within strategies for de novo structure elucidation of small molecules, starting with the computation of the elemental composition of an unknown metabolite from accurate masses with errors <5 ppm (parts per million). However, even with very high mass accuracy (<1 ppm), many chemically possible formulae are obtained in higher mass regions. In automatic routines an additional orthogonal filter therefore needs to be applied to reduce the number of potential elemental compositions. This report demonstrates the necessity of isotope abundance information by mathematical confirmation of the concept.
    RESULTS: High mass accuracy (<1 ppm) alone is not enough to exclude a sufficient number of candidates with complex elemental compositions (C, H, N, S, O, P, and potentially F, Cl, Br and Si). Using isotopic abundance patterns as a single further constraint removes >95% of false candidates: this orthogonal filter can condense several thousand candidates down to only a small number of molecular formulae. Example calculations for 10, 5, 3, 1 and 0.1 ppm mass accuracy are given. Corresponding software scripts can be downloaded from . A comparison of eight chemical databases revealed that PubChem and the Dictionary of Natural Products can be recommended for automatic queries using molecular formulae.
    CONCLUSION: More than 1.6 million molecular formulae in the range 0–500 Da were generated in an exhaustive manner under strict observation of mathematical and chemical rules. Assuming that ion species are fully resolved (either by chromatography or by high-resolution mass spectrometry), we conclude that a mass spectrometer capable of 3 ppm mass accuracy and 2% error for isotopic abundance patterns outperforms mass spectrometers with less than 1 ppm mass accuracy, or even hypothetical mass spectrometers with 0.1 ppm mass accuracy, that do not include isotope information in the calculation of molecular formulae.
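The enumeration step described above can be sketched in a few lines. The following is a minimal illustration, not the authors' published scripts: it exhaustively enumerates candidate formulae within a ppm window of a target neutral monoisotopic mass, restricted to C/H/N/O for brevity (the full search in the paper also covers S, P and optionally F, Cl, Br and Si). The function name and loop bounds are illustrative assumptions.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052, "O": 15.9949146221}

def candidate_formulas(target_mass, ppm):
    """Exhaustively enumerate C/H/N/O formulae whose monoisotopic mass
    lies within +/- ppm of target_mass. Returns a list of element-count dicts."""
    tol = target_mass * ppm * 1e-6
    hits = []
    for c in range(int(target_mass / MASS["C"]) + 1):
        rem_c = target_mass - c * MASS["C"]
        for n in range(int(rem_c / MASS["N"]) + 1):
            rem_n = rem_c - n * MASS["N"]
            for o in range(int(rem_n / MASS["O"]) + 1):
                rem_o = rem_n - o * MASS["O"]
                h = round(rem_o / MASS["H"])  # nearest hydrogen count
                if h < 0:
                    continue
                mass = c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]
                if abs(mass - target_mass) <= tol:
                    hits.append({"C": c, "H": h, "N": n, "O": o})
    return hits

# Glucose (C6H12O6, monoisotopic mass ~180.06339 Da): the correct formula
# is among the candidates at 3 ppm, and widening the window can only add hits.
hits = candidate_formulas(180.063388, 3.0)
```

Even in this reduced element space, the number of hits grows with the mass window and with the target mass, which is exactly why the paper argues that mass accuracy alone cannot decide the formula and an isotope-pattern filter is needed.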

    Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry

    BACKGROUND: Structure elucidation of unknown small molecules by mass spectrometry is a challenge despite advances in instrumentation. The first crucial step is to obtain correct elemental compositions. In order to automatically constrain the thousands of possible candidate structures, rules need to be developed to select the most likely and chemically correct molecular formulas.
    RESULTS: An algorithm for filtering molecular formulas is derived from seven heuristic rules: (1) restrictions on the number of elements, (2) the LEWIS and SENIOR chemical rules, (3) isotopic patterns, (4) hydrogen/carbon ratios, (5) element ratios of nitrogen, oxygen, phosphorus, and sulphur versus carbon, (6) element ratio probabilities and (7) presence of trimethylsilylated compounds. Formulas are ranked according to their isotopic patterns and subsequently constrained by presence in public chemical databases. The seven rules were developed on 68,237 existing molecular formulas and were validated in four experiments. First, 432,968 formulas covering five million PubChem database entries were checked for consistency; only 0.6% of these compounds did not pass all rules. Second, the rules were shown to effectively reduce the set of all eight billion theoretically possible C, H, N, S, O, P formulas up to 2000 Da to only 623 million most probable elemental compositions. Third, 6,000 pharmaceutical, toxic and natural compounds were selected from the DrugBank, TSCA and DNP databases. The correct formulas were retrieved as top hits at 80–99% probability, assuming data acquisition with complete resolution of unique compounds, 5% absolute isotope-ratio deviation and 3 ppm mass accuracy. Last, some exemplary compounds were analyzed by Fourier transform ion cyclotron resonance mass spectrometry and by gas chromatography-time-of-flight mass spectrometry. In each case, the correct formula was ranked as the top hit when combining the seven rules with database queries.
    CONCLUSION: The seven rules enable automatic exclusion of molecular formulas that are either wrong or contain an unlikely high or low number of elements. The correct molecular formula is assigned with a probability of 98% if the formula exists in a compound database. For truly novel compounds that are not present in databases, the correct formula is found among the first three hits with a probability of 65–81%. Corresponding software and supplemental data are available for download from the authors' website.
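Rules 4 and 5 (hydrogen/carbon and heteroatom/carbon ratio checks) lend themselves to a compact filter. The sketch below is illustrative only: the ratio bounds are the approximate "common range" thresholds reported for the vast majority of known compounds, and should be treated as assumptions rather than the paper's exact values; the function name is likewise invented for this example.

```python
# Approximate "common range" element/carbon ratio bounds (assumed values
# for illustration; consult the paper for the exact thresholds).
COMMON_RANGE = {"H": (0.2, 3.1), "N": (0.0, 1.3), "O": (0.0, 1.2),
                "P": (0.0, 0.3), "S": (0.0, 0.8)}

def passes_ratio_rules(formula):
    """formula: dict mapping element symbol -> atom count.
    Returns True if every element/carbon ratio lies in its common range."""
    c = formula.get("C", 0)
    if c == 0:
        return False  # ratio rules are defined relative to carbon
    for elem, (lo, hi) in COMMON_RANGE.items():
        ratio = formula.get(elem, 0) / c
        if not (lo <= ratio <= hi):
            return False
    return True
```

For example, glucose (C6H12O6, H/C = 2, O/C = 1) passes, while a chemically absurd composition such as CH40 (H/C = 40) is rejected, illustrating how cheap ratio checks prune the candidate list before the more expensive isotope-pattern ranking.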

    Probabilistic Modelling of Liquid Chromatography Time-of-Flight Mass Spectrometry

    Liquid Chromatography Time-of-Flight Mass Spectrometry (LC-TOFMS) is an analytical platform that is widely used in the study of biological mixtures in the rapidly growing fields of proteomics and metabolomics. The development of statistical methods for analysing the very large datasets typically produced in LC-TOFMS experiments is a very active area of research. However, the theoretical basis on which these methods are built is currently rather thin, and as a result inferences regarding the samples analysed are generally drawn in a somewhat qualitative fashion. This thesis concerns the development of a statistical formalism that can be used to describe and analyse the data produced in an LC-TOFMS experiment. This is done through the derivation of a number of probability distributions, each corresponding to a different level of approximation to the distribution of the empirically obtained data. Using such probabilistic models, statistically rigorous methods are developed and validated that address some of the central problems encountered in the practical analysis of LC-TOFMS data, most notably those related to the identification of unknown metabolites. Unlike most existing bioinformatics techniques, this work aims for rigour rather than generality. Consequently, the methods developed are closely tailored to a particular type of TOF mass spectrometer, although they do carry over to other TOF instruments, albeit with important restrictions. While the algorithms presented may constitute useful analytical tools for the mass spectrometers to which they can be applied, the broader implications of the general methodological approach are also of central importance. In particular, the main value of this work arguably lies in its role as a proof of concept: detailed probabilistic modelling of TOFMS data is possible and can be used in practice to address important data-analytical problems in a statistically rigorous manner.
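To make the idea of likelihood-based inference on TOF data concrete, here is a deliberately simple toy model, not the thesis's actual derivation: per-bin ion counts are treated as Poisson-distributed around a Gaussian peak profile, and a candidate peak position is scored by its log-likelihood. All function names and parameters are invented for this illustration.

```python
import math

def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def peak_loglik(counts, centre, width, amplitude, baseline=0.1):
    """Log-likelihood of observed per-bin counts under a model where each
    bin's count is Poisson with mean = baseline + Gaussian peak profile."""
    ll = 0.0
    for i, k in enumerate(counts):
        lam = baseline + amplitude * math.exp(-0.5 * ((i - centre) / width) ** 2)
        ll += poisson_logpmf(k, lam)
    return ll

# A spectrum with an apex near bin 5 is scored much higher by a model
# centred there than by one centred at bin 8.
counts = [0, 0, 1, 4, 9, 12, 8, 3, 1, 0]
good = peak_loglik(counts, 5.0, 1.5, 12.0)
bad = peak_loglik(counts, 8.0, 1.5, 12.0)
```

Comparing such likelihoods across candidate positions, widths, or formulas is the kind of statistically rigorous, model-based reasoning the thesis advocates, in contrast to ad hoc peak-picking heuristics.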