1,596 research outputs found

    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

    Get PDF
    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included

    Rapid prediction of NMR spectral properties with quantified uncertainty

    Get PDF
    open access articleAccurate calculation of specific spectral properties for NMR is an important step for molecular structure elucidation. Here we report the development of a novel machine learning technique for accurately predicting chemical shifts of both 1H and 13C nuclei which exceeds DFT-accessible accuracy for 13C and 1H for a subset of nuclei, while being orders of magnitude more performant. Our method produces estimates of uncertainty, allowing for robust and confident predictions, and suggests future avenues for improved performance

    Mass Spectra Prediction with Structural Motif-based Graph Neural Networks

    Full text link
    Mass spectra, which are agglomerations of ionized fragments from targeted molecules, play a crucial role across various fields for the identification of molecular structures. A prevalent analysis method involves spectral library searches,where unknown spectra are cross-referenced with a database. The effectiveness of such search-based approaches, however, is restricted by the scope of the existing mass spectra database, underscoring the need to expand the database via mass spectra prediction. In this research, we propose the Motif-based Mass Spectrum Prediction Network (MoMS-Net), a system that predicts mass spectra using the information derived from structural motifs and the implementation of Graph Neural Networks (GNNs). We have tested our model across diverse mass spectra and have observed its superiority over other existing models. MoMS-Net considers substructure at the graph level, which facilitates the incorporation of long-range dependencies while using less memory compared to the graph transformer model.Comment: 19 pages, 3figure

    MassFormer: Tandem Mass Spectrum Prediction with Graph Transformers

    Full text link
    Mass spectrometry is a key tool in the study of small molecules, playing an important role in metabolomics, drug discovery, and environmental chemistry. Tandem mass spectra capture fragmentation patterns that provide key structural information about a molecule and help with its identification. Practitioners often rely on spectral library searches to match unknown spectra with known compounds. However, such search-based methods are limited by availability of reference experimental data. In this work we show that graph transformers can be used to accurately predict tandem mass spectra. Our model, MassFormer, outperforms competing deep learning approaches for spectrum prediction, and includes an interpretable attention mechanism to help explain predictions. We demonstrate that our model can be used to improve reference library coverage on a synthetic molecule identification task. Through quantitative analysis and visual inspection, we verify that our model recovers prior knowledge about the effect of collision energy on the generated spectrum. We evaluate our model on different types of mass spectra from two independent MS datasets and show that its performance generalizes. Code available at github.com/Roestlab/massformer.Comment: 14 pages (10 without bibliography), 5 figures, 3 table

    Analyzing Learned Molecular Representations for Property Prediction

    Full text link
    Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial datasets spanning a wide variety of chemical endpoints. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows

    Collision Cross Section Prediction with Molecular Fingerprint Using Machine Learning

    Get PDF
    High-resolution mass spectrometry is a promising technique in non-target screening (NTS) to monitor contaminants of emerging concern in complex samples. Current chemical identification strategies in NTS experiments typically depend on spectral libraries, chemical databases, and in silico fragmentation tools. However, small molecule identification remains challenging due to the lack of orthogonal sources of information (e.g., unique fragments). Collision cross section (CCS) values measured by ion mobility spectrometry (IMS) offer an additional identification dimension to increase the confidence level. Thanks to the advances in analytical instrumentation, an increasing application of IMS hybrid with high-resolution mass spectrometry (HRMS) in NTS has been reported in the recent decades. Several CCS prediction tools have been developed. However, limited CCS prediction methods were based on a large scale of chemical classes and cross-platform CCS measurements. We successfully developed two prediction models using a random forest machine learning algorithm. One of the approaches was based on chemicals’ super classes; the other model was direct CCS prediction using molecular fingerprint. Over 13,324 CCS values from six different laboratories and PubChem using a variety of ion-mobility separation techniques were used for training and testing the models. The test accuracy for all the prediction models was over 0.85, and the median of relative residual was around 2.2%. The models can be applied to different IMS platforms to eliminate false positives in small molecule identification

    Novel methods for the analysis of small molecule fragmentation mass spectra

    Get PDF
    The identification of small molecules, such as metabolites, in a high throughput manner plays an important in many research areas. Mass spectrometry (MS) is one of the predominant analysis technologies and is much more sensitive than nuclear magnetic resonance spectroscopy. Fragmentation of the molecules is used to obtain information beyond its mass. Gas chromatography-MS is one of the oldest and most widespread techniques for the analysis of small molecules. Commonly, the molecule is fragmented using electron ionization (EI). Using this technique, the molecular ion peak is often barely visible in the mass spectrum or even absent. We present a method to calculate fragmentation trees from high mass accuracy EI spectra, which annotate the peaks in the mass spectrum with molecular formulas of fragments and explain relevant fragmentation pathways. Fragmentation trees enable the identification of the molecular ion and its molecular formula if the molecular ion is present in the spectrum. The method works even if the molecular ion is of very low abundance. MS experts confirm that the calculated trees correspond very well to known fragmentation mechanisms.Using pairwise local alignments of fragmentation trees, structural and chemical similarities to already-known molecules can be determined. In order to compare a fragmentation tree of an unknown metabolite to a huge database of fragmentation trees, fast algorithms for solving the tree alignment problem are required. Unfortunately the alignment of unordered trees, such as fragmentation trees, is NP-hard. We present three exact algorithms for the problem. Evaluation of our methods showed that thousands of alignments can be computed in a matter of minutes. Both the computation and the comparison of fragmentation trees are rule-free approaches that require no chemical knowledge about the unknown molecule and thus will be very helpful in the automated analysis of metabolites that are not included in common libraries
    • …
    corecore