Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.
The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.
Rapid prediction of NMR spectral properties with quantified uncertainty
Accurate calculation of specific spectral properties for NMR is an important step for molecular structure elucidation. Here we report the development of a novel machine learning technique for accurately predicting chemical shifts of both 1H and 13C nuclei which exceeds DFT-accessible accuracy for 13C and 1H for a subset of nuclei, while being orders of magnitude more performant. Our method produces estimates of uncertainty, allowing for robust and confident predictions, and suggests future avenues for improved performance.
Mass Spectra Prediction with Structural Motif-based Graph Neural Networks
Mass spectra, which are agglomerations of ionized fragments from targeted
molecules, play a crucial role across various fields for the identification of
molecular structures. A prevalent analysis method involves spectral library
searches, where unknown spectra are cross-referenced with a database. The
effectiveness of such search-based approaches, however, is restricted by the
scope of the existing mass spectra database, underscoring the need to expand
the database via mass spectra prediction. In this research, we propose the
Motif-based Mass Spectrum Prediction Network (MoMS-Net), a system that predicts
mass spectra using the information derived from structural motifs and the
implementation of Graph Neural Networks (GNNs). We have tested our model across
diverse mass spectra and have observed its superiority over other existing
models. MoMS-Net considers substructure at the graph level, which facilitates
the incorporation of long-range dependencies while using less memory compared
to the graph transformer model.
Comment: 19 pages, 3 figures
MassFormer: Tandem Mass Spectrum Prediction with Graph Transformers
Mass spectrometry is a key tool in the study of small molecules, playing an
important role in metabolomics, drug discovery, and environmental chemistry.
Tandem mass spectra capture fragmentation patterns that provide key structural
information about a molecule and help with its identification. Practitioners
often rely on spectral library searches to match unknown spectra with known
compounds. However, such search-based methods are limited by availability of
reference experimental data. In this work we show that graph transformers can
be used to accurately predict tandem mass spectra. Our model, MassFormer,
outperforms competing deep learning approaches for spectrum prediction, and
includes an interpretable attention mechanism to help explain predictions. We
demonstrate that our model can be used to improve reference library coverage on
a synthetic molecule identification task. Through quantitative analysis and
visual inspection, we verify that our model recovers prior knowledge about the
effect of collision energy on the generated spectrum. We evaluate our model on
different types of mass spectra from two independent MS datasets and show that
its performance generalizes. Code available at github.com/Roestlab/massformer.
Comment: 14 pages (10 without bibliography), 5 figures, 3 tables
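The spectral library search that this abstract refers to can be illustrated with a minimal sketch. It assumes spectra are lists of (m/z, intensity) pairs, bins them at a fixed width, and ranks library entries by cosine similarity. The function names, the 1.0 bin width, and the toy spectra are illustrative assumptions; none of this is taken from the MassFormer code.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin index -> intensity)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(spec_a, spec_b, bin_width=1.0):
    """Cosine similarity between two binned spectra; 0.0 if either is empty."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query, library, bin_width=1.0):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [
        (name, cosine_similarity(query, peaks, bin_width))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy library; real libraries hold measured or predicted spectra.
library = {
    "compound_A": [(50.1, 20.0), (77.0, 100.0), (105.0, 60.0)],
    "compound_B": [(43.0, 100.0), (58.1, 35.0), (71.1, 10.0)],
}
query = [(50.1, 18.0), (77.0, 100.0), (105.0, 55.0)]
ranking = search_library(query, library)
```

A search of this kind can only return compounds that are present in the library, which is the coverage limitation that predicted spectra are meant to ease.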
Analyzing Learned Molecular Representations for Property Prediction
Advancements in neural machinery have led to a wide range of algorithmic
solutions for molecular property prediction. Two classes of models in
particular have yielded promising results: neural networks applied to computed
molecular fingerprints or expert-crafted descriptors, and graph convolutional
neural networks that construct a learned molecular representation by operating
on the graph structure of the molecule. However, recent literature has yet to
clearly determine which of these two methods is superior when generalizing to
new chemical space. Furthermore, prior research has rarely examined these new
models in industry research settings in comparison to existing employed models.
In this paper, we benchmark models extensively on 19 public and 16 proprietary
industrial datasets spanning a wide variety of chemical endpoints. In addition,
we introduce a graph convolutional model that consistently matches or
outperforms models using fixed molecular descriptors as well as previous graph
neural architectures on both public and proprietary datasets. Our empirical
findings indicate that while approaches based on these representations have yet
to reach the level of experimental reproducibility, our proposed model
nevertheless offers significant improvements over models currently used in
industrial workflows.
Collision Cross Section Prediction with Molecular Fingerprint Using Machine Learning
High-resolution mass spectrometry is a promising technique in non-target screening (NTS) to monitor contaminants of emerging concern in complex samples. Current chemical identification strategies in NTS experiments typically depend on spectral libraries, chemical databases, and in silico fragmentation tools. However, small molecule identification remains challenging due to the lack of orthogonal sources of information (e.g., unique fragments). Collision cross section (CCS) values measured by ion mobility spectrometry (IMS) offer an additional identification dimension to increase the confidence level. Thanks to advances in analytical instrumentation, increasing application of IMS hybridized with high-resolution mass spectrometry (HRMS) in NTS has been reported in recent decades. Several CCS prediction tools have been developed. However, few CCS prediction methods have been based on a wide range of chemical classes and cross-platform CCS measurements. We successfully developed two prediction models using a random forest machine learning algorithm. One of the approaches was based on chemicals’ super classes; the other model was direct CCS prediction using molecular fingerprints. Over 13,324 CCS values from six different laboratories and PubChem, obtained using a variety of ion-mobility separation techniques, were used for training and testing the models. The test accuracy for all the prediction models was over 0.85, and the median of the relative residuals was around 2.2%. The models can be applied to different IMS platforms to eliminate false positives in small molecule identification.
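The median relative residual reported above, and the use of predicted CCS values to discard false positives, can be sketched as follows. The abstract does not define the residual, so this sketch assumes the absolute relative form |predicted - measured| / measured, expressed in percent. The 5% tolerance, the function names, and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import median


def relative_residuals(measured, predicted):
    """Absolute relative residuals in percent (assumed definition)."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return [abs(p - m) / m * 100.0 for m, p in zip(measured, predicted)]


def filter_candidates(measured_ccs, candidate_predictions, tolerance_percent):
    """Keep candidates whose predicted CCS lies within the tolerance of the measurement."""
    kept = []
    for name, predicted_ccs in candidate_predictions.items():
        deviation = abs(predicted_ccs - measured_ccs) / measured_ccs * 100.0
        if deviation <= tolerance_percent:
            kept.append(name)
    return kept


# Hypothetical CCS values (in square angstroms) for a small test set.
measured = [150.0, 200.0, 250.0]
predicted = [153.0, 195.0, 255.0]
median_residual = median(relative_residuals(measured, predicted))

# Hypothetical candidate structures for one feature with a measured CCS of 200.0.
candidates = {"candidate_1": 203.0, "candidate_2": 230.0}
kept = filter_candidates(200.0, candidates, tolerance_percent=5.0)
```

In practice the tolerance would be chosen from the residual distribution of the prediction model and the measurement error of the IMS platform.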
Novel methods for the analysis of small molecule fragmentation mass spectra
The identification of small molecules, such as metabolites, in a high throughput manner plays an important role in many research areas. Mass spectrometry (MS) is one of the predominant analysis technologies and is much more sensitive than nuclear magnetic resonance spectroscopy. Fragmentation of the molecules is used to obtain information beyond its mass. Gas chromatography-MS is one of the oldest and most widespread techniques for the analysis of small molecules. Commonly, the molecule is fragmented using electron ionization (EI). Using this technique, the molecular ion peak is often barely visible in the mass spectrum or even absent. We present a method to calculate fragmentation trees from high mass accuracy EI spectra, which annotate the peaks in the mass spectrum with molecular formulas of fragments and explain relevant fragmentation pathways. Fragmentation trees enable the identification of the molecular ion and its molecular formula if the molecular ion is present in the spectrum. The method works even if the molecular ion is of very low abundance. MS experts confirm that the calculated trees correspond very well to known fragmentation mechanisms. Using pairwise local alignments of fragmentation trees, structural and chemical similarities to already-known molecules can be determined. In order to compare a fragmentation tree of an unknown metabolite to a huge database of fragmentation trees, fast algorithms for solving the tree alignment problem are required. Unfortunately, the alignment of unordered trees, such as fragmentation trees, is NP-hard. We present three exact algorithms for the problem. Evaluation of our methods showed that thousands of alignments can be computed in a matter of minutes.
Both the computation and the comparison of fragmentation trees are rule-free approaches that require no chemical knowledge about the unknown molecule and thus will be very helpful in the automated analysis of metabolites that are not included in common libraries.
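One ingredient of the approach described above, annotating peaks with molecular formulas of fragments, can be sketched in a simplified form. The sketch enumerates every sub-formula of a candidate parent formula and keeps those whose monoisotopic mass falls within a ppm tolerance of a peak. It is not the thesis's algorithm: it assumes neutral monoisotopic masses (the electron mass and charge are ignored), a CHNO alphabet, and a 10 ppm tolerance, and it omits the step that selects a consistent tree among the annotations. All names and peak values are hypothetical.

```python
from itertools import product

# Monoisotopic masses of the most abundant isotopes, in daltons.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def formula_mass(counts):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


def subformulas(parent):
    """Yield every non-empty sub-formula of the parent formula."""
    elements = sorted(parent)
    ranges = (range(parent[element] + 1) for element in elements)
    for combo in product(*ranges):
        if any(combo):
            yield dict(zip(elements, combo))


def formula_string(counts):
    """Render a formula such as {'C': 1, 'H': 3, 'O': 1} as 'CH3O'."""
    return "".join(
        element + (str(n) if n > 1 else "")
        for element, n in sorted(counts.items())
        if n > 0
    )


def annotate_peaks(peak_masses, parent, ppm=10.0):
    """Map each peak mass to the sub-formulas that match within the ppm tolerance."""
    candidates = [(formula_mass(f), f) for f in subformulas(parent)]
    annotations = {}
    for peak in peak_masses:
        tolerance = peak * ppm * 1e-6
        annotations[peak] = [
            formula_string(f)
            for mass, f in candidates
            if abs(mass - peak) <= tolerance
        ]
    return annotations


# Hypothetical example: candidate parent formula C2H6O and two fragment masses.
parent = {"C": 2, "H": 6, "O": 1}
annotations = annotate_peaks([31.01839, 45.03404], parent)
```

For larger parent formulas a peak can match several sub-formulas, and choosing among them is where the tree computation described in the abstract comes in.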