
    MetAssign: probabilistic annotation of metabolites from LC–MS data using a Bayesian clustering approach

    Motivation: The use of liquid chromatography coupled to mass spectrometry (LC–MS) has enabled the high-throughput profiling of the metabolite composition of biological samples. However, the large amount of data obtained can be difficult to analyse and often requires computational processing to understand which metabolites are present in a sample. This paper looks at the dual problem of annotating peaks in a sample with a metabolite, together with putatively annotating whether a metabolite is present in the sample. The starting point of the approach is a Bayesian clustering of peaks into groups, each corresponding to putative adducts and isotopes of a single metabolite. Results: The Bayesian modelling introduced here combines information from the mass-to-charge ratio, retention time and intensity of each peak, together with a model of the inter-peak dependency structure, to increase the accuracy of peak annotation. The results inherently contain a quantitative estimate of confidence in the peak annotations and allow an accurate trade-off between precision and recall. Extensive validation experiments using authentic chemical standards show that this system is able to produce more accurate putative identifications than other state-of-the-art systems, while at the same time giving a probabilistic measure of confidence in the annotations. Availability: The software has been implemented as part of the mzMatch metabolomics analysis pipeline, which is available for download at http://mzmatch.sourceforge.net/

    Bayesian methods for small molecule identification

    Confident identification of small molecules remains a major challenge in untargeted metabolomics, natural product research and related fields. Liquid chromatography-tandem mass spectrometry is a predominant technique for the high-throughput analysis of small molecules and can detect thousands of different compounds in a biological sample. The automated interpretation of the resulting tandem mass spectra is highly non-trivial, and many studies are limited to re-discovering known compounds by searching mass spectra in spectral reference libraries. But these libraries are vastly incomplete, and a large portion of measured compounds remains unidentified. This constitutes a major bottleneck in the comprehensive, high-throughput analysis of metabolomics data. In this thesis, we present two computational methods that address different steps in the identification process of small molecules from tandem mass spectra. ZODIAC is a novel method for de novo (that is, database-independent) molecular formula annotation in complete datasets. It exploits similarities of compounds co-occurring in a sample to find the most likely molecular formula for each individual compound. ZODIAC improves on SIRIUS, the currently best-performing method; on one dataset, the improvement is 16.5-fold. We show that de novo molecular formula annotation is not just a theoretical advantage: we discover multiple novel molecular formulas absent from PubChem, one of the biggest structure databases. Furthermore, we introduce a novel scoring for CSI:FingerID, a state-of-the-art method for searching tandem mass spectra in a structure database. This scoring models dependencies between different molecular properties in a predicted molecular fingerprint via Bayesian networks. This problem has the unusual property that the marginal probabilities differ for each predicted query fingerprint. Thus, we need to apply Bayesian networks in a novel, non-standard fashion. Modeling dependencies improves on the currently best scoring

    Smith-Waterman peak alignment for comprehensive two-dimensional gas chromatography-mass spectrometry

    Background: Comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC × GC-MS) is a powerful technique which has gained increasing attention over the last two decades. The GC × GC-MS provides much increased separation capacity, chemical selectivity and sensitivity for complex sample analysis and brings more accurate information about compound retention times and mass spectra. Despite these advantages, the retention times of the resolved peaks on the two-dimensional gas chromatographic columns are always shifted due to experimental variations, introducing difficulty in the data processing for metabolomics analysis. Therefore, the retention time variation must be adjusted in order to compare multiple metabolic profiles obtained from different conditions. Results: We developed novel peak alignment algorithms for both homogeneous (acquired under the identical experimental conditions) and heterogeneous (acquired under the different experimental conditions) GC × GC-MS data using modified Smith-Waterman local alignment algorithms along with mass spectral similarity. Compared with literature reported algorithms, the proposed algorithms eliminated the detection of landmark peaks and the usage of retention time transformation. Furthermore, an automated peak alignment software package was established by implementing a likelihood function for optimal peak alignment. Conclusions: The proposed Smith-Waterman local alignment-based algorithms are capable of aligning both the homogeneous and heterogeneous data of multiple GC × GC-MS experiments without the transformation of retention times and the selection of landmark peaks. An optimal version of the SW-based algorithms was also established based on the associated likelihood function for the automatic peak alignment. The proposed alignment algorithms outperform the literature reported alignment method by analyzing the experiment data of a mixture of compound standards and a metabolite extract of mouse plasma with spiked-in compound standards.
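
For orientation, the classic Smith-Waterman local alignment that the abstract's algorithms build on can be sketched as below. This is the textbook version on character strings, not the authors' modified algorithm: the function name `smith_waterman` and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions. A peak-alignment variant would presumably score pairs of peaks by mass spectral similarity instead of character equality.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Textbook Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; clamping at 0 is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("XXABCYY", "ZZABCWW"))
```

Because every cell is clamped at zero, the alignment can start and end anywhere in either sequence, which is why a local method does not need globally anchored reference points.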

    Evidential Label Propagation Algorithm for Graphs

    Community detection has attracted considerable attention across many areas, as it can be used for discovering the structure and features of complex networks. With the increasing size of social networks in the real world, community detection approaches should be fast and accurate. The Label Propagation Algorithm (LPA) is known to be one of the near-linear solutions and has the benefit of easy implementation; thus it forms a good basis for efficient community detection methods. In this paper, we extend the update rule and propagation criterion of LPA in the framework of belief functions. A new community detection approach, called Evidential Label Propagation (ELP), is proposed as an enhanced version of conventional LPA. The node influence is first defined to guide the propagation process. The plausibility is used to determine the domain label of each node. The update order of nodes is discussed to improve the robustness of the method. The ELP algorithm converges once the domain labels of all the nodes remain unchanged. Finally, the mass assignments are calculated as memberships of nodes. Overlapping nodes and outliers can be detected simultaneously with the proposed method. The experimental results demonstrate the effectiveness of ELP. Comment: 19th International Conference on Information Fusion, Jul 2016, Heidelber, Franc
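
As a point of reference, conventional LPA (the baseline that ELP extends) can be sketched as follows. This is a minimal illustration of plain label propagation, not of ELP itself: it does not model node influence, plausibility, or mass assignments. The function name `label_propagation`, the adjacency-dict input format, and the rule of keeping the current label on ties are assumptions made for this sketch.

```python
import random
from collections import Counter


def label_propagation(adjacency, seed=0, max_sweeps=100):
    """Conventional asynchronous label propagation (not ELP).

    adjacency: dict mapping each node to an iterable of its neighbours.
    Returns a dict mapping each node to its final community label.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts alone
    nodes = list(adjacency)

    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # random update order in each sweep
        changed = False
        for node in nodes:
            counts = Counter(labels[n] for n in adjacency[node])
            if not counts:
                continue  # isolated node keeps its own label
            top = max(counts.values())
            best = sorted(lab for lab, c in counts.items() if c == top)
            if labels[node] in best:
                continue  # assumed tie rule: keep the current label
            labels[node] = rng.choice(best)
            changed = True
        if not changed:
            break  # converged: no label changed during a full sweep
    return labels


if __name__ == "__main__":
    # Two disconnected triangles, used here as a toy graph.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1],
             3: [4, 5], 4: [3, 5], 5: [3, 4]}
    print(label_propagation(graph))
```

Each sweep touches every edge once, which is where the near-linear running time mentioned in the abstract comes from.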

    Peak annotation and data analysis software tools for mass spectrometry imaging

    Spatial metabolomics is the discipline that studies images of the distributions of low-weight chemical compounds (metabolites) on the surface of biological tissues to unveil interactions between molecules. Mass spectrometry imaging (MSI) is currently the principal technique for obtaining molecular imaging information for spatial metabolomics. MSI is a label-free molecular imaging technology that produces mass spectra preserving the spatial structures of tissue samples. This is achieved by ionizing small portions of a sample (a pixel) in a defined raster across its whole surface, which results in a collection of ion distribution images (registered as mass-to-charge ratios (m/z)) over the sample. This thesis aims to develop computational tools for peak annotation in MSI and to design workflows for the statistical and multivariate analysis of MSI data, including spatial segmentation. The work carried out in this thesis can be clearly separated into two parts. Firstly, the development of an isotope and adduct peak annotation tool suited to facilitate the identification of compounds in the low mass range. We can now easily find monoisotopic ions in our MSI datasets thanks to the rMSIannotation software package. Secondly, the development of software tools for data analysis and spatial segmentation based on soft clustering for MSI data. In this thesis, we have developed tools and methodologies to search for significant ions (rMSIKeyIon software package) and for the soft clustering of tissues (Fuzzy c-means algorithm)
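
To make the soft clustering idea concrete, the standard fuzzy c-means updates can be sketched as below. This is the textbook algorithm in plain Python, not the thesis software: the function names, the explicit `centers` argument for initialisation, the fuzzifier default `m=2.0`, and the fixed iteration count are all assumptions made for illustration. Each point receives a membership in every cluster, and the memberships of a point sum to 1.

```python
import math


def memberships_for(point, centers, m):
    """Membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        # The point coincides with one or more centres: share membership there.
        return [1.0 / len(zero) if i in zero else 0.0
                for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inv = [d ** -power for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def update_centers(points, u, centers, m):
    """Each new centre is the mean of all points weighted by membership**m."""
    new_centers = []
    for i, old in enumerate(centers):
        weights = [row[i] ** m for row in u]
        total = sum(weights)
        if total == 0.0:
            new_centers.append(old)  # no point supports this cluster
            continue
        new_centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(len(old))
        ))
    return new_centers


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Textbook fuzzy c-means; `centers` supplies the initial centres."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        u = [memberships_for(p, centers, m) for p in points]
        centers = update_centers(points, u, centers, m)
    # Final memberships, consistent with the returned centres.
    u = [memberships_for(p, centers, m) for p in points]
    return centers, u


if __name__ == "__main__":
    data = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
    final_centers, final_u = fuzzy_c_means(data, [data[0], data[-1]])
    print(final_centers)
    print(final_u[0], final_u[-1])
```

In an imaging setting, each point would presumably be the spectrum of one pixel, so the membership values for a cluster could be displayed as a soft segmentation map.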

    Seeing the forest for the trees: retrieving plant secondary biochemical pathways from metabolome networks

    Over the last decade, a giant leap forward has been made in resolving the main bottleneck in metabolomics, i.e., the structural characterization of the many unknowns. This has led to the next challenge in this research field: retrieving biochemical pathway information from the various types of networks that can be constructed from metabolome data. Searching for putative biochemical pathways, referred to as biotransformation paths, is complicated because several flaws occur during the construction of metabolome networks. Multiple network analysis tools have been developed to deal with these flaws, while in silico retrosynthesis is emerging as an alternative approach. In this review, the different types of metabolome networks, their flaws, and the various tools to trace these biotransformation paths are discussed.