
    Smith-Waterman peak alignment for comprehensive two-dimensional gas chromatography-mass spectrometry

    Background: Comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC × GC-MS) is a powerful technique that has gained increasing attention over the last two decades. GC × GC-MS provides much greater separation capacity, chemical selectivity and sensitivity for complex sample analysis, and yields more accurate information about compound retention times and mass spectra. Despite these advantages, the retention times of the resolved peaks on the two-dimensional gas chromatographic columns are always shifted by experimental variations, which complicates data processing for metabolomics analysis. The retention time variation must therefore be corrected before multiple metabolic profiles obtained under different conditions can be compared.
    Results: We developed novel peak alignment algorithms for both homogeneous (acquired under identical experimental conditions) and heterogeneous (acquired under different experimental conditions) GC × GC-MS data, using modified Smith-Waterman local alignment algorithms along with mass spectral similarity. Compared with literature-reported algorithms, the proposed algorithms eliminate the detection of landmark peaks and the use of retention time transformation. Furthermore, an automated peak alignment software package was established by implementing a likelihood function for optimal peak alignment.
    Conclusions: The proposed Smith-Waterman local alignment-based algorithms are capable of aligning both homogeneous and heterogeneous data from multiple GC × GC-MS experiments without transforming retention times or selecting landmark peaks. An optimal version of the SW-based algorithms was also established, based on the associated likelihood function, for automatic peak alignment. The proposed alignment algorithms outperformed the literature-reported alignment method on experimental data from a mixture of compound standards and from a metabolite extract of mouse plasma with spiked-in compound standards.
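    The idea of pairing Smith-Waterman local alignment with mass spectral similarity can be illustrated with a minimal sketch. This is not the authors' modified algorithm: the cosine similarity measure, the `threshold` and `gap` parameters, the one-dimensional peak ordering, and all function names are assumptions chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def smith_waterman_peaks(peaks_a, peaks_b, threshold=0.8, gap=0.5):
    """Locally align two retention-time-ordered peak lists.

    Hypothetical scoring: a pair of peaks scores (similarity - threshold),
    so only spectra more similar than `threshold` contribute positively.
    Returns the matched (index_a, index_b) pairs of the best local alignment.
    """
    n, m = len(peaks_a), len(peaks_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)

    # Fill the dynamic-programming matrix; cells never drop below zero,
    # which is what makes the alignment local rather than global.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = cosine_similarity(peaks_a[i - 1], peaks_b[j - 1]) - threshold
            score[i][j] = max(
                0.0,
                score[i - 1][j - 1] + s,  # match peak i with peak j
                score[i - 1][j] - gap,    # peak i has no partner
                score[i][j - 1] - gap,    # peak j has no partner
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best-scoring cell until the score reaches zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = cosine_similarity(peaks_a[i - 1], peaks_b[j - 1]) - threshold
        if math.isclose(score[i][j], score[i - 1][j - 1] + s):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(score[i][j], score[i - 1][j] - gap):
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    # Toy spectra (made-up values): sample B has one extra leading peak.
    sample_a = [{57: 100, 71: 50}, {91: 100, 65: 20}, {43: 100, 58: 30}]
    sample_b = [{100: 1.0}, {57: 90, 71: 55}, {91: 95, 65: 25}, {43: 100, 58: 28}]
    print(smith_waterman_peaks(sample_a, sample_b))
```

    Because matching relies on spectral similarity and peak order alone, a sketch of this kind needs neither landmark peaks nor a retention time transformation, which is the property the abstract emphasises.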

    Bayesian methods for small molecule identification

    Confident identification of small molecules remains a major challenge in untargeted metabolomics, natural product research and related fields. Liquid chromatography-tandem mass spectrometry is a predominant technique for the high-throughput analysis of small molecules and can detect thousands of different compounds in a biological sample. The automated interpretation of the resulting tandem mass spectra is highly non-trivial, and many studies are limited to re-discovering known compounds by searching mass spectra in spectral reference libraries. But these libraries are vastly incomplete, and a large portion of measured compounds remains unidentified. This constitutes a major bottleneck in the comprehensive, high-throughput analysis of metabolomics data. In this thesis, we present two computational methods that address different steps in the identification of small molecules from tandem mass spectra. ZODIAC is a novel method for de novo (that is, database-independent) molecular formula annotation in complete datasets. It exploits similarities of compounds co-occurring in a sample to find the most likely molecular formula for each individual compound. ZODIAC improves on the currently best-performing method, SIRIUS, on one dataset by 16.5-fold. We show that de novo molecular formula annotation is not just a theoretical advantage: we discover multiple novel molecular formulas absent from PubChem, one of the biggest structure databases. Furthermore, we introduce a novel scoring for CSI:FingerID, a state-of-the-art method for searching tandem mass spectra in a structure database. This scoring models dependencies between different molecular properties in a predicted molecular fingerprint via Bayesian networks. This problem has the unusual property that the marginal probabilities differ for each predicted query fingerprint. Thus, we need to apply Bayesian networks in a novel, non-standard fashion. Modeling dependencies improves on the currently best scoring

    Automated mass spectrometry-based metabolomics data processing by blind source separation methods

    One of the major bottlenecks in metabolomics is converting raw data into biologically interpretable information. Moreover, mass spectrometry-based metabolomics generates large and complex datasets characterized by co-eluting compounds and experimental artifacts. The main objective of this thesis is to develop automated strategies based on blind source separation that improve the capabilities of current methods addressing the limitations of the different steps of the metabolomics data processing workflow. A further objective is to develop tools capable of performing the entire metabolomics workflow for GC-MS, including pre-processing, spectral deconvolution, alignment and identification. As a result, three new automated methods for spectral deconvolution based on blind source separation were developed. These methods were embedded into two computational tools that automatically convert raw data into biologically interpretable information and thus make it possible to test biological hypotheses and gain new biological insights

    DATA ANALYSIS WORKFLOW FOR GAS CHROMATOGRAPHY MASS SPECTROMETRY-BASED METABOLOMICS STUDIES

    Metabolomics has emerged as an integral part of systems biology research that attempts to comprehensively study low molecular weight organic and inorganic metabolites under certain conditions within a biological system. Technological advances in the past decade have made it possible to carry out metabolomics studies in a high-throughput fashion using gas chromatography coupled with mass spectrometry. As a result, large volumes of data are produced from these studies, and there is a pressing need for algorithms that can process and analyze the data efficiently and in an equally high-throughput fashion. To address this need, we have developed computational algorithms and an associated software tool named Automated Data Analysis Pipeline (ADAP). ADAP allows data to flow seamlessly through the data processing steps, which include denoising, peak detection, deconvolution, alignment, compound identification and quantitation. The development of ADAP started in 2009, and the past four years have witnessed continuous improvements in its performance from ADAP-GC 1.0 to ADAP-GC 2.0 and to the current ADAP-GC 3.0. As part of the performance assessment of ADAP-GC, we have compared it with three other software tools. In this dissertation, I will present the computational details of these three versions of ADAP-GC, the capabilities of the software tool, and the results of the software comparison

    The metaRbolomics Toolbox in Bioconductor and beyond

    Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics-specific packages primarily available on CRAN, Bioconductor and GitHub

    Peak Alignment of Gas Chromatography-Mass Spectrometry Data with Deep Learning

    We present ChromAlignNet, a deep learning model for alignment of peaks in Gas Chromatography-Mass Spectrometry (GC-MS) data. In GC-MS data, a compound's retention time (RT) may not stay fixed across multiple chromatograms. Using GC-MS data for biomarker discovery requires aligning the RTs of identical analytes from different samples. Current methods of alignment are all based on a set of formal, mathematical rules. We present a solution to GC-MS alignment using deep learning neural networks, which are more adept at handling complex, fuzzy data sets. We tested our model on several GC-MS data sets of various complexities and analysed the alignment results quantitatively. We show the model has very good performance (AUC ~ 1 for simple data sets and AUC ~ 0.85 for very complex data sets). Further, our model easily outperforms existing algorithms on complex data sets. Compared with existing methods, ChromAlignNet is very easy to use, as it requires no user input of reference chromatograms or parameters. This method can easily be adapted to other similar data, such as those from liquid chromatography. The source code is written in Python and available online