    Automated mass spectrometry-based metabolomics data processing by blind source separation methods

    One of the major bottlenecks in metabolomics is converting raw data into biologically interpretable information. Moreover, mass spectrometry-based metabolomics generates large, complex datasets characterized by co-eluting compounds and experimental artifacts. The main objective of this thesis is to develop automated strategies based on blind source separation that improve on current methods for addressing the limitations of the individual steps of the metabolomics data processing workflow. A second objective is to develop tools capable of performing the entire metabolomics workflow for GC-MS, including pre-processing, spectral deconvolution, alignment and identification. As a result, three new automated methods for spectral deconvolution based on blind source separation were developed. These methods were embedded in two computational tools that automatically convert raw data into biologically interpretable information and thus help answer biological questions and uncover new biological insights.
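    The idea of blind source separation applied to co-eluting compounds can be illustrated with a minimal sketch. It is not one of the thesis's three methods: it swaps in plain non-negative matrix factorization (NMF) with multiplicative updates, one common blind source separation technique, and runs it on synthetic data. The peak positions, the five-channel spectra, and all function names are assumptions made for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, c, s):
    """Frobenius norm of the residual X - C S."""
    approx = matmul(c, s)
    return math.sqrt(sum((x[i][j] - approx[i][j]) ** 2
                         for i in range(len(x)) for j in range(len(x[0]))))


def nmf(x, rank, iterations=200, seed=0, eps=1e-12):
    """Factor X (scans x m/z) into non-negative C (scans x rank) and S (rank x m/z)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    c = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    s = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    errors = [frobenius_error(x, c, s)]
    for _ in range(iterations):
        # Spectra update: S <- S * (C^T X) / (C^T C S)
        ct = transpose(c)
        num = matmul(ct, x)
        den = matmul(matmul(ct, c), s)
        s = [[s[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # Elution-profile update: C <- C * (X S^T) / (C S S^T)
        st = transpose(s)
        num = matmul(x, st)
        den = matmul(c, matmul(s, st))
        c = [[c[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
        errors.append(frobenius_error(x, c, s))
    return c, s, errors


def gaussian(t, center, width):
    return math.exp(-0.5 * ((t - center) / width) ** 2)


# Synthetic example (hypothetical values): two overlapping elution profiles
# over 30 scans, each compound with its own spectrum over 5 m/z channels.
profiles = [[gaussian(t, 12, 3), gaussian(t, 16, 3)] for t in range(30)]
spectra = [[1.0, 0.2, 0.0, 0.6, 0.1],
           [0.0, 0.7, 1.0, 0.1, 0.4]]
data = matmul(profiles, spectra)  # the mixed signal an instrument would record

c_est, s_est, errors = nmf(data, rank=2)
print("initial residual:", errors[0])
print("final residual:  ", errors[-1])
```

    The rows of `s_est` are the estimated spectra and the columns of `c_est` the estimated elution profiles, each recovered only up to scaling and ordering. Real GC-MS data would also need the pre-processing and alignment steps the abstract lists before a factorization like this is meaningful.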

    Automated resolution of chromatographic signals by independent component analysis-orthogonal signal deconvolution in comprehensive gas chromatography/mass spectrometry-based metabolomics

    Comprehensive two-dimensional gas chromatography-mass spectrometry (GC × GC-MS) provides a different perspective on the metabolomic profiling of samples. However, algorithms for GC × GC-MS data processing are needed to process the data automatically and extract the purest possible information about the compounds present in complex biological samples. This study shows the capability of independent component analysis-orthogonal signal deconvolution (ICA-OSD), an algorithm based on blind source separation and distributed in an R package called osd, to extract the spectra of the compounds appearing in GC × GC-MS chromatograms in an automated manner. We studied the performance of ICA-OSD by quantifying 38 metabolites across a set of 20 Jurkat cell samples analyzed by GC × GC-MS. The quantification by ICA-OSD was compared with a supervised quantification by selective ions; most coefficients of determination indicated good agreement (R² > 0.90), and up to 24 cases exhibited an excellent linear relation (R² > 0.95). We concluded that ICA-OSD can be used to resolve co-eluted compounds in GC × GC-MS. © 2016 Elsevier Ireland Ltd. All rights reserved. Postprint (author's final draft).
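    The comparison described above, where each metabolite's automated quantification is regressed against a reference quantification and judged by R², can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the "poor" label, and the sample values are hypothetical. Only the 0.90 and 0.95 thresholds come from the abstract.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line y ~ a + b*x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


def classify_agreement(r2, good=0.90, excellent=0.95):
    """Bin an R^2 value with the thresholds quoted in the abstract.

    The 'poor' label for everything else is an assumption of this sketch.
    """
    if r2 > excellent:
        return "excellent"
    if r2 > good:
        return "good"
    return "poor"


# Hypothetical peak areas for one metabolite across five samples:
# reference (selective-ion) quantification vs. an automated one.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
automated = [11.0, 19.5, 31.0, 39.0, 52.0]

r2 = r_squared(reference, automated)
print(round(r2, 4), classify_agreement(r2))
```

    In a study like the one summarized here, the same calculation would be repeated once per metabolite, and the labels tallied across all 38 compounds.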

    Interpretation of comprehensive two-dimensional gas chromatography data using advanced chemometrics

    The power of comprehensive two-dimensional gas chromatography (GC × GC) for the study of complex mixtures has been indisputably proved over the past several decades. This review covers the whole of GC × GC-related data processing and summarizes relevant applications. We include a theoretical introduction to some specific methods and studies to aid readers' understanding of chemometric strategies for advanced data interpretation.

    Metabolomics: a tool for studying plant biology

    In recent years, new technologies have allowed gene expression, protein and metabolite profiles to be monitored in different tissues and developmental stages. This is an emerging field in plant science and is applied to diverse plant systems in order to elucidate the regulation of growth and development. The goal of plant metabolomics is to analyze, identify and quantify all low-molecular-weight molecules of plant organisms. The plant metabolites are extracted and analyzed using various sensitive analytical techniques, usually mass spectrometry (MS) in combination with chromatography. In order to compare the metabolomes of different plants in a high-throughput manner, a number of biological, analytical and data processing steps have to be performed. In the work underlying this thesis we developed a fast and robust method for routine analysis of plant metabolite patterns using gas chromatography-mass spectrometry (GC/MS). The method was developed using Design of Experiments (DOE) to investigate factors affecting the extraction and derivatization of metabolites from leaves of the plant Arabidopsis thaliana. The outcome of metabolic analysis by GC/MS is a complex mixture of approximately 400 overlapping peaks. Resolving (deconvoluting) overlapping peaks is time-consuming and difficult to automate, and additional processing is needed in order to compare samples. To prevent deconvolution from becoming a major bottleneck in high-throughput analyses, we developed a new semi-automated strategy using hierarchical methods for processing GC/MS data that can be applied to all samples simultaneously. The two methods include baseline correction of the unprocessed MS data files, alignment, time-window determination, Alternating Regression and multivariate analysis in order to detect metabolites that differ in relative concentration between samples.
The developed methodology was applied to study the effects of the plant hormone GA on the metabolome, with specific emphasis on auxin levels in Arabidopsis thaliana mutants defective in GA biosynthesis and signalling. A large series of plant samples was analysed and the resulting data were processed in less than one week with minimal labour, similar to the time required for the GC/MS analyses of the samples.

    Wine science in the metabolomic era: wine-omics research

    The figures and tables in this document are located at the end. Metabolomics approaches have proved valuable in a wide range of areas of knowledge. This review covers advances in wine chemistry over the past five years that were made possible by the development of metabolomics approaches. The combination of powerful, robust analytical techniques (NMR, LC-MS, GC-MS, FTICR, UHPLC, and CE) provides high-dimensional data that require advanced chemometric tools in order to handle these datasets appropriately and to assess the chemical composition holistically. Metabolomics studies analyze as many metabolites as possible in order to carry out unbiased discrimination and/or classification according to variety, origin, vintage and quality, and to integrate all time-related metabolic changes of a wine's history throughout its elaborate processing, so as to assure wine authentication and preclude adulteration. The authors are grateful to the Spanish Ministry of Economy and Competitiveness (MINECO) (Project AGL2012-04172-C02-01) and the Comunidad Autónoma of Madrid (Spain) and European funding from FEDER program (Project S2013/ABI-3028, AVANSECAL-CM) for financial support. M.E. Alañón would like to thank Fundación Alfonso Martín Escudero for the post-doctoral fellowship awarded.

    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics

    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. Here we critically discuss structure elucidation approaches and software designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.

    Advances in structure elucidation of small molecules using mass spectrometry

    The structural elucidation of small molecules using mass spectrometry plays an important role in modern life sciences and bioanalytical approaches. This review covers different soft and hard ionization techniques and figures of merit for modern mass spectrometers, such as mass resolving power, mass accuracy, isotopic abundance accuracy, accurate mass multiple-stage MS(n) capability, as well as hybrid mass spectrometric and orthogonal chromatographic approaches. The latter part discusses mass spectral data handling strategies, which include background and noise subtraction, adduct formation and detection, charge state determination, accurate mass measurements, elemental composition determinations, and complex data-dependent setups with ion maps and ion trees. The importance of mass spectral library search algorithms for tandem mass spectra and multiple-stage MS(n) mass spectra, as well as mass spectral tree libraries that combine multiple-stage mass spectra, is outlined. The subsequent chapter discusses mass spectral fragmentation pathways, biotransformation reactions and drug metabolism studies, the simulation and generation of in silico mass spectra, expert systems for mass spectral interpretation, and the use of computational chemistry to explain gas-phase phenomena. A single chapter discusses data handling for hyphenated approaches, including mass spectral deconvolution for clean mass spectra, cheminformatics approaches and structure retention relationships, and retention index predictions for gas and liquid chromatography. The last section reviews the current state of electronic data sharing of mass spectra and discusses the importance of software development for the advancement of structure elucidation of small molecules.

    The metaRbolomics Toolbox in Bioconductor and beyond

    Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics-specific packages primarily available on CRAN, Bioconductor and GitHub.

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platform (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and research for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.