1,022 research outputs found

    Robust automatic assignment of nuclear magnetic resonance spectra for small molecules

    Abstract. In this document we describe a fully automatic assignment system for Nuclear Magnetic Resonance (NMR) spectra of small molecules. The system has four main features: 1. it takes raw NMR data as input, so it must extract the useful information from the data while ignoring the noise; 2. it assigns the signals to atoms in the structure and attaches a confidence value to each assignment, which is used to rank all possible solutions; 3. it does not depend on chemical shift predictions, so it can perform an assignment using only the connectivity information observed in 2D NMR spectra and the integrals (coupling constants are also a possibility, but were not explored in this work), although it can use chemical shifts if they are available; 4. it can learn, in an unsupervised fashion, the relation between configurations of atoms and chemical shifts while solving assignment problems, which allows the system to improve as it works, analogous to the way a human does. The system is completely open source, as are the data used in this work.

    The roles of divergence and hybridization in shaping patterns of genetic and phenotypic variation across the evolutionary continuum in Juniperus and Piper

    Genetic and phenotypic variation across populations, species, and radiations mediates the form and outcome of biotic and abiotic interactions and represents a major axis of biodiversity. Resolving patterns of variation across shallow and deep evolutionary divergences can provide key insights into the processes that generate and maintain this variation over micro- and macroevolutionary timescales. Additionally, variation in functional traits that interface with the biotic and abiotic environments plays an important role in adaptive evolution, and can shed light on the drivers of differentiation and diversification. Here, I analyzed genome-scale variation spanning individuals, populations, and species to 1) resolve complex diversification histories, 2) characterize landscape patterns of hybrid admixture and plant secondary chemistry, and 3) characterize macroevolutionary patterns of plant secondary chemistry. First, I reconstructed the evolutionary history of the serrate juniper clade of North America (Juniperus) as it diversified into arid habitats of the western United States and Mexico. Second, I examined how admixture across the species boundary influences patterns of genetic and phytochemical variation following secondary contact among three serrate juniper species. Finally, I resolved the timing and tempo of diversification in the Radula clade of Piper to understand how secondary chemistry evolves within a diverse tropical plant radiation. My work demonstrates the importance of evolutionary processes occurring along the evolutionary continuum for generating contemporary patterns of variation and diversity.

    Improvement of sample classification and metabolite profiling in 1H-NMR by a machine learning-based modelling of signal parameters

    NMR is an analytical platform used to quantify the metabolites present in metabolomics samples. 1H-NMR spectra show multiple metabolite signals, each with three parameters (chemical shift, half bandwidth, intensity) that can react to the sample conditions. This reactivity complicates the optimization of the lineshape fitting of spectra needed for automatic metabolite profiling of samples. The aim of this PhD thesis was to explore the use of trending machine learning (ML)-based techniques and of robust ML-based workflows to model and then exploit the information present in the different parameters collected for each signal during the metabolite profiling of 1H-NMR datasets. In particular, the applications considered were the enhanced classification of samples in metabolomics studies and the enhancement of the quality of automatic profiling in 1H-NMR datasets. In addition to these goals, further achievements were made (e.g., the generation of a new open-source tool able to solve challenges in the profiling of complex matrices).
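    The three signal parameters named in the abstract (chemical shift, half bandwidth, intensity) can be illustrated with a minimal sketch. It assumes a Lorentzian lineshape, which is a common choice in NMR fitting but is not stated in the abstract; the names `Signal`, `lorentzian`, and `model_spectrum` are hypothetical and are not taken from the thesis or its open-source tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signal:
    """One metabolite signal, described by the three parameters in the abstract."""
    shift_ppm: float       # chemical shift: position of the peak centre
    half_width_ppm: float  # half bandwidth: half width at half maximum
    intensity: float       # peak height at the centre


def lorentzian(x: float, s: Signal) -> float:
    """Lorentzian lineshape (an assumption; the abstract names no lineshape)."""
    d = x - s.shift_ppm
    w = s.half_width_ppm
    return s.intensity * w * w / (d * d + w * w)


def model_spectrum(xs: List[float], signals: List[Signal]) -> List[float]:
    """Modelled spectrum: the sum of all signal lineshapes at each position."""
    return [sum(lorentzian(x, s) for s in signals) for x in xs]
```

    Lineshape fitting then amounts to adjusting the three parameters of every `Signal` until `model_spectrum` matches the measured spectrum, which is why shifts in those parameters under different sample conditions make the optimization harder.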

    A Methodology Based on FT-IR Data Combined with Random Forest Model to Generate Spectralprints for the Characterization of High-Quality Vinegars

    Sherry wine vinegar is a Spanish gourmet product under Protected Designation of Origin (PDO). Before a vinegar can be labeled as Sherry vinegar, the product must meet certain requirements established by its PDO, which, in this case, means that it has been produced following the traditional solera and criadera ageing system. The quality of the vinegar is determined by many factors, such as the raw material, the acetification process, and the ageing system. For this reason, producers in particular, but also consumers, would benefit from effective analytical tools that allow the origin and quality of vinegar to be determined precisely. In the present study, a total of 48 Sherry vinegar samples manufactured from three different starting wines (Palomino Fino, Moscatel, and Pedro Ximenez wine) were analyzed by Fourier-transform infrared (FT-IR) spectroscopy. The spectroscopic data were combined with unsupervised exploratory techniques such as hierarchical cluster analysis (HCA) and principal component analysis (PCA), as well as other nonparametric supervised techniques, namely, support vector machine (SVM) and random forest (RF), for the characterization of the samples. The HCA and PCA results present a clear grouping trend of the vinegar samples according to their raw materials. SVM in combination with leave-one-out cross-validation (LOOCV) successfully classified 100% of the samples according to the type of wine used for their production. The RF method allowed selecting the most important variables to develop the characteristic fingerprint ("spectralprint") of the vinegar samples according to their starting wine. Furthermore, the RF model reached 100% accuracy for both LOOCV and out-of-bag (OOB) sets.
    The authors would like to thank the winery Bodegas Paez Morilla S.A. for providing the Sherry vinegar samples and for the interest shown in the results of this study, and Programa de Fomento e Impulso de la Actividad de Investigacion y Transferencia de la Universidad de Cadiz for the financial support of this manuscript.
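    The leave-one-out cross-validation (LOOCV) scheme mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: a simple nearest-centroid classifier stands in for the SVM and RF models so that the example needs only the Python standard library, and the two-feature "spectra" below are made-up toy numbers, not data from the study.

```python
import math
from typing import List, Sequence


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Mean feature vector of a group of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (not SVM/RF): pick the class with the closest centroid."""
    by_class = {}
    for row, label in zip(train_X, train_y):
        by_class.setdefault(label, []).append(row)
    centroids = {label: centroid(rows) for label, rows in by_class.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def loocv_accuracy(X, y) -> float:
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy, invented feature values for illustration only (two features per sample).
X = [[0.10, 0.90], [0.12, 0.88],
     [0.50, 0.50], [0.52, 0.48],
     [0.90, 0.10], [0.88, 0.12]]
y = ["Palomino Fino", "Palomino Fino",
     "Moscatel", "Moscatel",
     "Pedro Ximenez", "Pedro Ximenez"]

print(loocv_accuracy(X, y))
```

    With n samples, LOOCV trains n models, each tested on the single sample it did not see, which suits small datasets such as the 48 samples reported here.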

    Classifying malignant brain tumours from 1H-MRS data using Breadth Ensemble Learning

    In neuro-oncology, the accurate diagnostic identification and characterization of tumours are paramount for determining their prognosis and the appropriate course of treatment. This is usually a difficult problem per se, due to the localization of the tumour in an extremely sensitive and difficult-to-reach organ such as the brain. The clinical analysis of brain tumours often requires the use of non-invasive measurement methods, the most common of which resort to imaging techniques. The discrimination between high-grade malignant tumours of different origin but similar characteristics, such as glioblastomas and metastases, is a particularly difficult problem in this context. This is because imaging techniques are often not sensitive enough and the spectroscopic signals of these tumours are overall too similar. In spite of this, machine learning techniques, coupled with robust feature selection procedures, have recently made substantial inroads into the problem. In this study, magnetic resonance spectroscopy data from an international, multicentre database were used to discriminate between these two types of malignant brain tumours using ensemble learning techniques, with a focus on the definition of a feature selection method specifically designed for ensembles. This method, Breadth Ensemble Learning, takes advantage of the fact that many of the frequencies of the available spectra convey no relevant information for the discrimination of the tumours. The potential of the proposed method is supported by some of the best results reported to date for this problem.

    Chemometrics Methods for Specificity, Authenticity and Traceability Analysis of Olive Oils: Principles, Classifications and Applications

    Background. Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of the preparation of different blends, leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. Methods. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. Results. This chapter presents a review of different chemometrics methods applied for the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by different case studies on monovarietal and blended OOs originating from different countries. Conclusion. Chemometrics tools offer multiple ways for quantitative evaluations and qualitative control of complex chemical variability of OO in relation to several intrinsic and extrinsic factors.