696 research outputs found

    Aggregated functional data model for Near-Infrared Spectroscopy calibration and prediction

    Calibration and prediction for NIR spectroscopy data are performed based on a functional interpretation of the Beer-Lambert formula. Considering that, for each chemical sample, the resulting spectrum is a continuous curve obtained as the sum of overlapping absorption spectra from each analyte plus a Gaussian error, we assume that each individual spectrum can be expanded as a linear combination of B-spline basis functions. Calibration is then performed using two procedures for estimating the individual analyte curves: basis smoothing and smoothing splines. Prediction is done by minimizing the squared error of prediction. To assess the variance of the predicted values, we use a leave-one-out jackknife technique. Departures from the standard error models are discussed through a simulation study, in particular how correlated errors affect the calibration step and, consequently, the prediction of the analytes' concentrations. Finally, the performance of our methodology is demonstrated through the analysis of two publicly available datasets. Comment: 27 pages, 7 figures, 7 tables
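    The aggregated model described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the analyte curves, noise level, and sample sizes are made up, and plain least squares stands in for the B-spline basis smoothing and smoothing splines named in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2 analytes, 50 wavelengths, 30 calibration samples.
wavelengths = np.linspace(0.0, 1.0, 50)

# Assumed "true" analyte absorption curves (Gaussian bumps, illustration only).
curves = np.stack([
    np.exp(-((wavelengths - 0.3) ** 2) / 0.01),
    np.exp(-((wavelengths - 0.7) ** 2) / 0.02),
])  # shape (2, 50)

# Beer-Lambert style aggregation: each spectrum is a concentration-weighted
# sum of the analyte curves plus Gaussian error.
conc_cal = rng.uniform(0.1, 1.0, size=(30, 2))
spectra_cal = conc_cal @ curves + rng.normal(0.0, 0.01, size=(30, 50))

# Calibration: estimate the analyte curves from known concentrations.
# Ordinary least squares per wavelength, with no smoothing (an assumption;
# the abstract uses B-spline based estimators instead).
curves_hat, *_ = np.linalg.lstsq(conc_cal, spectra_cal, rcond=None)  # (2, 50)

# Prediction: for a new spectrum, pick the concentrations that minimize
# the squared error between the observed and reconstructed spectrum.
conc_true = np.array([0.4, 0.8])
spectrum_new = conc_true @ curves + rng.normal(0.0, 0.01, size=50)
conc_hat, *_ = np.linalg.lstsq(curves_hat.T, spectrum_new, rcond=None)

print("estimated concentrations:", conc_hat)
```

    Repeating the calibration step with one sample left out at a time would give the replicates needed for a leave-one-out jackknife estimate of the prediction variance.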

    Application of High-Throughput Screening Raman Spectroscopy (HTS-RS) for Label-Free Identification and Molecular Characterization of Pollen

    Pollen studies play a critical role in various fields of science. In the last couple of decades, the replacement of manual pollen identification by image-based methods using morphological features was a great leap forward, but challenges remain for pollen with similar morphology, and additional approaches are required. Spectroscopic approaches to pollen identification, such as Raman spectroscopy, have potential benefits over traditional methods because they probe the intrinsic molecular composition of a sample. However, current Raman-based characterization of pollen is complex and time-consuming, resulting in low throughput and limiting the statistical significance of the acquired data. Previously demonstrated high-throughput screening Raman spectroscopy (HTS-RS) eliminates the complexity as well as human interaction by incorporating full automation of the data acquisition process. Here, we present a customization of HTS-RS for pollen identification, enabling sampling of a large number of pollen in comparison to other state-of-the-art Raman pollen investigations. We show that, using Raman spectra, we are able to provide a preliminary estimation of pollen types based on growth habits using hierarchical cluster analysis (HCA), as well as a good taxonomic classification of 37 different pollen types using principal component analysis-support vector machine (PCA-SVM), with good accuracy even for pollen specimens sharing similar morphological features. Our results suggest that the HTS-RS platform meets the demands of automated pollen detection, making it an alternative method for pollen research.

    Nonlinear multiple regression methods for spectroscopic analysis: application to NIR calibration

    Chemometrics has been applied to analyse near-infrared (NIR) spectra for decades. Linear regression methods such as partial least squares (PLS) regression and principal component regression (PCR) are simple and widely used solutions for spectroscopic calibration. My dissertation connects spectroscopic calibration with nonlinear machine learning techniques. It explores the feasibility of applying nonlinear methods for NIR calibration. Investigated nonlinear regression methods include least squares support vector machine (LS-SVM), Gaussian process regression (GPR), Bayesian hierarchical mixture of linear regressions (HMLR) and convolutional neural networks (CNN). Our study focuses on the discussion of various design choices, the interpretation of nonlinear models, and novel recommendations and insights for the construction of nonlinear regression models for NIR data. Performances of the investigated nonlinear methods were benchmarked against traditional methods on multiple real-world NIR datasets. The datasets have different sizes (varying from 400 samples to 7000 samples) and are from various sources. Hypothesis tests on separate, independent test sets indicated that nonlinear methods give significant improvements in most practical NIR calibrations.

    Multivariate analysis and artificial neural network approaches of near infrared spectroscopic data for non-destructive quality attributes prediction of Mango (Mangifera indica L.)

    There is a need for fast and reliable tools for quality and authenticity control of pharmaceutical ingredients. Among others, hormone-containing drugs and foods are subject to scrutiny. In this study, terahertz (THz) spectroscopy and THz imaging are applied for the first time to analyze melatonin and its pharmaceutical product Circadin. Melatonin is a hormone found naturally in the human body, which is responsible for the regulation of sleep-wake cycles. In the THz frequency region between 1.5 THz and 4.5 THz, a characteristic melatonin spectral feature at 3.21 THz, and a weaker one at 4.20 THz, are observed, allowing for a quantitative analysis within the final products. Spectroscopic THz imaging of different concentrations of Circadin and melatonin as an active pharmaceutical ingredient in prepared pellets is also performed, which permits spatial recognition of these different substances. These results indicate that THz spectroscopy and imaging can be an indispensable tool, complementing Raman and Fourier transform infrared spectroscopies, for quality control of dietary supplements and other pharmaceutical products.

    New understanding of gas hydrate phenomena and natural inhibitors in crude oil systems through mass spectrometry and machine learning

    Gas hydrates represent one of the main flow assurance issues in the oil and gas industry, as they can cause complete blockage of pipelines and process equipment, forcing shutdowns. Previous studies have shown that some crude oils form hydrates that do not agglomerate or deposit, but remain as transportable dispersions. This is commonly believed to be due to naturally occurring components present in the crude oil; however, despite decades of research, their exact structures have not yet been determined. Some studies have suggested that these components are present in the acid fractions of the oils or are related to the asphaltene content of the oils. Crude oils are among the world's most complex organic mixtures and can contain up to 100 000 different constituents, making them difficult to characterise using traditional mass spectrometers. The high mass accuracy of Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR MS) yields a resolution greater than traditional techniques, making FT-ICR MS able to characterise crude oils to a greater extent, and possibly identify hydrate active components. FT-ICR MS spectra usually contain tens of thousands of peaks, and data treatment methods able to find underlying relationships in big data sets are required. Machine learning and multivariate statistics include many methods suitable for big data. A literature review identified a number of promising methods, and the current status of the use of machine learning for analysis of gas hydrates and FT-ICR MS data was analysed. The literature study revealed that although many studies have used machine learning to predict thermodynamic properties of gas hydrates, very little work has been done on analysing gas hydrate related samples measured by FT-ICR MS. In order to aid the identification of hydrate active components, a successive accumulation procedure for increasing their concentrations was developed by SINTEF. 
Comparison of the mass spectra from spiked and unspiked samples revealed some peaks that increased in intensity with the spiking level. Several classification methods were used in combination with variable selection, and peaks related to hydrate formation were identified. The corresponding molecular formulas were determined, and the peaks were assumed to be related to asphaltenes, naphthenes and polyethylene glycol. To aid the characterisation of the oils, infrared spectroscopy (both Fourier Transform infrared and near infrared) was combined with FT-ICR MS in a multiblock analysis to predict the density of crude oils. Two different strategies for data fusion were attempted, and sequential fusion of the blocks achieved the highest prediction accuracy both before and after reducing the dimensions of the data sets by variable selection. As crude oils have such complex matrices, samples are often very different, and many methods are not able to handle high degrees of variation or non-linearities between the samples. Hierarchical cluster-based partial least squares regression (HC-PLSR) clusters the data and builds local models within each cluster. HC-PLSR can thus handle non-linearities between clusters, but as PLSR is a linear model the data are still required to be locally linear. HC-PLSR was therefore expanded into deep learning (HC-CNN and HC-RNN) and SVR (HC-SVR). The deep learning-based models outperformed HC-PLSR for a data set predicting average molecular weights from hydrolysed raw materials. The analysis of the FT-ICR MS spectra revealed that the large amounts of information contained in the data (due to the high resolution) can disturb the predictive models, but the use of variable selection counteracts this effect. 
Several methods from machine learning and multivariate statistics proved valuable for prediction of various parameters from FT-ICR MS data using both classification and regression methods.