Methodology and theory for partial least squares applied to functional data
The partial least squares procedure was originally developed to estimate the
slope parameter in multivariate parametric models. More recently it has gained
popularity in the functional data literature. There, the partial least squares
estimator of slope is either used to construct linear predictive models, or as
a tool to project the data onto a one-dimensional quantity that is employed for
further statistical analysis. Although the partial least squares approach is
often viewed as an attractive alternative to projections onto the principal
component basis, its properties are less well known than those of the latter,
mainly because of its iterative nature. We develop an explicit formulation of
partial least squares for functional data, which leads to insightful results
and motivates new theory, demonstrating consistency and establishing
convergence rates. Comment: Published at http://dx.doi.org/10.1214/11-AOS958 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
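As a rough illustration of the "projection onto a one-dimensional quantity" mentioned in this abstract, the sketch below fits a one-component partial least squares regression in the ordinary multivariate setting, not the functional one the paper studies. The function names `pls1_one_component` and `predict` are hypothetical and chosen only for this example. With a single component no iteration is needed, which is an assumption made to keep the sketch short.

```python
import math


def pls1_one_component(X, y):
    """Fit a one-component PLS regression of y on the columns of X.

    X is a list of rows and y a list of responses. Returns (w, b, x_mean,
    y_mean): the unit weight vector, the slope vector, and the means used
    for centring.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: the direction of maximal covariance between X and y.
    s = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in s))
    w = [v / norm for v in s]

    # Scores: the one-dimensional projection of each observation.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]

    # Regress y on the scores, then map the result back to a slope on X.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    b = [q * wj for wj in w]
    return w, b, x_mean, y_mean


def predict(x_new, b, x_mean, y_mean):
    """Predict the response for one new observation x_new."""
    return y_mean + sum(bj * (xj - mj) for bj, xj, mj in zip(b, x_new, x_mean))
```

Further components would be obtained by deflating X with the scores and repeating the same steps, which is the iterative aspect the abstract refers to.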
Improvements to PLSc: Remaining problems and simple solutions
The recent article by Dijkstra and Henseler (2015b) presents a consistent partial least squares (PLSc) estimator that corrects for measurement error attenuation and provides evidence showing that, generally, PLSc performs comparably to a wide variety of more conventional estimators for structural equation models (SEM) with latent variables. However, PLSc does not adjust for other limitations of conventional PLS, namely: (1) bias in estimates of regression coefficients due to capitalization on chance; and (2) overestimation of composite reliability due to the proportionality relation between factor loadings and indicator weights. In this article, we illustrate these problems and then propose a simple solution: the use of unit-weighted composites, rather than those constructed from PLS results, combined with errors-in-variables regression (EIV) by using reliabilities obtained from factor analysis. Our simulations show that these two improvements perform as well as or better than PLSc. We also provide examples of how our proposed estimator can be easily implemented in various proprietary and open source software packages
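The idea of combining unit-weighted composites with errors-in-variables regression can be sketched for the simplest case of a single predictor, where the classical correction divides the ordinary least squares slope by the reliability of the predictor. This is only an illustration under that assumption, not the article's full procedure. The function names are hypothetical, and the reliability is taken as a given input rather than estimated from a factor analysis.

```python
def unit_weighted_composite(indicators):
    """Average each row of indicator scores using equal weights."""
    return [sum(row) / len(row) for row in indicators]


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def eiv_slope(x, y, reliability_x):
    """Disattenuated slope for one predictor: OLS slope / reliability of x.

    reliability_x is assumed to be known, for example from a separate
    factor analysis; it is not estimated here.
    """
    if not 0.0 < reliability_x <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return ols_slope(x, y) / reliability_x
```

For example, `eiv_slope(unit_weighted_composite(items), outcome, 0.8)` would scale the observed slope up by a factor of 1/0.8, where `items` and `outcome` stand for the user's own data.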
Data-Driven Fault Detection and Reasoning for Industrial Monitoring
This open access book assesses the potential of data-driven methods in industrial process monitoring engineering. The process modeling, fault detection, classification, isolation, and reasoning are studied in detail. These methods can be used to improve the safety and reliability of industrial processes. Fault diagnosis, including fault detection and reasoning, has attracted engineers and scientists from various fields such as control, machinery, mathematics, and automation engineering. Combining the diagnosis algorithms and application cases, this book establishes a basic framework for this topic and implements various statistical analysis methods for process monitoring. This book is intended for senior undergraduate and graduate students who are interested in fault diagnosis technology, researchers investigating automation and industrial security, professional practitioners and engineers working on engineering modeling and data processing applications. This is an open access book
(Q)SAR Modelling of Nanomaterial Toxicity - A Critical Review
There is an increasing recognition that nanomaterials pose a risk to human health, and that the novel engineered nanomaterials (ENMs) in the nanotechnology industry and their increasing industrial usage pose the most immediate problem for hazard assessment, as many of them remain untested. The large number of materials and their variants (different sizes and coatings, for instance) that require testing, together with ethical pressure towards non-animal testing, means that expensive animal bioassays are precluded, and the use of (quantitative) structure activity relationship ((Q)SAR) models as an alternative source of hazard information should be explored. (Q)SAR modelling can be applied to fill the critical knowledge gaps by making the best use of existing data, prioritizing the physicochemical parameters driving toxicity, and providing practical solutions to the risk assessment problems caused by the diversity of ENMs. This paper covers the core components required for successful application of (Q)SAR technologies to ENM toxicity prediction, summarizes the published nano-(Q)SAR studies, and outlines the challenges ahead for nano-(Q)SAR modelling. It provides a critical review of (1) the present availability of ENM characterization/toxicity data, (2) the characterization of nanostructures that meets the needs of (Q)SAR analysis, (3) the published nano-(Q)SAR studies and their limitations, (4) the in silico tools for (Q)SAR screening of nanotoxicity and (5) the prospective directions for the development of nano-(Q)SAR models
Hyperspectral imaging: algorithmic advances in variable selection and applications to wood science
According to Beer’s Law there is a linear dependence between the absorbance of a material and the concentration of an absorbing species in the material. Thus, if one is interested in modeling the concentration of an absorbing species, it should be possible to do so by utilizing a linear model to describe the concentration of the species from a measurement of the absorbance of the material. This thesis is concerned with developing such models from hyperspectral measurements taken in the visible (vis) and near infrared (NIR) region of the electromagnetic spectrum. When developing such models, it is frequently the case that a majority of the wavelengths within a measured spectrum are not absorbed by the species of interest and should therefore preferably be excluded from the developed model in order to optimize its performance. The process of identifying unnecessary wavelengths is often driven by trial and error, and as such it tends to be time consuming and computationally demanding. During the work leading up to Paper I we discovered a conceptually very simple technique which allows calculations to be recycled when developing partial least squares (PLS) models from different combinations of wavelengths. The technique can greatly reduce the computational cost of fitting multiple regression models with various combinations of included/excluded wavelengths to a dataset. In Paper II we incorporate the findings of Paper I into a genetic algorithm (GA) and demonstrate that the technique can also be used to simultaneously evaluate, in a computationally efficient manner, combinations of wavelengths which are preprocessed using different techniques. In Papers III and IV we develop models which solve wood science related issues. In Paper III samples of spruce (Picea abies) treated with a phosphorus-based flame retardant compound were scanned using a NIR hyperspectral camera.
The resulting data was subsequently used to develop a PLS model which estimated the phosphorus content from the spectral signal. In Paper IV samples of thermally modified pine (Pinus sylvestris) were repeatedly scanned over time as they dried. The resulting time series sequences of hyperspectral NIR data were used to develop a regression model capable of estimating the moisture content of the pine from the spectra. In Paper V a generic method is developed for studying and summarizing hyperspectral time series sequences in terms of known and unknown variations. The main idea of the presented method is that spectral variations of known origin are removed from the data. The remaining residual data, containing variation of unknown origin, is then subjected to dimensionality reduction in order to identify new, previously unknown variations in the data; variations which in the case of hyperspectral time series data may exhibit temporal as well as spatial patterns of interest. The developed concept was experimentally evaluated in Paper V on a piece of unmodified spruce (Picea abies) which was monitored using a vis-NIR hyperspectral camera as it dried over the course of 21 hours
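The Beer's Law argument that opens this abstract can be illustrated with the simplest possible case: a straight-line calibration that predicts concentration from absorbance at one wavelength. This is a minimal sketch under that single-wavelength assumption, not the PLS models developed in the thesis. The function names and the calibration values used in the comment are made up for illustration.

```python
def fit_calibration(absorbance, concentration):
    """Least squares fit of concentration = intercept + slope * absorbance."""
    n = len(absorbance)
    mean_a = sum(absorbance) / n
    mean_c = sum(concentration) / n
    s_ac = sum((a - mean_a) * (c - mean_c)
               for a, c in zip(absorbance, concentration))
    s_aa = sum((a - mean_a) ** 2 for a in absorbance)
    slope = s_ac / s_aa
    intercept = mean_c - slope * mean_a
    return intercept, slope


def estimate_concentration(absorbance_value, intercept, slope):
    """Apply the fitted line to one new absorbance measurement."""
    return intercept + slope * absorbance_value


# Illustrative use with made-up calibration standards:
# intercept, slope = fit_calibration([0.1, 0.2, 0.3, 0.4], [1.0, 2.0, 3.0, 4.0])
# estimate_concentration(0.25, intercept, slope)
```

With many wavelengths the same idea becomes a multivariate regression, which is where methods such as PLS and wavelength selection come in.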
MALDI-ToF mass spectrometry biomarker profiling via multivariate data analysis application in the biopharmaceutical bioprocessing industry
PhD Thesis. Matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-ToF MS) is a technique by which protein profiles can be rapidly produced from biological samples. Proteomic profiling and biomarker identification using MALDI-ToF MS have been utilised widely in microbiology for bacteria identification and in clinical proteomics for disease-related biomarker discovery. To date, the benefits of MALDI-ToF MS have not been realised in the area of mammalian cell culture during bioprocessing.
This thesis explores the approach of ‘intact-cell’ MALDI-ToF MS (ICM-MS) combined with projection to latent structures – discriminant analysis (PLS-DA), to discriminate between mammalian cell lines during bioprocessing. Specifically, the industrial collaborator, Lonza Biologics, is interested in adopting this approach to discriminate between IgG monoclonal antibody producing Chinese hamster ovary (CHO) cell lines based on their productivities and to identify protein biomarkers which are associated with the cell line productivities. After classifying cell lines into two categories (high/low producers; Hs/Ls), it is hypothesised that Hs and Ls CHO cells exhibit different metabolic profiles and hence differences in phenotypic expression patterns will be observed. The protein expression patterns correlate to the productivities of the cell lines, and introduce between-class variability. The chemometric method of PLS-DA can use this variability to classify the cell lines as Hs or Ls.
A number of differentially expressed proteins were matched and identified as biomarkers after a SwissProt/TrEMBL protein database search. The identified proteins revealed that proteins involved in biological processes such as protein biosynthesis, protein folding, glycolysis and cytoskeleton architecture were upregulated in Hs. This study demonstrates that ICM-MS combined with PLS-DA and a protein database search can be a rapid and valuable tool for biomarker discovery in the bioprocessing industry. It may help in providing clues to potential cell genetic engineering targets as well as serving as a tool in process development in the bioprocessing industry. With the completion of the sequencing of the CHO genome, this study provides a foundation for rapid biomarker profiling of CHO cell lines in culture during recombinant protein manufacturing. Lonza Biologics
Scalable learning for geostatistics and speaker recognition
With improved data acquisition methods, the amount of data that is being collected has increased severalfold. One of the objectives in data collection is to learn useful underlying patterns. In order to work with data at this scale, the methods not only need to be effective with the underlying data, but also have to be scalable to handle larger data collections. This thesis focuses on developing scalable and effective methods targeted towards different domains, geostatistics and speaker recognition in particular.
Initially we focus on kernel based learning methods and develop a GPU based parallel framework for this class of problems. An improved numerical algorithm that utilizes the GPU parallelization to further enhance the computational performance of kernel regression is proposed. These methods are then demonstrated on problems arising in geostatistics and speaker recognition.
In geostatistics, data is often collected at scattered locations, and factors such as instrument malfunction lead to missing observations. Applications often require the ability to interpolate this scattered spatiotemporal data onto a regular grid continuously over time. This problem can be formulated as a regression problem, and one of the most popular geostatistical interpolation techniques, kriging, is analogous to a standard kernel method: Gaussian process regression. Kriging is computationally expensive and needs major modifications and accelerations in order to be used practically. The GPU framework developed for kernel methods is extended to kriging, and the GPU's texture memory is better utilized for enhanced computational performance.
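To make the link between kriging and Gaussian process regression concrete, the following sketch computes the posterior mean of a zero-mean Gaussian process with a squared-exponential kernel on one-dimensional locations, which corresponds to simple kriging with a known zero mean. It is a plain CPU illustration, not the GPU framework described in the thesis. The function names, the kernel choice, and the default `length_scale` and `noise` values are assumptions made for this example.

```python
import math


def rbf(a, b, length_scale):
    """Squared-exponential kernel between two scalar locations."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_predict(x_train, y_train, x_new, length_scale=1.0, noise=1e-8):
    """Posterior mean k_*^T (K + noise * I)^{-1} y at each location in x_new."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y_train)  # this dense solve is the O(n^3) bottleneck
    return [sum(rbf(x, xt, length_scale) * a for xt, a in zip(x_train, alpha))
            for x in x_new]
```

The dense linear solve grows cubically with the number of observations, which is one reason acceleration matters for large scattered data sets.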
Speaker recognition deals with the task of verifying a person's identity based on samples of his/her speech, or "utterances". This thesis focuses on the text-independent framework, and three new recognition frameworks were developed for this problem. We proposed a kernelized Renyi distance based similarity scoring for speaker recognition. While its performance is promising, it does not generalize well for limited training data and therefore does not compare well to state-of-the-art recognition systems. These systems compensate for the variability in the speech data due to the message, channel variability, noise and reverberation. State-of-the-art systems model each speaker as a mixture of Gaussians (GMM) and compensate for the variability (termed "nuisance"). We propose a novel discriminative framework using a latent variable technique, partial least squares (PLS), for improved recognition. The kernelized version of this algorithm is used to achieve a state-of-the-art speaker ID system that shows results competitive with the best systems reported on in NIST's 2010 Speaker Recognition Evaluation
Monitoring wine fermentation using ATR-MIR spectroscopy and chemometric techniques.
Wine is one of the most appreciated high added-value products in the world and therefore, controlling wine production
has always been a priority for most wineries. Implementing at-line analyses, such as those in Process Analytical Technologies
(PAT) guidelines, not only enables control of the finished wine but also makes it possible to apply corrective measures
throughout the process, thus avoiding a defective final product. In this doctoral thesis, we investigated the possibility of
implementing different strategies to control and detect deviations during the alcoholic fermentation of wine using a fast,
portable instrument: an Attenuated Total Reflectance Mid-Infrared (ATR-MIR) spectrometer, which yields, in
a few seconds, a large amount of information about the fermentation process. This information was processed with different
chemometric techniques.
First, using the spectral data and Partial Least Squares Regression, different chemical parameters were predicted
during alcoholic fermentation. Secondly, we compared the spectra from both Normal Operation Conditions and
deviated fermentations using Partial Least Squares Discriminant Analysis. ANOVA–simultaneous component analysis
was applied to study the influence of several factors on the variance of the spectra. Multivariate Curve Resolution
Alternating Least Squares was used to model both alcoholic and malolactic fermentations. Finally, a PAT methodolog