Search CORE

1,345 research outputs found

Methodology and theory for partial least squares applied to functional data

Author: Delaigle Aurore
Hall Peter
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012

The partial least squares procedure was originally developed to estimate the slope parameter in multivariate parametric models. More recently it has gained popularity in the functional data literature. There, the partial least squares estimator of slope is used either to construct linear predictive models or as a tool to project the data onto a one-dimensional quantity that is employed for further statistical analysis. Although the partial least squares approach is often viewed as an attractive alternative to projections onto the principal component basis, its properties are less well known than those of the latter, mainly because of its iterative nature. We develop an explicit formulation of partial least squares for functional data, which leads to insightful results and motivates new theory, demonstrating consistency and establishing convergence rates. Comment: Published at http://dx.doi.org/10.1214/11-AOS958 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
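To make the contrast between the iterative and an explicit formulation concrete, here is a minimal sketch for curves observed on a common grid, so that each row of `X` is one discretized curve. It assumes numpy, a scalar response and centered data, and it illustrates a standard result for discretized data: the iterative (NIPALS-style) PLS slope equals least squares restricted to the Krylov subspace spanned by s, Ss, ..., S^(m-1)s, with S = XᵀX and s = Xᵀy. This is an illustration under those assumptions, not the authors' functional-data estimator; all function names and the simulated data are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_comp):
    """Iterative (NIPALS-style) PLS1 slope estimate for a scalar response."""
    Xc = X - X.mean(axis=0)            # center each grid point across curves
    yc = y - y.mean()
    Xk = Xc.copy()                     # deflated predictor matrix
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yc                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores: projection of each curve
        tt = t @ t
        p_load = Xk.T @ t / tt         # X loadings
        W.append(w)
        P.append(p_load)
        q.append(yc @ t / tt)          # y loading
        Xk = Xk - np.outer(t, p_load)  # deflate X (y need not be deflated for PLS1)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)   # slope in grid coordinates


def pls1_krylov(X, y, n_comp):
    """Same slope, written explicitly as least squares over a Krylov subspace."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    S, s = Xc.T @ Xc, Xc.T @ yc
    K = np.column_stack([np.linalg.matrix_power(S, j) @ s for j in range(n_comp)])
    Q, _ = np.linalg.qr(K)             # orthonormal basis of span{s, Ss, ...}
    return Q @ np.linalg.solve(Q.T @ S @ Q, Q.T @ s)


rng = np.random.default_rng(0)
n_curves, n_grid = 60, 40
# Hypothetical Brownian-like sample paths on a grid of n_grid points.
X = np.cumsum(rng.normal(size=(n_curves, n_grid)), axis=1) / np.sqrt(n_grid)
beta_true = np.sin(np.linspace(0, np.pi, n_grid))
y = X @ beta_true / n_grid + 0.1 * rng.normal(size=n_curves)

b_iter = pls1_nipals(X, y, n_comp=3)
b_expl = pls1_krylov(X, y, n_comp=3)
```

The two estimates should agree up to rounding error; the QR step is used only to keep the Krylov basis numerically well conditioned, and it does not change the subspace.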

arXiv.org e-Print Archive

CiteSeerX

Crossref

Improvements to PLSc: Remaining problems and simple solutions

Author: Aguirre-Urreta Miguel I.
McIntosh Cameron N.
Rönkkö Mikko
Publication venue: Aalto-yliopisto
Publication date: 01/01/2016

The recent article by Dijkstra and Henseler (2015b) presents a consistent partial least squares (PLSc) estimator that corrects for measurement error attenuation and provides evidence showing that, generally, PLSc performs comparably to a wide variety of more conventional estimators for structural equation models (SEM) with latent variables. However, PLSc does not adjust for other limitations of conventional PLS, namely: (1) bias in estimates of regression coefficients due to capitalization on chance; and (2) overestimation of composite reliability due to the proportionality relation between factor loadings and indicator weights. In this article, we illustrate these problems and then propose a simple solution: the use of unit-weighted composites, rather than those constructed from PLS results, combined with errors-in-variables regression (EIV) by using reliabilities obtained from factor analysis. Our simulations show that these two improvements perform as well as or better than PLSc. We also provide examples of how our proposed estimator can be easily implemented in various proprietary and open source software packages
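A minimal sketch of the proposed idea in the simplest case, one predictor construct and one outcome construct, assuming numpy. Each composite is the unweighted mean of its indicators, and the errors-in-variables correction divides the naive OLS slope by the reliability of the predictor composite (the classical correction for attenuation). The article obtains reliabilities from factor analysis; here the reliability is simply passed in as a hypothetical `rel_x` argument, and the simulated data are for illustration only.

```python
import numpy as np


def unit_weighted_composite(indicators):
    """Unit-weighted composite: every indicator gets the same weight (the mean)."""
    return indicators.mean(axis=1)


def eiv_slope(x_comp, y_comp, rel_x):
    """Errors-in-variables slope for a single predictor composite.

    Measurement error in x inflates var(x) by a factor 1/rel_x, so the naive
    slope is attenuated by rel_x; dividing by rel_x undoes that. Measurement
    error in y does not bias the slope.
    """
    cov_xy = np.cov(x_comp, y_comp)[0, 1]
    return cov_xy / (rel_x * np.var(x_comp, ddof=1))


rng = np.random.default_rng(1)
n, k = 20_000, 3
xi = rng.normal(size=n)                                   # latent predictor
eta = 0.5 * xi + rng.normal(scale=np.sqrt(0.75), size=n)  # latent outcome, true slope 0.5
X_ind = xi[:, None] + rng.normal(size=(n, k))             # 3 noisy indicators of xi
Y_ind = eta[:, None] + rng.normal(size=(n, k))            # 3 noisy indicators of eta

x_c = unit_weighted_composite(X_ind)
y_c = unit_weighted_composite(Y_ind)
rel_x = 1.0 / (1.0 + 1.0 / k)   # reliability of x_c in this simulation (0.75)

naive = np.cov(x_c, y_c)[0, 1] / np.var(x_c, ddof=1)
corrected = eiv_slope(x_c, y_c, rel_x)
```

In this simulation the naive slope should sit near 0.375 (attenuated by the reliability 0.75), while the corrected slope should sit near the true value 0.5.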

Aaltodoc Publication Archive

Data-Driven Fault Detection and Reasoning for Industrial Monitoring

Author: Chen Xiaolu
Wang Jing
Zhou Jinglin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/01/2022

This open access book assesses the potential of data-driven methods in industrial process monitoring engineering. The process modeling, fault detection, classification, isolation, and reasoning are studied in detail. These methods can be used to improve the safety and reliability of industrial processes. Fault diagnosis, including fault detection and reasoning, has attracted engineers and scientists from various fields such as control, machinery, mathematics, and automation engineering. Combining the diagnosis algorithms and application cases, this book establishes a basic framework for this topic and implements various statistical analysis methods for process monitoring. This book is intended for senior undergraduate and graduate students who are interested in fault diagnosis technology, researchers investigating automation and industrial security, professional practitioners and engineers working on engineering modeling and data processing applications. This is an open access book
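As an illustration of one widely used data-driven detection scheme, principal component analysis (PCA) with Hotelling's T² statistic, the sketch below learns a PCA model from normal operating data and flags new samples whose T² exceeds a control limit. The choice of PCA/T² is an assumption here, since the abstract does not list the specific methods. To stay numpy-only, the control limit is the empirical 99th percentile of the training T² rather than the usual F-distribution limit; the simulated process and all names are hypothetical.

```python
import numpy as np


def fit_pca_monitor(X_normal, n_comp, quantile=0.99):
    """Fit a PCA monitoring model on normal operating data."""
    mu = X_normal.mean(axis=0)
    sd = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mu) / sd                        # autoscale each variable
    _, svals, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_comp].T                               # retained loading vectors
    lam = svals[:n_comp] ** 2 / (len(Z) - 1)        # variances of the retained scores
    t2_train = np.sum((Z @ P) ** 2 / lam, axis=1)
    limit = np.quantile(t2_train, quantile)         # empirical control limit
    return mu, sd, P, lam, limit


def t2_statistic(X_new, mu, sd, P, lam):
    """Hotelling's T^2 of new samples in the retained principal subspace."""
    T = ((X_new - mu) / sd) @ P
    return np.sum(T ** 2 / lam, axis=1)


rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X_train = latent @ mixing + 0.1 * rng.normal(size=(500, 6))   # correlated process data

mu, sd, P, lam, limit = fit_pca_monitor(X_train, n_comp=2)

# Hypothetical fault: an unusually large excursion along the first principal direction.
X_fault = (mu + sd * 30 * np.sqrt(lam[0]) * P[:, 0])[None, :]
t2_fault = t2_statistic(X_fault, mu, sd, P, lam)[0]
false_alarm_rate = np.mean(t2_statistic(X_train, mu, sd, P, lam) > limit)
```

In practice T² is usually paired with the squared prediction error (Q) statistic, which catches faults that move samples out of the principal subspace rather than along it.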

Directory of Open Access Books (DOAB)

(Q)SAR Modelling of Nanomaterial Toxicity - A Critical Review

Author: Ceyda Oksel
Cai Y. Ma
Jing J. Liu
Terry Wilkins
Xue Z. Wang
Publication venue: 'Elsevier BV'
Publication date: 01/08/2015

There is increasing recognition that nanomaterials pose a risk to human health, and that the novel engineered nanomaterials (ENMs) of the nanotechnology industry, together with their increasing industrial usage, pose the most immediate problem for hazard assessment, as many of them remain untested. The large number of materials and their variants (for instance, different sizes and coatings) that require testing, together with ethical pressure towards non-animal testing, means that expensive animal bioassays are precluded, and the use of (quantitative) structure-activity relationship ((Q)SAR) models as an alternative source of hazard information should be explored. (Q)SAR modelling can be applied to fill critical knowledge gaps by making the best use of existing data, to prioritize the physicochemical parameters driving toxicity, and to provide practical solutions to the risk assessment problems caused by the diversity of ENMs. This paper covers the core components required for the successful application of (Q)SAR technologies to ENM toxicity prediction, summarizes the published nano-(Q)SAR studies, and outlines the challenges ahead for nano-(Q)SAR modelling. It provides a critical review of (1) the present availability of ENM characterization and toxicity data, (2) the characterization of nanostructures that meets the needs of (Q)SAR analysis, (3) published nano-(Q)SAR studies and their limitations, (4) in silico tools for (Q)SAR screening of nanotoxicity and (5) prospective directions for the development of nano-(Q)SAR models.

Crossref

White Rose Research Online

Hyperspectral imaging: algorithmic advances in variable selection and applications to wood science

Author: Stefansson Petter
Publication venue: Norwegian University of Life Sciences, Ås
Publication date: 01/01/2019

According to Beer's Law there is a linear dependence between the absorbance of a material and the concentration of an absorbing species in the material. Thus, if one is interested in modeling the concentration of an absorbing species, it should be possible to do so with a linear model that predicts the concentration of the species from a measurement of the absorbance of the material. This thesis is concerned with developing such models from hyperspectral measurements taken in the visible (vis) and near-infrared (NIR) regions of the electromagnetic spectrum. When developing such models, it is frequently the case that most of the wavelengths within a measured spectrum are not absorbed by the species of interest, and these should therefore preferably be excluded from the model in order to optimize its performance. The process of identifying unnecessary wavelengths is often driven by trial and error; as such, it tends to be time-consuming and computationally demanding. During the work leading up to Paper I we discovered a conceptually very simple technique which allows calculations to be recycled when developing partial least squares (PLS) models from different combinations of wavelengths. The technique can greatly reduce the computational cost of fitting multiple regression models with various combinations of included/excluded wavelengths to a dataset. In Paper II we incorporate the findings of Paper I into a genetic algorithm (GA) and demonstrate that the technique can also be used to evaluate simultaneously, and in a computationally efficient manner, combinations of wavelengths that are preprocessed using different techniques. In Papers III and IV we develop models which solve wood-science-related problems. In Paper III, samples of spruce (Picea abies) treated with a phosphorus-based flame retardant compound were scanned using an NIR hyperspectral camera.
The resulting data were subsequently used to develop a PLS model which estimated the phosphorus content from the spectral signal. In Paper IV, samples of thermally modified pine (Pinus sylvestris) were repeatedly scanned over time as they dried. The resulting time series of hyperspectral NIR data were used to develop a regression model capable of estimating the moisture content of the pine from the spectra. In Paper V a generic method is developed for studying and summarizing hyperspectral time series in terms of known and unknown variations. The main idea of the method is that spectral variations of known origin are removed from the data. The remaining residual data, containing variation of unknown origin, are then subjected to dimensionality reduction in order to identify previously unknown variations in the data; in the case of hyperspectral time series, these variations may exhibit temporal as well as spatial patterns of interest. The concept was experimentally evaluated in Paper V on a piece of unmodified spruce (Picea abies) which was monitored using a vis-NIR hyperspectral camera as it dried over the course of 21 hours.
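One way such recycling can work (an assumption here, not necessarily the technique of Paper I) rests on the fact that, after centering, PLS1 regression coefficients depend on the data only through XᵀX and Xᵀy. Both can be computed once for the full spectrum; a model on any wavelength subset then needs only the corresponding sub-blocks. The sketch below, assuming numpy, uses a kernel PLS1 recursion in the spirit of the Dayal-MacGregor kernel algorithm; the function name, the simulated spectra and the chosen subset are hypothetical.

```python
import numpy as np


def kernel_pls1(XtX, Xty, n_comp):
    """PLS1 coefficients from the cross-products XtX = X'X and Xty = X'y.

    Only X'y is deflated; X'X stays fixed, which is what makes it cheap to
    reuse sub-blocks of a precomputed X'X for different wavelength subsets.
    """
    n_var = XtX.shape[0]
    R = np.zeros((n_var, n_comp))      # weights expressed for the undeflated X
    P = np.zeros((n_var, n_comp))      # X loadings
    q = np.zeros(n_comp)               # y loadings
    xty = np.array(Xty, dtype=float)   # working copy, deflated in place
    for a in range(n_comp):
        w = xty / np.linalg.norm(xty)
        r = w.copy()
        for j in range(a):             # enforce p_j' r = 0 for earlier components
            r -= (P[:, j] @ w) * R[:, j]
        tt = r @ XtX @ r               # t't, with scores t = X r
        P[:, a] = XtX @ r / tt
        q[a] = r @ xty / tt
        xty -= P[:, a] * q[a] * tt     # deflate X'y
        R[:, a] = r
    return R @ q


rng = np.random.default_rng(3)
n_samples, n_wavelengths = 80, 120
X = rng.normal(size=(n_samples, n_wavelengths))               # simulated spectra
y = X[:, 10:20].sum(axis=1) + 0.1 * rng.normal(size=n_samples)

Xc, yc = X - X.mean(axis=0), y - y.mean()
XtX_full, Xty_full = Xc.T @ Xc, Xc.T @ yc                     # computed once

subset = np.arange(5, 40)                                     # one candidate wavelength set
beta_reused = kernel_pls1(XtX_full[np.ix_(subset, subset)], Xty_full[subset], n_comp=4)

# For comparison: cross-products recomputed from the sliced data.
Xs = Xc[:, subset]
beta_direct = kernel_pls1(Xs.T @ Xs, Xs.T @ yc, n_comp=4)
```

Evaluating a new subset then costs only the small kernel recursion on p×p sub-blocks, with no further pass over the n samples.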

Brage NMBU

MALDI-ToF mass spectrometry biomarker profiling via multivariate data analysis application in the biopharmaceutical bioprocessing industry

Author: Momo Remi Ako-Mbianyor
Publication venue: Newcastle University
Publication date: 01/01/2013

PhD Thesis. Matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-ToF MS) is a technique by which protein profiles can be rapidly produced from biological samples. Proteomic profiling and biomarker identification using MALDI-ToF MS have been utilised widely in microbiology for bacterial identification and in clinical proteomics for disease-related biomarker discovery. To date, the benefits of MALDI-ToF MS have not been realised in the area of mammalian cell culture during bioprocessing. This thesis explores the approach of ‘intact-cell’ MALDI-ToF MS (ICM-MS) combined with projection to latent structures discriminant analysis (PLS-DA) to discriminate between mammalian cell lines during bioprocessing. Specifically, the industrial collaborator, Lonza Biologics, is interested in adopting this approach to discriminate between IgG monoclonal antibody-producing Chinese hamster ovary (CHO) cell lines based on their productivities and to identify protein biomarkers associated with cell line productivity. After classifying cell lines into two categories (high/low producers; Hs/Ls), it is hypothesised that Hs and Ls CHO cells exhibit different metabolic profiles and hence that differences in phenotypic expression patterns will be observed. The protein expression patterns correlate with the productivities of the cell lines and introduce between-class variability, which the chemometric method of PLS-DA can use to classify the cell lines as Hs or Ls. A number of differentially expressed proteins were matched and identified as biomarkers after a SwissProt/TrEMBL protein database search. The identified proteins revealed that proteins involved in biological processes such as protein biosynthesis, protein folding, glycolysis and cytoskeleton architecture were upregulated in Hs.
This study demonstrates that ICM-MS combined with PLS-DA and a protein database search can be a rapid and valuable tool for biomarker discovery in the bioprocessing industry. It may provide clues to potential cell genetic engineering targets as well as a tool for process development in the bioprocessing industry. With the completion of the sequencing of the CHO genome, this study provides a foundation for rapid biomarker profiling of CHO cell lines in culture during recombinant protein manufacturing. Lonza Biologics
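PLS-DA is ordinary PLS regression on a class-indicator response. A minimal two-class sketch (for example high vs. low producers), assuming numpy: code the classes as 1 and 0, fit PLS1, and assign a profile to class 1 when its prediction exceeds 0.5. The spectra are simulated, the function names are hypothetical, and the thesis's preprocessing, component selection and validation are not reproduced.

```python
import numpy as np


def plsda_fit(X, labels, n_comp):
    """Two-class PLS-DA: PLS1 regression of a 0/1 class code on X (NIPALS)."""
    x_mean, y_mean = X.mean(axis=0), labels.mean()
    Xk, yc = X - x_mean, labels - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yc
        w /= np.linalg.norm(w)
        t = Xk @ w                       # latent-variable scores
        tt = t @ t
        W.append(w)
        P.append(Xk.T @ t / tt)
        q.append(yc @ t / tt)
        Xk = Xk - np.outer(t, P[-1])     # deflate X
    W, P = np.column_stack(W), np.column_stack(P)
    beta = W @ np.linalg.solve(P.T @ W, np.array(q))
    return x_mean, y_mean, beta


def plsda_predict(model, X_new):
    """Predict class 1 ("high producer") when the fitted code exceeds 0.5."""
    x_mean, y_mean, beta = model
    y_hat = (X_new - x_mean) @ beta + y_mean
    return (y_hat > 0.5).astype(int)


rng = np.random.default_rng(4)
n_per_class, n_peaks = 40, 200
shift = np.zeros(n_peaks)
shift[50:55] = 1.5                        # a few peaks differ between the classes
X_low = rng.normal(size=(n_per_class, n_peaks))
X_high = rng.normal(size=(n_per_class, n_peaks)) + shift
X = np.vstack([X_low, X_high])
labels = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]

model = plsda_fit(X, labels, n_comp=2)
accuracy = np.mean(plsda_predict(model, X) == labels)
```

This is training accuracy only; with many more variables than samples, an honest assessment would use cross-validation or a held-out set.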

Newcastle University eTheses

Scalable learning for geostatistics and speaker recognition

Author: Srinivasan Balaji Vasan
Publication date: 01/01/2011

With improved data acquisition methods, the amount of data being collected has increased severalfold. One of the objectives in data collection is to learn useful underlying patterns. To work with data at this scale, methods not only need to be effective on the underlying data but also have to scale to larger data collections. This thesis focuses on developing scalable and effective methods targeted towards different domains, geostatistics and speaker recognition in particular. Initially we focus on kernel-based learning methods and develop a GPU-based parallel framework for this class of problems. An improved numerical algorithm that uses GPU parallelization to further enhance the computational performance of kernel regression is proposed. These methods are then demonstrated on problems arising in geostatistics and speaker recognition. In geostatistics, data are often collected at scattered locations, and factors such as instrument malfunction lead to missing observations. Applications often require the ability to interpolate these scattered spatiotemporal data onto a regular grid continuously over time. This problem can be formulated as a regression problem, and one of the most popular geostatistical interpolation techniques, kriging, is analogous to a standard kernel method: Gaussian process regression. Kriging is computationally expensive and needs major modifications and accelerations to be used in practice. The GPU framework developed for kernel methods is extended to kriging, and the GPU's texture memory is further exploited for enhanced computational performance. Speaker recognition deals with the task of verifying a person's identity based on samples of his/her speech ("utterances"). This thesis focuses on text-independent speaker recognition, and three new recognition frameworks were developed for this problem. We proposed a kernelized Renyi distance-based similarity scoring for speaker recognition.
While its performance is promising, it does not generalize well for limited training data and therefore does not compare well to state-of-the-art recognition systems. These systems compensate for variability in the speech data due to the message, the channel, noise and reverberation. State-of-the-art systems model each speaker as a mixture of Gaussians (GMM) and compensate for this variability (termed "nuisance"). We propose a novel discriminative framework using a latent variable technique, partial least squares (PLS), for improved recognition. The kernelized version of this algorithm is used to build a state-of-the-art speaker ID system that shows results competitive with the best systems reported in NIST's 2010 Speaker Recognition Evaluation.
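Because kriging coincides with Gaussian process regression, its predictor fits in a few lines. The sketch below assumes numpy, a zero-mean process (simple kriging), and a squared-exponential covariance with a fixed, hypothetical length scale and nugget, and it interpolates scattered 2-D observations onto a regular grid. The dense Cholesky factorization costs O(n³) in the number of observations, which is the kind of expense a GPU framework targets; the thesis's GPU implementation is not reproduced here.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale):
    """Squared-exponential covariance between two sets of 2-D locations."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)


def gp_predict(X_obs, y_obs, X_new, length_scale=0.1, nugget=1e-8):
    """Posterior mean of a zero-mean GP, i.e. the simple-kriging predictor."""
    K = sq_exp_kernel(X_obs, X_obs, length_scale) + nugget * np.eye(len(X_obs))
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))   # alpha = K^{-1} y
    return sq_exp_kernel(X_new, X_obs, length_scale) @ alpha


rng = np.random.default_rng(5)
sites = rng.uniform(size=(30, 2))                  # scattered measurement locations
values = np.sin(3 * sites[:, 0]) * np.cos(3 * sites[:, 1])

g = np.linspace(0, 1, 25)
grid = np.column_stack([a.ravel() for a in np.meshgrid(g, g)])   # regular 25 x 25 grid
field = gp_predict(sites, values, grid).reshape(25, 25)
at_sites = gp_predict(sites, values, sites)
```

With a tiny nugget the predictor (nearly) interpolates the observations; a larger nugget treats them as noisy and smooths instead.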

Digital Repository at the University of Maryland

Monitoring wine fermentation using ATR-MIR spectroscopy and chemometric techniques.

Author: Cavaglia Pietro Julieta
Publication venue: 'Universitat Rovira I Virgili'
Publication date: 28/07/2021

Wine is one of the most appreciated high added-value products in the world, and therefore controlling wine production has always been a priority for most wineries. Implementing at-line analyses such as Process Analytical Technologies (PAT) not only enables control of the finished wine but also makes it possible to apply corrective measures throughout the process, thus avoiding a defective final product. In this doctoral thesis, we investigated the possibility of implementing different strategies to control and detect deviations during wine alcoholic fermentation using fast, portable equipment: an Attenuated Total Reflectance Mid-Infrared (ATR-MIR) spectrometer, which provides a large amount of information about the fermentation process in a few seconds; these data were processed with different chemometric techniques. First, using the spectral data and Partial Least Squares Regression, different chemical parameters were predicted during alcoholic fermentation. Second, we compared the spectra from Normal Operation Conditions and deviated fermentations using Partial Least Squares Discriminant Analysis.
ANOVA–simultaneous component analysis was applied to study the influence of several factors on the variance of the spectra. Multivariate Curve Resolution Alternating Least Squares was used to model both alcoholic and malolactic fermentations. Finally, a PAT methodolog
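ANOVA-simultaneous component analysis (ASCA) first splits the column-centered data matrix into an effect matrix per design factor plus residuals, and then runs PCA on each effect matrix. A minimal one-factor sketch, assuming numpy; the factor (here a hypothetical control vs. deviated fermentation label), the simulated spectra and all names are illustrative, and multi-factor designs need balanced or adjusted decompositions that are not shown.

```python
import numpy as np


def asca_one_factor(X, levels, n_comp=2):
    """One-factor ASCA: effect matrix of level means, then PCA (via SVD) on it."""
    Xc = X - X.mean(axis=0)                      # remove the grand mean spectrum
    effect = np.zeros_like(Xc)
    for lev in np.unique(levels):
        rows = levels == lev
        effect[rows] = Xc[rows].mean(axis=0)     # each sample gets its level mean
    residual = Xc - effect
    explained = np.sum(effect ** 2) / np.sum(Xc ** 2)   # variance share of the factor
    U, s, Vt = np.linalg.svd(effect, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]
    loadings = Vt[:n_comp].T                     # spectral profiles of the effect
    return explained, scores, loadings, residual


rng = np.random.default_rng(6)
levels = np.repeat(["control", "deviated"], 15)
X = rng.normal(scale=0.1, size=(30, 100))        # simulated absorbance spectra
X[levels == "deviated", 40:50] += 0.5            # a band that changes with the factor

explained, scores, loadings, residual = asca_one_factor(X, levels)
```

Because residuals sum to zero within each level, the effect and residual sums of squares add up exactly to the total, which is what makes `explained` a clean variance share for a single factor.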

Tesis Doctorals en Xarxa

Inverse Problems in Geosciences: Modelling the Rock Properties of an Oil Reservoir

Author: Lange Katrine
Publication venue: Technical University of Denmark
Publication date: 01/01/2013

Online Research Database In Technology
