11 research outputs found

    Pseudo-sample based contribution plots: innovative tools for fault diagnosis in kernel-based batch process monitoring

    This article explores the potential of kernel-based methods for fault diagnosis in batch process monitoring by combining Kernel Principal Component Analysis with three common techniques that allow batch data to be analyzed by means of bilinear models: variable-wise unfolding, batch-wise unfolding and landmark feature extraction. Gower's idea of pseudo-sample projection is exploited to develop novel diagnostic tools, the pseudo-sample based contribution plots. The results show that, when the datasets under study are affected by severe non-linearities, the proposed approach performs better than classical ones. This research work was partially supported by the Spanish Ministry of Economy and Competitiveness under the project DPI2011-28112-C04-02 and by Shell Global Solutions International B.V. (Amsterdam, The Netherlands) under the project PT13698. Vitale, R.; Noord, OED.; Ferrer Riquelme, AJ. (2015). Pseudo-sample based contribution plots: innovative tools for fault diagnosis in kernel-based batch process monitoring. Chemometrics and Intelligent Laboratory Systems. 149:40-52. https://doi.org/10.1016/j.chemolab.2015.09.013

    A Kernel Partial Least Square Based Feature Selection Method

    Maximum relevance and minimum redundancy (mRMR) is well recognised as one of the best feature selection methods. This paper proposes a Kernel Partial Least Squares (KPLS) based mRMR method that aims to simplify computation and improve classification accuracy for high-dimensional data. Experiments with this approach were conducted on seven real-life datasets of varied dimensionality and number of instances, with performance measured on four different classifiers: Naive Bayes, Linear Discriminant Analysis, Random Forest and Support Vector Machine. The experimental results show the advantage of the proposed method over several competing feature selection techniques.

    Commercial forest species discrimination and mapping using cost effective multispectral remote sensing in midlands region of KwaZulu-Natal province, South Africa.

    Masters Degree. University of KwaZulu-Natal, Pietermaritzburg, 2018. Discriminating forest species is critical for generating the accurate and reliable information needed for sustainable management and monitoring of forests. Remote sensing has recently become a valuable source of information in commercial forest management. In particular, high spatial resolution sensors have become increasingly popular in forest mapping and management. However, such sensors are costly and have limited spatial coverage, which calls for the investigation of cost-effective, timely and readily available new-generation sensors with a larger swath width suited to regional mapping. This study therefore sought to discriminate and map commercial forest species (i.e. E. dunii, E. grandis, E. mix, A. mearnsii, P. taedea, P. tecunumanii and P. elliotte) using cost-effective multispectral sensors. The first objective was to evaluate the utility of the freely available Landsat 8 Operational Land Imager (OLI) in mapping commercial forest species. Using the Partial Least Squares Discriminant Analysis algorithm, the Landsat 8 OLI image and its pan-sharpened version achieved overall classification accuracies of 79% and 77.8%, respectively, while WorldView-2, used as a benchmark image, obtained 86.5%. Despite its low spatial resolution of 30 m, Landsat 8 OLI discriminated forest species with reasonable and acceptable accuracy. This freely available imagery provides a cheaper and more accessible alternative that covers a larger swath width, as needed for regional and local forest assessment and management. The second objective was to examine the effectiveness of Sentinel-1 and Sentinel-2 for commercial forest species mapping. With Linear Discriminant Analysis, the overall accuracy was 84% when the raw Sentinel-2 image was used as standalone data. When Sentinel-2 was fused with Sentinel-1 Synthetic Aperture Radar (SAR) data, the overall accuracy increased to 88% with the Vertical transmit/Horizontal receive (VH) polarization dataset and to 87% with the Vertical transmit/Vertical receive (VV) polarization dataset. SAR data can therefore complement Sentinel-2 multispectral imagery in forest species mapping and management. Overall, the new-generation, readily available sensors provided accurate and reliable information critical for mapping and monitoring commercial forest species at local and regional scales.

    A kernel-based approach for fault diagnosis in batch processes

    This article explores the potential of kernel-based techniques for discriminating on-specification and off-specification batch runs by combining kernel partial least squares discriminant analysis with three common approaches for analyzing batch data by means of bilinear models: landmark feature extraction, batch-wise unfolding and variable-wise unfolding. Gower's idea of pseudo-sample projection is exploited to recover the contribution of the initial variables to the final model and to visualize those with the highest discriminant power. The results show that the proposed approach provides efficient fault discrimination and correctly identifies the discriminant variables in the case studies considered. Vitale, R.; De Noord, OE.; Ferrer, A. (2014). A kernel-based approach for fault diagnosis in batch processes. Journal of Chemometrics. 28(8):697-707. doi:10.1002/cem.2629

    Efficient Kernel Orthonormalized PLS for Remote Sensing Applications


    Evaluation of supervised feature extraction algorithms for texture classification

    This dissertation addresses the problem of feature extraction, by means of different methods, in the context of texture classification. These methods process the image spectrum with a filter bank in order to extract the features that provide the most information for the subsequent classification stage. Specifically, two alternative methods are compared. The first has already been widely used in texture classification, and its performance serves as a reference. The second, which is the main object of study, has been applied successfully to music genre classification and is here extended to texture classification. The two methods are, respectively: (i) extraction by means of a Gabor filter bank, which is fixed and based on the recognition system of the human brain; (ii) extraction through variable filters, adapted to the database, which are obtained by means of a supervised machine learning method known as POPLS. Once the classification system has been established for both methods, their performance (hit rate and confusion matrix) is evaluated separately to determine the viability of the supervised method. Ingeniería de Telecomunicació

    Multi-Label Dimensionality Reduction

    Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant and noisy information while taking into account the correlation among different labels. Specifically, I propose Hypergraph Spectral Learning (HSL), which performs dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The thesis elucidates the effect of regularization on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) and investigates the relationship between CCA and Orthonormalized Partial Least Squares (OPLS). To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms that includes canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis and hypergraph spectral learning. The first is a direct least squares approach, which allows the use of different regularization penalties but is applicable only under a certain assumption; the second is a two-stage approach, which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed for the case where the data arrive sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully to Drosophila gene expression pattern image annotation. Experimental results on several benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms. Dissertation/Thesis. Ph.D. Computer Science 201

    Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation

    The present Ph.D. thesis, primarily conceived to support and reinforce the relation between the academic and industrial worlds, was developed in collaboration with Shell Global Solutions (Amsterdam, The Netherlands). It aims to apply and possibly extend well-established latent variable-based approaches (i.e. Principal Component Analysis (PCA), Partial Least Squares regression (PLS) and Partial Least Squares Discriminant Analysis (PLSDA)) for complex problem solving, not only in the fields of manufacturing troubleshooting and optimisation but also in the wider environment of multivariate data analysis. To this end, novel efficient algorithmic solutions are proposed throughout all chapters to address very disparate tasks, from calibration transfer in spectroscopy to real-time modelling of streaming flows of data. The manuscript is divided into six parts. Part I, Preface, gives an overview of this research work, its main aims and justification, together with a brief introduction to PCA, PLS and PLSDA. Part II, On kernel-based extensions of PCA, PLS and PLSDA, explores the potential of kernel techniques, possibly coupled to specific variants of the recently rediscovered pseudo-sample projection formulated by the English statistician John C. Gower, and compares their performance to that of more classical methodologies in four different application scenarios: segmentation of Red-Green-Blue (RGB) images, discrimination of on-/off-specification batch runs, monitoring of batch processes and analysis of mixture designs of experiments. Part III, On the selection of the number of factors in PCA by permutation testing, provides an extensive guideline on how to select PCA components by permutation testing through the comprehensive illustration of an original algorithmic procedure implemented for that purpose. Part IV, On modelling common and distinctive sources of variability in multi-set data analysis, discusses several practical aspects of two-block common and distinctive component analysis (carried out by methods such as Simultaneous Component Analysis (SCA), DIStinctive and COmmon Simultaneous Component Analysis (DISCO-SCA), Adapted Generalised Singular Value Decomposition (Adapted GSVD), ECO-POWER, Canonical Correlation Analysis (CCA) and 2-block Orthogonal Projections to Latent Structures (O2PLS)), describes a new computational strategy for determining the number of common factors underlying two data matrices that share the same row or column dimension, and presents two innovative approaches for calibration transfer between near-infrared spectrometers. Part V, On the on-the-fly processing and modelling of continuous high-dimensional data streams, designs the On-The-Fly Processing (OTFP) tool, a novel software system for the rational handling of multi-channel measurements recorded in real time. Part VI, Epilogue, draws final conclusions, delineates future perspectives and includes the annexes. Vitale, R. (2017). Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90442