
    Data-driven Soft Sensors in the Process Industry

    In the last two decades, Soft Sensors have established themselves as a valuable alternative to traditional means for the acquisition of critical process variables, process monitoring and other tasks related to process control. This paper discusses characteristics of process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, such as the chemical, bioprocess and steel industries. The focus of this work is on data-driven Soft Sensors because of their growing popularity, demonstrated usefulness and huge, though not yet completely realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.
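The data-driven idea can be sketched in a few lines: estimate a hard-to-measure quality variable from easy-to-measure process variables using a regression model identified from historical data. The example below is a hypothetical minimal sketch using ordinary least squares on synthetic data; practical soft sensors more often rely on PLS, neural networks or the other techniques surveyed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic process data (hypothetical): 200 samples of 5 easy-to-measure
# variables and one hard-to-measure quality variable.
X = rng.normal(size=(200, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

# Identify the soft sensor: mean-centre, then ordinary least squares.
x_mean, y_mean = X.mean(axis=0), y.mean()
w, *_ = np.linalg.lstsq(X - x_mean, y - y_mean, rcond=None)

def soft_sensor(x_new):
    """Infer the quality variable from on-line measurements."""
    return (x_new - x_mean) @ w + y_mean

pred = soft_sensor(X)
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```

Once identified, such a model replaces slow laboratory assays with an immediate on-line estimate, at the cost of requiring maintenance as the process drifts.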

    Subspace decomposition and critical phase selection based cumulative quality analysis for multiphase batch processes

    Quality analysis and prediction are of great significance to ensure consistent, high product quality in chemical engineering processes. However, previous methods have rarely analyzed the cumulative quality effect, which is typical of batch processes: as time develops, the process variation determines the final product quality in a cumulative manner. Besides, such methods cannot get an early sense of this quality nature. In this paper, a quantitative index is defined which can check in advance whether the product quality results from the accumulation of successive process variations, and the cumulative quality effect is addressed for quality analysis and prediction of batch processes. Several crucial issues are solved to explore the cumulative quality effect. First, a quality-relevant sequential phase partition method is proposed to separate multiple phases from batch processes using the fast search and find of density peaks clustering (FSFDP) algorithm. Second, after phase partition, a phase-wise cumulative quality analysis method is proposed based on subspace decomposition, which explores the non-repetitive quality-relevant information (NRQRI) in the process variation at each time within each phase. NRQRI refers to the quality-relevant process variation at each time that is orthogonal to that of previous times; it thus represents complementary quality information and is the key index for cumulatively explaining quality variations time-wise. Third, process-wise cumulative quality analysis is conducted, where a critical phase selection strategy is developed to identify critical-to-cumulative-quality phases, and quality predictions from critical phases are integrated to exclude the influence of uncritical phases.
Through the two-level cumulative quality analysis (i.e., phase-wise and process-wise), it is feasible to judge in advance whether the quality has a cumulative effect, so that a proper quality prediction model can be developed by identifying critical-to-cumulative-quality phases. The feasibility and performance of the proposed algorithm are illustrated with a typical chemical engineering process, injection molding.
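The orthogonality at the heart of NRQRI can be illustrated with a plain projection: the quality-relevant variation at the current time is stripped of everything already spanned by the directions extracted at previous times. This is a hypothetical sketch of the idea only, not the paper's subspace-decomposition algorithm.

```python
import numpy as np

def nrqri(x_t, prev_directions):
    """Part of x_t orthogonal to the columns of prev_directions
    (hypothetical illustration of the NRQRI idea)."""
    if prev_directions.size == 0:
        return x_t
    # Orthonormal basis of the subspace explained at previous times.
    Q, _ = np.linalg.qr(prev_directions)
    # Remove the component already explained; what is left is 'new'.
    return x_t - Q @ (Q.T @ x_t)

# Directions extracted at earlier times span the first two axes.
prev = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
x_t = np.array([2.0, -1.0, 3.0])
residual = nrqri(x_t, prev)   # only the third-axis part survives
```

Summing such orthogonal pieces time-wise is what lets the cumulative contribution of each phase to the final quality be accounted for without double-counting.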

    Model-based performance monitoring of batch processes

    The use of batch processes is widespread across the manufacturing industries, dominating sectors such as pharmaceuticals, speciality chemicals and biochemicals. The main goal in batch production is to manufacture consistent, high-quality batches with minimum rework or spoilage, and to achieve optimum energy and feedstock usage. A common approach to monitoring a batch process to achieve this goal is to use a recipe-driven approach coupled with off-line laboratory analysis of the product. However, the large amount of data generated during batch manufacture means that it is possible to monitor batch processes using a statistical model. Traditional multivariate statistical techniques such as principal component analysis (PCA) and partial least squares were originally developed for use on continuous processes, which means they are less able to cope with the non-linear and dynamic behaviours inherent in a batch process without being adapted. Several approaches to dealing with batch behaviour in a multivariate framework have been proposed, including multi-way principal component analysis. A more advanced approach designed to handle the typical characteristics of batch data is model-based PCA, which combines a mechanistic model with a multivariate statistical technique. More specifically, the technique uses a mechanistic model of the process to generate a set of residuals from the measured process variables, the theory being that the non-linear behaviour and the serial correlation in the process will be captured by the model, leaving a set of unstructured residuals to which PCA can be applied. This approach is benchmarked against more standard approaches, including multi-way principal component analysis and batch observation-level analysis.
One limitation identified in the model-based approach is that if the mechanistic model of the process is of reduced complexity, then the monitoring and fault detection abilities of the technique will be compromised. To address this issue, the model-based PCA technique has been extended to incorporate an additional error model which captures the differences between the mechanistic model and the process. This approach has been termed super model-based PCA (SMBPCA). A number of different error models are considered, including partial least squares (linear, non-linear and dynamic), autoregressive with exogenous variables (ARX) models and dynamic canonical correlation analysis. Through the use of an exothermic batch reactor simulation, the SMBPCA approach has been investigated with respect to fault detection and capturing the non-linear and dynamic behaviour in the batch process. The robustness of the technique for application in an industrial situation is also discussed.
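The residual-generation step of model-based PCA can be sketched as follows, assuming a deliberately simple first-order cooling law as the mechanistic model and fully synthetic batch data: the model prediction is subtracted from the measurements, and PCA with a squared prediction error (SPE) chart is applied to what remains.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mechanistic model: first-order cooling of a batch reactor.
def mechanistic_model(t, T0=80.0, Tamb=20.0, k=0.05):
    return Tamb + (T0 - Tamb) * np.exp(-k * t)

t = np.arange(100.0)
# 30 simulated in-control batches: model trajectory plus measurement noise.
batches = mechanistic_model(t) + 0.5 * rng.normal(size=(30, t.size))

# Step 1: subtract the mechanistic prediction, leaving unstructured residuals.
residuals = batches - mechanistic_model(t)

# Step 2: PCA on the mean-centred residuals via SVD; keep 2 components.
R = residuals - residuals.mean(axis=0)
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = Vt[:2].T

# Step 3: squared prediction error (SPE) chart with a crude 3-sigma limit.
spe = np.sum((R - R @ P @ P.T) ** 2, axis=1)
limit = spe.mean() + 3 * spe.std()

# A batch with a step fault (e.g. a sensor bias) should exceed the limit.
fault = mechanistic_model(t) + 0.5 * rng.normal(size=t.size)
fault[50:] += 5.0
r_f = fault - mechanistic_model(t) - residuals.mean(axis=0)
spe_fault = float(np.sum((r_f - r_f @ P @ P.T) ** 2))
```

The limitation discussed above shows up directly in this sketch: if the cooling law were a poor approximation, structured model error would inflate the residuals and the control limit, masking genuine faults.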

    Data integration for the monitoring of batch processes in the pharmaceutical industry

    Advances in sensor technology have resulted in large amounts of data being available electronically. However, to realise the potential of these data, there is a need to transform them into knowledge and so achieve an enhanced understanding of the process. This thesis investigates a number of multivariate statistical projection techniques for the monitoring of batch fermentation and pharmaceutical processes. In the first part of the thesis, the traditional performance monitoring tools based on the approaches of Nomikos and MacGregor (1994) and Wold et al. (1998) are introduced. Additionally, the application of data scaling as a data pre-treatment step for batch processes is examined, and it is observed that it has a significant impact on monitoring performance. Based on the advantages and limitations of these techniques, an alternative methodology is proposed and applied to a simulated penicillin fermentation process. The approach is compared with existing techniques using two metrics, false alarm rate and out-of-control average run length. A further manufacturing challenge facing the pharmaceutical industry is to understand the differences in the performance of a product manufactured at two or more sites. A retrospective multi-site monitoring model is developed utilising a pooled sample variance-covariance methodology across the two sites. The results of this approach are compared with a number of techniques previously reported in the literature for the integration of data from two or more sources. The latter part of the thesis focuses on data integration using multi-block analysis. Several blocks of data can be analysed simultaneously to allow the inter- and intra-block relationships to be extracted. The methodology of multi-block Principal Component Analysis (MBPCA) is initially reviewed. To enhance the sensitivity of the algorithm, wavelet analysis is incorporated within the MBPCA framework.
The fundamental advantage of wavelet analysis is its ability to process a signal at different scales, so that both the global features and the localised details of a signal can be studied simultaneously. Both the existing and the modified approach are applied to data generated from an experiment conducted in a batch mini-plant, monitored by both physical sensors and an on-line UV-Visible spectrometer. The performance of the integrated approaches is benchmarked against the individual process and spectral monitoring models, and their fault detection ability is examined on two additional batches with pre-designed process deviations.
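A minimal sketch of the multi-block idea, on synthetic data: each block is autoscaled and weighted so that the wide spectral block cannot dominate, the blocks are concatenated, and a consensus PCA yields super-scores plus per-block contributions. The wavelet extension discussed in the thesis is omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hypothetical blocks sharing the same 50 batches (rows): a narrow
# process-sensor block and a wide spectral block.
process = rng.normal(size=(50, 6))
spectra = rng.normal(size=(50, 40))

def block_scale(X):
    # Autoscale each variable, then divide by sqrt(#variables) so the
    # wide spectral block cannot dominate the joint model.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return Xs / np.sqrt(X.shape[1])

blocks = [block_scale(process), block_scale(spectra)]
X = np.hstack(blocks)

# Consensus PCA on the super-block via SVD; keep 2 super-components.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U[:, :2] * s[:2]

# Per-block contribution to each super-component (sums to 1 per component).
split = blocks[0].shape[1]
contrib = [np.sum(Vt[:2, :split] ** 2, axis=1),
           np.sum(Vt[:2, split:] ** 2, axis=1)]
```

The block contributions are what make the multi-block view diagnostic: they show whether a deviation in the super-scores originates in the process sensors or the spectra.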


    Temporal data fusion in multisensor systems using dynamic time warping

    Data acquired from multiple sensors can be fused at a variety of levels: the raw data level, the feature level, or the decision level. An additional dimension to the fusion process is temporal fusion, which is the fusion of data or information acquired from multiple sensors of different types over a period of time. We propose a technique that can perform such temporal fusion. The core of the system is the fusion processor, which uses Dynamic Time Warping (DTW) to perform temporal fusion. We evaluate the performance of the fusion system on two real-world datasets: 1) accelerometer data acquired from performing two hand gestures and 2) Nokia's benchmark dataset for context recognition. The results of the first experiment show that the system can perform temporal fusion on both raw data and features derived from the raw data. The system can also recognize the same class of multisensor temporal sequences even though they have different lengths, e.g. the same human gestures performed at different speeds. In addition, the fusion processor can infer decisions from the temporal sequences quickly and accurately. The results of the second experiment show that the system can perform fusion on temporal sequences that have large dimensions and are a mix of discrete and continuous variables. The proposed fusion system achieved good classification rates efficiently in both experiments.
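The DTW computation at the core of the fusion processor follows the standard dynamic-programming recurrence; the sketch below, with made-up sequences, shows how two versions of the same pattern at different speeds match despite unequal lengths.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences,
    allowing different lengths (e.g. the same gesture at two speeds)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, or diagonal match.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The same "gesture" performed slowly (each sample held 3x) aligns
# perfectly with its fast version; a different pattern does not.
slow = np.repeat([0.0, 1.0, 2.0, 1.0, 0.0], 3)
fast = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
other = np.array([2.0, 0.0, 2.0, 0.0, 2.0])
```

A nearest-neighbour classifier over such DTW distances is a common way to turn the alignment score into the decision-level output the paper describes.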

    Application of multivariate data analysis to improve and optimise industrial processes

    ABB, the sponsoring company for this research work, is a global leader in power and automation technologies based in St. Neots, Cambridgeshire. The thesis discusses the work carried out on a portfolio of projects as part of an Engineering Doctorate (EngD) programme. The application of multivariate statistical process control was central to the successful implementation of the projects. The first project focused on a Process Analytical Technology (PAT) software solution developed by ABB. The US Food and Drug Administration (FDA) has defined PAT as a system for designing, analysing and controlling manufacturing through timely measurements of Critical Quality Attributes (CQAs) of raw and in-process materials in order to achieve final product quality. The project’s overall objective was to enable the seamless roll-out and maintenance of chemometric models for at-line testing across multiple worldwide locations. The work presented in the thesis discusses a solution that allows global maintenance of at-line analyser measurement stations whilst providing ‘real-time’ quality data at the right business level to enable more efficient business decisions. This required optimising the software during the preliminary stages, which included developing hierarchical Partial Least Squares (PLS) models, maintaining a process within control and exporting data using the Model Data Exporter plug-in. Likewise, the project involved the development of a combination of test sets that could assess and improve the robustness of the product. Following the Factory Acceptance Test (FAT) and Site Acceptance Test, the product was successfully commissioned at the customer site. The second project investigated a recurring uncharacteristic event in a polymerisation process. This unusual phenomenon led to downgrading of the batch, in turn causing a loss of revenue.
Previous investigations indicated that the most likely reason for this unusual behaviour was the occurrence of crystallisation in the polymerisation reactor. These batches were identified by monitoring a ‘kink’ in the heat-up profile during the polymerisation process. The root cause of this crystallisation was initially examined by monitoring the rate of reaction and analysing the behaviour of one variable at a time. However, these approaches failed to identify the underlying issue with the crystallised batches. This body of work illustrates a series of steps developed using multivariate analysis techniques to identify unusual batches in the polymer reactor. Exploratory data analysis using Principal Component Analysis (PCA) and Multi-way Principal Component Analysis (MPCA) was performed on the historic batch data (quality, process and Overall Equipment Effectiveness (OEE)) to identify the root cause of the problem and develop a well-defined method that can be used by operators to identify abnormal batches.
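The MPCA screening described above can be sketched on synthetic single-variable batch trajectories: batch-wise unfolding turns each batch into one row, PCA is fitted on good batches, and a batch with an artificial 'kink' in its heat-up profile is flagged by the squared prediction error (SPE). All data and limits here are illustrative only, not ABB's method.

```python
import numpy as np

rng = np.random.default_rng(3)

# 40 hypothetical good batches: reactor temperature at 60 heat-up points.
t = np.linspace(0.0, 1.0, 60)
good = 20 + 60 * t + 0.3 * rng.normal(size=(40, 60))

# Batch-wise unfolding: one row per batch, autoscaled per time point.
mean, std = good.mean(axis=0), good.std(axis=0)
Z = (good - mean) / std

# MPCA model via SVD; retain 3 principal components.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
P = Vt[:3].T

# SPE control chart on the training batches with a crude 3-sigma limit.
spe = np.sum((Z - Z @ P @ P.T) ** 2, axis=1)
limit = spe.mean() + 3 * spe.std()

# A crystallised batch shows a 'kink' in its heat-up profile.
kink = 20 + 60 * t + 0.3 * rng.normal(size=60)
kink[20:30] -= 8.0
z = (kink - mean) / std
spe_kink = float(np.sum((z - z @ P @ P.T) ** 2))
```

Because the kink is localised in time, the per-time-point residual contributions also indicate *when* in the batch the abnormality occurred, which is what makes the multivariate chart more diagnostic than one-variable-at-a-time monitoring.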

    Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation

    The present Ph.D. thesis, primarily conceived to support and reinforce the relation between the academic and industrial worlds, was developed in collaboration with Shell Global Solutions (Amsterdam, The Netherlands) in the endeavour of applying and possibly extending well-established latent variable-based approaches (i.e. Principal Component Analysis (PCA), Partial Least Squares regression (PLS) and Partial Least Squares Discriminant Analysis (PLSDA)) for complex problem solving, not only in the fields of manufacturing troubleshooting and optimisation but also in the wider environment of multivariate data analysis. To this end, novel efficient algorithmic solutions are proposed throughout all chapters to address very disparate tasks, from calibration transfer in spectroscopy to real-time modelling of streaming flows of data. The manuscript is divided into the following six parts, focused on various topics of interest: Part I - Preface, where an overview of this research work, its main aims and its justification is given together with a brief introduction to PCA, PLS and PLSDA; Part II - On kernel-based extensions of PCA, PLS and PLSDA, where the potential of kernel techniques, possibly coupled to specific variants of the recently rediscovered pseudo-sample projection, formulated by the English statistician John C.
Gower, is explored and their performance is compared with that of more classical methodologies in four different application scenarios: segmentation of Red-Green-Blue (RGB) images, discrimination of on-/off-specification batch runs, monitoring of batch processes and analysis of mixture designs of experiments; Part III - On the selection of the number of factors in PCA by permutation testing, where an extensive guideline on how to accomplish the selection of PCA components by permutation testing is provided through the comprehensive illustration of an original algorithmic procedure implemented for this purpose; Part IV - On modelling common and distinctive sources of variability in multi-set data analysis, where several practical aspects of two-block common and distinctive component analysis (carried out by methods such as Simultaneous Component Analysis (SCA), DIStinctive and COmmon Simultaneous Component Analysis (DISCO-SCA), Adapted Generalised Singular Value Decomposition (Adapted GSVD), ECO-POWER, Canonical Correlation Analysis (CCA) and 2-block Orthogonal Projections to Latent Structures (O2PLS)) are discussed, a new computational strategy for determining the number of common factors underlying two data matrices sharing the same row- or column-dimension is described, and two innovative approaches for calibration transfer between near-infrared spectrometers are presented; Part V - On the on-the-fly processing and modelling of continuous high-dimensional data streams, where a novel software system for the rational handling of multi-channel measurements recorded in real time, the On-The-Fly Processing (OTFP) tool, is designed; Part VI - Epilogue, where final conclusions are drawn, future perspectives are delineated, and annexes are included.
Vitale, R. (2017). Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90442
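The component-selection idea of Part III can be sketched with a simple sequential permutation test (a simplification for illustration, not the thesis's exact procedure): each column is permuted independently to destroy the correlation structure, and components are retained while their observed explained variance beats the permutation null.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data with 2 real components buried in noise (hypothetical).
n, p, k_true = 100, 20, 2
X = rng.normal(size=(n, k_true)) @ rng.normal(size=(p, k_true)).T \
    + 0.5 * rng.normal(size=(n, p))

def explained_variance(X):
    """Fraction of variance explained by each principal component."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return s**2 / np.sum(s**2)

obs = explained_variance(X)

# Null distribution: permute each column independently, destroying the
# correlation structure but keeping each variable's marginal distribution.
n_perm = 50
null = np.empty((n_perm, min(n, p)))
for i in range(n_perm):
    Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
    null[i] = explained_variance(Xp)

# Sequential rule: keep components while the observed explained variance
# exceeds the 95th percentile of the permutation null.
k = 0
while k < len(obs) and obs[k] > np.percentile(null[:, k], 95):
    k += 1
```

Unlike scree-plot inspection, the permutation null adapts automatically to the size and marginal distributions of the data matrix.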

    Development and Application of Chemometric Methods for Modelling Metabolic Spectral Profiles

    The interpretation of metabolic information is crucial to understanding the functioning of a biological system. Latent information about the metabolic state of a sample can be acquired using analytical chemistry methods, which generate spectroscopic profiles. Thus, nuclear magnetic resonance spectroscopy and mass spectrometry techniques can be employed to generate vast amounts of highly complex data on the metabolic content of biofluids and tissue, and this thesis discusses ways to process, analyse and interpret these data successfully. The evaluation of J-resolved spectroscopy in magnetic resonance profiling and the statistical techniques required to extract maximum information from the projections of these spectra are studied. In particular, data processing is evaluated, and correlation and regression methods are investigated with respect to enhanced model interpretation and biomarker identification. Additionally, it is shown that non-linearities in metabonomic data can be effectively modelled with kernel-based orthogonal partial least squares, for which an automated optimisation of the kernel parameter with nested cross-validation is implemented. The interpretation of orthogonal variation and the predictive ability enabled by this approach are demonstrated in regression and classification models for applications in toxicology and parasitology. Finally, the vast amount of data generated with mass spectrometry imaging is investigated in terms of data processing, and the benefits of applying multivariate techniques to these data are illustrated, especially in terms of interpretation and visualisation using colour-coding of images. The advantages of methods such as principal component analysis, self-organising maps and manifold learning over univariate analysis are highlighted.
This body of work therefore demonstrates new means of increasing the amount of biochemical information that can be obtained from a given set of samples in biological applications using spectral profiling. Various analytical and statistical methods are investigated and illustrated with applications drawn from diverse biomedical areas.
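The nested tuning of a kernel parameter can be sketched with kernel ridge regression standing in for the kernel-based orthogonal PLS actually used in the thesis: an inner cross-validation picks the RBF width, and an outer held-out split scores the chosen model. All data and parameter grids are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical non-linear response, e.g. a metabolite level depending
# non-linearly on a single spectral feature.
X = rng.uniform(-2.0, 2.0, size=(120, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=120)

def rbf(A, B, gamma):
    """Radial basis function (Gaussian) kernel matrix."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(Xtr, ytr, Xte, gamma, lam=1e-2):
    # Kernel ridge regression: alpha = (K + lam*I)^-1 y.
    K = rbf(Xtr, Xtr, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(ytr)), ytr)
    return rbf(Xte, Xtr, gamma) @ alpha

def cv_mse(X, y, gamma, folds=5):
    """Inner-loop k-fold cross-validated mean squared error."""
    idx = np.arange(len(y))
    errs = []
    for f in range(folds):
        te = idx[f::folds]
        tr = np.setdiff1d(idx, te)
        pred = krr_fit_predict(X[tr], y[tr], X[te], gamma)
        errs.append(np.mean((pred - y[te]) ** 2))
    return float(np.mean(errs))

# Outer split gives a near-unbiased error estimate for the tuned model
# (nested scheme, simplified here to a single outer split).
outer_tr, outer_te = np.arange(0, 90), np.arange(90, 120)
gammas = [0.01, 0.1, 1.0, 10.0]
best = min(gammas, key=lambda g: cv_mse(X[outer_tr], y[outer_tr], g))
test_mse = float(np.mean(
    (krr_fit_predict(X[outer_tr], y[outer_tr], X[outer_te], best)
     - y[outer_te]) ** 2))
```

The point of the nesting is that the kernel width is never chosen using the data that scores it, so `test_mse` is not optimistically biased by the tuning step.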