
    Thermally Evolved & Separated Composition of Atmospheric Aerosols: Development and Application of Advanced Data Analysis Techniques for a Thermal Desorption Aerosol Gas Chromatograph (TAG)

    Atmospheric organic aerosols are composed of thousands of individual compounds; they interact with climate through changes in aerosol optical properties and cloud interactions, and can be detrimental to human health. Aerosol mass spectrometry (MS) and gas chromatography (GC)-separated MS measurements have been used to better characterize the chemical composition of this material, which comes from a variety of sources and experiences continuous oxidation while in the atmosphere. This dissertation describes the development of a novel rapid data analysis method for grouping major components within chromatography-separated measurements and its first application to thermal desorption aerosol gas chromatograph (TAG)-MS data. Chromatograms are binned and inserted directly into a positive matrix factorization (PMF) analysis to determine major contributing components, eliminating the need for manual compound integrations of hundreds of resolved molecules and incorporating the entirety of the eluting MS signal, including unresolved complex mixtures (UCM) and decomposition products that are often ignored in traditional GC-MS analysis. Binned GC-MS data have three dimensions: (1) mass spectral index (m/z), (2) bin number, and (3) sample number. PMF output is composed of two dimensions: factor profiles and factor time series. The specific arrangement of the input data (three dimensions of variation structured as a two-dimensional matrix) in a two-dimensional PMF analysis affects the structure of the PMF profile and time-series output. If the mass spectral index is in the profile dimension, and bin number and sample number are in the time-series dimension, PMF groups components into factors with similar mass spectra, such as major contributing individual compounds, UCM with similar functional composition, and homologous compound series. This type of PMF analysis is described as the binning method for chromatogram deconvolution and is presented in Chapter 2. If the sample number is in the time-series dimension, and the bin number and mass spectral index, arranged as a mass-spectrally resolved chromatogram, are in the profile dimension, PMF groups components with similar time-series trends. This type of PMF analysis is described as the binning method for source apportionment and is described in Chapter 3. The binning methods are compared to traditional compound integration methods using previously collected hourly ambient samples from Riverside, CA during the 2005 Study of Organic Aerosols at Riverside (SOAR) field campaign, as discussed in Chapters 2-3. Further application of the binning method for source apportionment is performed on newly acquired hourly TAG data from East St. Louis, IL, operated as part of the 2013 St. Louis Air Quality Regional Study (SLAQRS). Major sources of biogenic secondary organic aerosol (SOA) and anthropogenic primary organic aerosol (POA) were identified, as described in detail in Chapter 4. Finally, our PMF separation method was tested for reliability using primary and secondary sources in a controlled laboratory system. As shown in Chapter 5, we find that for the application of PMF to receptor measurements, high signal intensity and unique measurement profiles, like those found in TAG chromatograms, are key to successful source apportionment. The binning method with component separation by PMF may be a valuable analysis technique for other complex data sets that combine measurements (e.g., mass spectrometry, spectroscopy) with additional separations (e.g., volatility, hygroscopicity, electrical mobility).
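    To make the two arrangements concrete, the sketch below unfolds a hypothetical three-dimensional binned array in both ways and factorizes each with scikit-learn's NMF as an unweighted stand-in for PMF (true PMF also weights every matrix element by its measurement uncertainty, which NMF does not). The array sizes, factor count, and random data are illustrative assumptions, not values from the dissertation.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical binned TAG GC-MS array: 100 hourly samples,
# 60 retention-time bins, 200 m/z channels (all assumed sizes).
rng = np.random.default_rng(0)
X = rng.random((100, 60, 200))

# Arrangement 1 -- chromatogram deconvolution (Chapter 2):
# mass spectral index in the profile dimension; bin and sample
# number together in the time-series dimension. Factors group
# signal with similar mass spectra.
X_deconv = X.reshape(100 * 60, 200)            # rows = (sample, bin) pairs
nmf = NMF(n_components=5, init="nndsvda", max_iter=500)
contrib = nmf.fit_transform(X_deconv)          # factor strength per sample-bin
ms_profiles = nmf.components_                  # one mass spectrum per factor

# Arrangement 2 -- source apportionment (Chapter 3):
# m/z-resolved chromatograms in the profile dimension; sample
# number in the time-series dimension. Factors group signal with
# similar time-series trends.
X_apport = X.reshape(100, 60 * 200)            # rows = samples
nmf2 = NMF(n_components=5, init="nndsvda", max_iter=500)
time_series = nmf2.fit_transform(X_apport)     # factor strength per sample
chrom_profiles = nmf2.components_              # m/z-resolved chromatogram per factor
```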

    Systematic reduction of Hyperspectral Images for high-throughput Plastic Characterization

    Hyperspectral Imaging (HSI) combines microscopy and spectroscopy to assess the spatial distribution of spectroscopically active compounds in objects, and has diverse applications in food quality control, pharmaceutical processes, and waste sorting. However, due to the large size of HSI datasets, it can be challenging to analyze and store them within a reasonable digital infrastructure, especially in waste sorting, where speed and data storage resources are limited. Additionally, as with most spectroscopic data, there is significant redundancy, making pixel and variable selection crucial for retaining chemical information. Recent developments in chemometrics enable automated and evidence-based data reduction, which can substantially enhance the speed and performance of Non-Negative Matrix Factorization (NMF), a widely used algorithm for the chemical resolution of HSI data. By recovering the pure contribution maps and spectral profiles of distributed compounds, NMF can provide evidence-based sorting decisions for efficient waste management. To improve the quality and efficiency of data analysis on HSI data, we apply a convex-hull method to select essential pixels and wavelengths and remove uninformative and redundant information. This process minimizes computational strain and effectively eliminates highly mixed pixels. By reducing data redundancy, data investigation and analysis become more straightforward, as demonstrated on both simulated and real HSI data for plastic sorting.
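    A minimal sketch of one common convex-hull recipe for essential-pixel selection, assumed here for illustration: project the pixel spectra onto a few principal components and keep only the pixels on the convex hull of the scores, since every other pixel is, up to noise, a convex combination of them. The cube size, component count, and variable names are assumptions; the thesis's actual pipeline (which also selects wavelengths) may differ in detail.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA

# Hypothetical HSI cube: 128 x 128 pixels, 300 wavelength channels.
rng = np.random.default_rng(1)
cube = rng.random((128, 128, 300))
pixels = cube.reshape(-1, 300)        # one spectrum per pixel

# Score space of a few principal components; the convex hull of the
# scores contains the least-mixed ("essential") pixels.
scores = PCA(n_components=3).fit_transform(pixels)
hull = ConvexHull(scores)
essential = pixels[hull.vertices]     # reduced matrix passed on to NMF

print(f"kept {len(hull.vertices)} of {len(pixels)} pixels")
```

    The same construction applied to the transposed matrix would select essential wavelengths instead of pixels.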

    Signal Processing Methods for Capillary Electrophoresis


    Spectroscopy as Process Analytical Technology for Preparative Protein Purification


    Data processing for Life Sciences measurements with hyphenated Gas Chromatography-Ion Mobility Spectrometry

    Recent progress in analytical chemistry instrumentation has increased the amount of data available for analysis. This progress has been accompanied by computational improvements that enable new possibilities for analyzing larger amounts of data. These two factors have made it possible to analyze more complex samples in multiple life-science fields, such as biology, medicine, pharmacology, and food science. One of the techniques that has benefited from these improvements is Gas Chromatography - Ion Mobility Spectrometry (GC-IMS), which is useful for the detection of Volatile Organic Compounds (VOCs) in complex samples. Ion Mobility Spectrometry is an analytical technique for characterizing chemical substances based on the velocity of gas-phase ions in an electric field. It can detect trace levels of volatile chemicals, reaching ppb concentrations for some analytes. While the instrument has moderate selectivity, it is very fast: an ion mobility spectrum can be acquired in tens of milliseconds. As it operates at ambient pressure, it is found not only in laboratories but also on-site, performing screening applications; for instance, it is often used in airports for the detection of drugs and explosives. To enhance the selectivity of IMS, especially for the analysis of complex samples, a gas chromatograph can be used for sample pre-separation, at the expense of a longer analysis. While better instrumentation and more computational power are available, better algorithms are still needed to exploit and extract all the information present in the samples. In particular, GC-IMS has not received much attention compared to other analytical techniques. In this work we address several data analysis issues for GC-IMS. With respect to pre-processing, we explore several baseline estimation methods and suggest a variation of Asymmetric Least Squares, a popular baseline estimation technique, that can cope with signals that present large peaks or a large dynamic range. This baseline estimation method is also used on Gas Chromatography - Mass Spectrometry (GC-MS) signals, as it suits both techniques. Furthermore, we characterize spectral misalignments in a study several months long and propose an alignment method based on monotonic cubic splines for their correction; based on the misalignment characterization, we also propose an optimal time span between consecutive calibrant samples. We then explore the use of Multivariate Curve Resolution (MCR) methods for the deconvolution of overlapped peaks and their extraction into pure components, proposing a sliding window along the retention-time axis so that pure components are extracted from smaller windows and tracked across windows. This approach can extract analytes with lower response than MCR applied to the full matrix, i.e., compounds that have low variance in the overall matrix. Finally, we apply some of these developments to real-world applications: a GC-IMS dataset for fraud prevention and quality control in the classification of olive oils, and GC-MS measurements of the headspace of urine samples for prostate cancer biomarker discovery.
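    For reference, a minimal version of the standard Asymmetric Least Squares baseline (Eilers and Boelens, 2005) that the thesis takes as its starting point; the proposed variant for signals with large peaks or a large dynamic range is not reproduced here, and the default parameters below are illustrative.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def als_baseline(y, lam=1e6, p=0.01, n_iter=10):
    """Asymmetric least squares baseline estimate for signal y.

    lam controls smoothness; p is the asymmetry: points above the
    current baseline (peaks) get weight p, points below get 1 - p.
    """
    L = len(y)
    # Second-difference penalty matrix
    D = sparse.diags([1.0, -2.0, 1.0], [0, -1, -2], shape=(L, L - 2))
    penalty = lam * (D @ D.T)
    w = np.ones(L)
    for _ in range(n_iter):
        W = sparse.diags(w)
        z = spsolve((W + penalty).tocsc(), w * y)   # weighted smooth fit
        w = np.where(y > z, p, 1 - p)               # down-weight peak points
    return z

# Usage: corrected = y - als_baseline(y)
```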

    Explicit–implicit mapping approach to nonlinear blind separation of sparse nonnegative dependent sources from a single mixture: pure component extraction from nonlinear mixture mass spectra

    The nonlinear, nonnegative single-mixture blind source separation (BSS) problem consists of decomposing an observed, nonlinearly mixed multicomponent signal into nonnegative dependent component (source) signals. The problem is difficult and is a special case of the underdetermined BSS problem. However, it is practically relevant for contemporary metabolic profiling of biological samples when only one sample is available for acquiring mass spectra; afterwards, the pure components are extracted. Herein, we present a method for the blind separation of nonnegative dependent sources from a single, nonlinear mixture. First, an explicit feature map is used to map the single mixture into a pseudo multi-mixture. Second, an empirical kernel map is used for implicit mapping of the pseudo multi-mixture into a high-dimensional reproducing kernel Hilbert space (RKHS). Under sparse probabilistic conditions previously imposed on sources, the single-mixture nonlinear problem is converted into an equivalent linear, multiple-mixture problem that consists of the original sources and their higher-order monomials. These monomials are suppressed by robust principal component analysis and hard, soft, and trimmed thresholding. Sparseness-constrained nonnegative matrix factorizations in the RKHS yield sets of separated components. Afterwards, separated components are annotated with pure components from a library using the maximal-correlation criterion. The proposed method is illustrated with a numerical example involving the extraction of 8 dependent components from 1 nonlinear mixture. The method is further demonstrated on 3 nonlinear chemical reactions of peptide synthesis, in which 25, 19 and 28 dependent analytes are extracted from 1 nonlinear mixture mass spectrum. The intended application of the proposed method is, in combination with other separation techniques, mass spectrometry-based non-targeted metabolic profiling, such as biomarker identification studies.
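    A heavily simplified sketch of the first step only, assuming elementwise monomials as the explicit feature map that turns the single mixture into a pseudo multi-mixture, followed by a sparseness-encouraging NMF. The published method instead factorizes in an RKHS through an empirical kernel map and suppresses the higher-order monomials with robust PCA and thresholding before library annotation; none of that machinery is shown here, and all sizes and parameters are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical single nonlinear mixture: nonnegative intensities
# over 500 m/z channels.
rng = np.random.default_rng(2)
x = rng.random(500)

# Explicit feature map: elementwise monomials give a pseudo
# multi-mixture with one row per monomial order.
X_pseudo = np.vstack([x, x**2, x**3])

# Sparseness-encouraging NMF on the pseudo multi-mixture; columns of
# S are candidate pure-component spectra (before library annotation).
model = NMF(n_components=3, init="nndsvda", l1_ratio=1.0,
            alpha_W=0.1, max_iter=1000)
S = model.fit_transform(X_pseudo.T)
A = model.components_          # mixing coefficients per monomial order
```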

    Limit of detection for second-order calibration methods

    Analytical chemistry can be split into two main types, qualitative and quantitative; most modern analytical chemistry is quantitative. Public sensitivity to health issues is reflected in the mountains of government regulations that use science, for instance, to provide public health information that prevents disease caused by harmful exposure to toxic substances. The concept of the minimum amount of an analyte or compound that can be detected or analysed appears in many of these regulations (for example, to rule out the presence of traces of toxic substances in foodstuffs), generally as part of method validation aimed at reliably evaluating the validity of the measurements. The lowest quantity of a substance that can be distinguished from the absence of that substance (a blank value) is called the detection limit, or limit of detection (LOD). Traditionally, in the context of simple measurements where the instrumental signal depends only on the amount of analyte, a multiple of the blank value is taken to calculate the LOD (traditionally, the blank value plus three times the standard deviation of the measurement). However, the increasing complexity of the data that analytical instruments provide leads to situations in which the LOD cannot be calculated as reliably as before.
    Measurements, instruments and mathematical models can be classified according to the type of data they use. Tensorial theory provides a unified language that is useful for describing chemical measurements, analytical instruments and calibration methods. Instruments that generate two-dimensional arrays of data are second-order instruments; a typical example is a spectrofluorometer, which provides a set of emission spectra obtained at different excitation wavelengths. The calibration methods used with each type of data have different features and complexity. In this thesis, the most commonly used calibration methods are reviewed, from zero-order (univariate) to second-order (multilinear) calibration models. Second-order calibration models are treated in detail, since they are the ones applied in this thesis. Specifically, the following methods are described:
    - PARAFAC (Parallel Factor Analysis)
    - ITTFA (Iterative Target Transformation Factor Analysis)
    - MCR-ALS (Multivariate Curve Resolution - Alternating Least Squares)
    - N-PLS (Multi-linear Partial Least Squares)
    Analytical methods should be validated. The validation process typically starts by defining the scope of the analytical procedure, which includes the matrix, target analyte(s), analytical technique and intended purpose. The next step is to identify the performance characteristics that must be validated, which may depend on the purpose of the procedure, and the experiments for determining them. Finally, validation results should be documented, reviewed and maintained for as long as the procedure is applied in routine work (otherwise, the procedure should be revalidated). The figures of merit of a chemical analytical process are "those quantifiable terms which may indicate the extent of quality of the process. They include those terms that are closely related to the method and to the analyte (sensitivity, selectivity, limit of detection, limit of quantification, ...) and those which are concerned with the final results (traceability, uncertainty and representativity)" (Inczédy et al., 1998).
    The aim of this thesis is to develop theoretical and practical strategies for calculating the limit of detection in complex analytical situations; specifically, I focus on second-order calibration methods, i.e. when a matrix of data is available for each sample. The methods most often used for making detection decisions are based on statistical hypothesis testing and involve a choice between two hypotheses about the sample. The first is the null hypothesis: the sample is analyte-free. The second is the alternative hypothesis: the sample is not analyte-free. In the hypothesis test there are two possible types of decision error. An error of the first type occurs when the signal for an analyte-free sample exceeds the critical value, leading one to conclude incorrectly that the sample contains a positive amount of the analyte; this is sometimes called a "false positive". An error of the second type occurs if one concludes that a sample does not contain the analyte when it actually does; this is known as a "false negative". In zero-order calibration, this hypothesis test is applied to the confidence intervals of the calibration model to estimate the LOD, as proposed by Hubaux and Vos (A. Hubaux, G. Vos, Anal. Chem. 42: 849-855, 1970).
    One strategy for estimating multivariate limits of detection is to transform the multivariate model into a univariate one. This strategy has been applied in this thesis in three practical applications:
    1. LOD for PARAFAC (Parallel Factor Analysis).
    2. LOD for ITTFA (Iterative Target Transformation Factor Analysis).
    3. LOD for MCR-ALS (Multivariate Curve Resolution - Alternating Least Squares).
    In addition, the thesis includes a theoretical contribution: the proposal of a sample-dependent LOD in the context of multivariate (PLS) and multilinear (N-PLS) Partial Least Squares.
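    A sketch of the pseudo-univariate strategy under stated assumptions: the analyte's factor scores (e.g., from PARAFAC or MCR-ALS) are regressed against the known calibration concentrations, and the Hubaux-Vos construction is applied to that regression. The function name is hypothetical, and the closed form uses the usual first-order approximation that ignores the small extra leverage at the LOD itself.

```python
import numpy as np
from scipy import stats

def hubaux_vos_lod(conc, score, alpha=0.05, beta=0.05):
    """Approximate Hubaux-Vos LOD from a pseudo-univariate calibration.

    conc:  known calibration concentrations
    score: per-sample score of the analyte's factor (the univariate
           signal obtained from the second-order model)
    alpha, beta: false-positive and false-negative probabilities
    """
    conc = np.asarray(conc, float)
    score = np.asarray(score, float)
    n = len(conc)
    slope, intercept, *_ = stats.linregress(conc, score)
    resid = score - (intercept + slope * conc)
    s_yx = np.sqrt(np.sum(resid**2) / (n - 2))       # residual std. error
    x_mean = np.mean(conc)
    sxx = np.sum((conc - x_mean) ** 2)
    t_a = stats.t.ppf(1 - alpha, n - 2)
    t_b = stats.t.ppf(1 - beta, n - 2)
    leverage = np.sqrt(1 + 1 / n + x_mean**2 / sxx)  # blank-prediction leverage
    return (t_a + t_b) * s_yx * leverage / slope
```

    Running the same routine on the scores produced by PARAFAC, ITTFA, and MCR-ALS would give the pseudo-univariate LODs compared in the three applications above.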