Machine learning in analytical chemistry: applying innovative data analysis methods using chromatographic techniques
Master's dissertation in Chemical Analysis and Characterisation Techniques (Chemical Sciences)
Scientific and technological advances have made it possible to extract a growing amount of knowledge from samples by means of analytical techniques. Over the last few years, the dimensionality of the data produced by the most recent analytical techniques has become so high that its analysis is now called megavariate analysis. Recently, the use of machine learning tools in chemical data analysis has allowed relevant information to be extracted from samples at a level that was previously not possible.
The objective of this work is to classify the manufacturing conditions of printed circuit boards from data acquired by SLE-HPLC-ESI-MS. The dissertation is therefore divided into two parts: the first summarizes the work undertaken to ensure that the analytical method produces data of adequate quality, and the second describes the development of predictive models using the data acquired in the first part. In parallel, a data augmentation technique was developed which, to the best of our knowledge, is the first data augmentation technique for classification problems using chromatographic data.
The best models achieve precisions above 94% in predicting every manufacturing condition. Moreover, the developed data augmentation technique outperforms three other data augmentation techniques.
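The abstract does not describe how the new augmentation technique works, so the sketch below is only a generic illustration of chromatographic data augmentation, not the author's method. It assumes each sample is a 1-D chromatogram (a list of intensities on a common time grid) and creates extra labelled copies by applying a small retention-time shift, a global intensity scale, and Gaussian noise. The function names and default values are hypothetical.

```python
import random

def augment_chromatogram(signal, n_copies=5, max_shift=3, scale_range=(0.9, 1.1),
                         noise_sd=0.01, seed=None):
    """Return n_copies perturbed versions of a 1-D chromatogram (list of floats).

    Assumed, generic perturbations (hypothetical, not the dissertation's technique):
    an integer retention-time shift, a global intensity factor, and additive
    Gaussian noise scaled to the largest absolute intensity.
    """
    rng = random.Random(seed)
    n = len(signal)
    peak = max(abs(v) for v in signal) or 1.0  # avoid zero noise scale on a flat trace
    copies = []
    for _ in range(n_copies):
        shift = rng.randint(-max_shift, max_shift)
        # Positive shift moves peaks later; out-of-range indices reuse the edge value.
        shifted = [signal[min(max(i - shift, 0), n - 1)] for i in range(n)]
        factor = rng.uniform(*scale_range)
        copies.append([factor * v + rng.gauss(0.0, noise_sd * peak) for v in shifted])
    return copies

def augment_dataset(X, y, n_copies=5, seed=0):
    """Append augmented copies of every training chromatogram, keeping its class label."""
    rng = random.Random(seed)
    X_aug, y_aug = list(X), list(y)
    for signal, label in zip(X, y):
        for copy in augment_chromatogram(signal, n_copies=n_copies,
                                         seed=rng.randrange(2**32)):
            X_aug.append(copy)
            y_aug.append(label)
    return X_aug, y_aug
```

Augmentation of this kind should be applied to the training split only, so that the test chromatograms remain genuine measurements.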
In summary, the results show that, besides distinguishing classes with different chemical compositions, it is possible to obtain information about which chemical compounds differentiate the classes. This information may be of significant importance for areas such as quality control, food chemistry, botany, and the pharmaceutical industry.
Constant scientific and technological progress has enabled chemical analysis techniques, over the last century, to extract ever more knowledge from the analysed samples. In recent years, the data produced by the most recent analytical techniques have reached such a high dimensionality that their analysis is called megavariate analysis. Recently, the application of machine learning tools to chemical data analysis has made it possible to extract relevant information from the analysed samples that could not be obtained until recently.
With this in mind, the objective of this work is to classify manufacturing conditions of printed circuit boards based on data from liquid chromatography coupled to mass spectrometry with solid-liquid extraction. Accordingly, this dissertation is divided into two parts: the first summarizes the work carried out to ensure that the analytical method produces data of adequate quality, so that in the second part these data can be used to build predictive models. In parallel, a data augmentation technique was developed which, to the best of our knowledge, is the first data augmentation technique developed for classification problems with data from chromatographic analyses.
The best models show precisions above 94% for the prediction of all manufacturing conditions. Additionally, the developed data augmentation technique performs better than other data augmentation techniques.
In summary, the results obtained indicate that, besides distinguishing classes with different chemical compositions, it is possible to acquire information about which chemical compounds distinguish the classes under study. This information may become significantly important in areas such as quality control, food chemistry, and the phyto-pharmaceutical industry.
Fundação para a Ciência e Tecnologia, through project POCI-01-0145-FEDER-029147 - PTDC/FIS-PAR/29147/2017, funded by: OE/FCT, Lisboa 2020, Compete 2020 POCI, Portugal 2020 FEDE
A Chemometrics-driven Strategy for the Bioactivity Evaluation of Complex Multicomponent Systems and the Effective Selection of Bioactivity-predictive Chemical Combinations
Although understanding their chemical composition is vital for accurately predicting the bioactivity of multicomponent drugs, nutraceuticals, and foods, no analytical approach exists to easily predict the bioactivity of multicomponent systems from the complex behaviour of multiple coexisting factors. We herein present a metabolic profiling (MP) strategy for evaluating bioactivity in systems containing various small molecules. Composition profiles of diverse bioactive herbal samples from 21 green tea extract (GTE) panels were obtained by a high-throughput, non-targeted analytical procedure. This procedure employed matrix-assisted laser desorption ionization-mass spectrometry (MALDI-MS), using 1,5-diaminonaphthalene (1,5-DAN) as the matrix for detecting GTE-derived components. Multivariate statistical analyses revealed differences among the GTEs in their antioxidant activity, measured as oxygen radical absorbance capacity (ORAC). A reliable bioactivity-prediction model was constructed to predict the ORAC of diverse GTEs from their compositional balance. This chemometric procedure allowed GTE bioactivity to be evaluated from multicomponent rather than single-component information. The bioactivity could be easily evaluated by calculating the summed abundance of a few selected components that contributed most to the prediction model. 1,5-DAN-MALDI-MS-MP, using diverse bioactive sample panels, represents a promising strategy for screening bioactivity-predictive multicomponent factors and selecting effective bioactivity-predictive chemical combinations for crude multicomponent systems.
Application of AI in Modeling of Real System in Chemistry
In recent years, the discharge of synthetic dye waste from various industries, which leads to aquatic and environmental pollution, has become a serious global concern. Predicting dye removal therefore plays an important role in wastewater management and nature conservation. Artificial intelligence methods are popular owing to their ease of use and high accuracy. This chapter presents a detailed review of artificial intelligence-based methods for predicting dye removal, particularly multiple linear regression (MLR), artificial neural networks (ANNs), and least squares-support vector machines (LS-SVM). It also focuses on ensemble prediction models (EPMs) for dye removal prediction, which improve prediction accuracy by integrating several prediction models. The principles, advantages, disadvantages, and applications of these artificial intelligence-based methods are explained, and future research directions for artificial intelligence-based dye removal prediction are discussed.
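One common way to integrate several prediction models, shown here as a minimal sketch rather than the chapter's specific EPM, is a weighted average in which each base model's weight is inversely proportional to its mean squared error on a validation set. The model names (`MLR`, `ANN`, `LS-SVM`) are placeholders for predictions produced elsewhere, and both the weighting rule and the validation values are assumptions.

```python
def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def inverse_error_weights(val_predictions, val_targets, eps=1e-12):
    """Map each model name to a weight proportional to 1 / validation MSE (weights sum to 1)."""
    inverse = {
        name: 1.0 / max(mean_squared_error(preds, val_targets), eps)
        for name, preds in val_predictions.items()
    }
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def ensemble_predict(predictions, weights):
    """Weighted average of the base models' predictions, sample by sample."""
    names = list(weights)
    n_samples = len(predictions[names[0]])
    return [sum(weights[name] * predictions[name][i] for name in names)
            for i in range(n_samples)]

# Hypothetical validation data: measured dye removal (%) and three base models' predictions.
val_targets = [80.0, 65.0, 90.0]
val_predictions = {
    "MLR": [75.0, 70.0, 84.0],
    "ANN": [79.0, 66.0, 91.0],
    "LS-SVM": [82.0, 63.0, 88.0],
}
weights = inverse_error_weights(val_predictions, val_targets)
```

The weights are fitted on validation data only; `ensemble_predict` is then applied to the base models' predictions for new samples.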
Signal and data processing for machine olfaction and chemical sensing: A review
Signal and data processing are essential elements of electronic noses, as well as of most chemical sensing instruments. The multivariate responses obtained by chemical sensor arrays require signal and data processing to carry out the fundamental tasks of odor identification (classification), concentration estimation (regression), and grouping of similar odors (clustering). In the last decade, important advances have shown that proper processing can improve the robustness of these instruments against diverse perturbations, such as environmental variables, background changes, and drift. This article reviews the advances made in recent years in signal and data processing for machine olfaction and chemical sensing.
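As an illustration of two processing steps of the kind this review covers, the sketch below expresses raw sensor-array readings as a fractional change from each sensor's baseline and then identifies odors with a k-nearest-neighbour vote. Both are generic textbook choices picked for brevity, not methods attributed to the review, and the function names are hypothetical.

```python
import math
from collections import Counter

def fractional_baseline(response, baseline):
    """(x - x0) / x0 per sensor: each sensor's change relative to its own baseline."""
    return [(x - x0) / x0 for x, x0 in zip(response, baseline)]

def knn_classify(train_X, train_y, query, k=3):
    """Majority vote among the k training patterns closest (Euclidean) to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

The baseline correction is applied to both training and query patterns before classification, so that the classifier compares relative changes rather than absolute sensor levels.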
Machine learning models for the prediction of pharmaceutical powder properties
Error on title page: year of award is 2023.
Understanding how particle attributes affect pharmaceutical manufacturing process performance remains a significant challenge for the industry, adding cost and time to the development of robust products and production routes. Tablets can be formed by several techniques; however, direct compression (DC) and granulation are the most widely used in industrial operations. DC is of particular interest because it offers lower-cost manufacturing and a streamlined process with fewer steps than other unit operations. However, realizing the full benefits of DC for tablet manufacture places strict demands on material flow properties, blend uniformity, compactability, and lubrication. DC is increasingly the preferred technique among pharmaceutical companies for oral solid dose manufacture, which makes predicting the flow of pharmaceutical materials increasingly important. Bulk properties are influenced by particle attributes, such as particle size and shape, which are defined during crystallization and/or milling processes. Currently, assessing whether raw materials and/or formulated blends are suitable for DC requires detailed characterization of their bulk properties. A key goal of digital design and Industry 4.0 concepts is to digitally transform existing development steps so that properties can be better predicted while minimizing the material and resources required to inform process selection during early-stage development.
The work presented in Chapter 4 focuses on developing machine learning (ML) models to predict the powder flow behaviour of routine, widely available pharmaceutical materials. Several datasets comprising powder attributes (particle size, shape, surface area, surface energy, and bulk density) and flow properties (flow function coefficient) were built for pure compounds, binary mixtures, and multicomponent formulations. Using these datasets, traditional ML classification and regression approaches (random forest, support vector machines, k-nearest neighbours, gradient boosting, AdaBoost, Naïve Bayes, and logistic regression) were explored for predicting flow properties via the flow function coefficient. The models were evaluated using multiple sampling methods and validated on external datasets, achieving a performance above 80%, which is high enough for their implementation to improve manufacturing efficiency. Finally, interpretability methods, namely SHAP (SHapley Additive exPlanations), were used to understand the models' predictions by determining how much each variable in the training dataset contributed to each final prediction.
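SHAP is built on Shapley values from cooperative game theory. For a model with only a few inputs they can be computed exactly, as in the minimal sketch below, which assumes that a feature "absent" from a coalition takes a single reference (baseline) value; the SHAP library supports richer background data and faster approximations. The toy flow model, its coefficients, and the feature values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) - model(baseline) to each feature.

    Features outside a coalition are set to their baseline value. All 2^n
    coalitions are enumerated, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_ffc_model(z):
    # Hypothetical linear model: z = [particle size (um), aspect ratio, surface energy (mJ/m^2)].
    return 0.05 * z[0] + 4.0 * z[1] - 0.1 * z[2]

phi = shapley_values(toy_ffc_model, x=[100.0, 0.8, 40.0], baseline=[50.0, 0.5, 45.0])
```

The attributions always add up to the difference between the prediction and the baseline prediction, which is what makes them convenient for explaining individual predictions.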
Chapter 5 extends the work presented in Chapter 4 by applying ML models to classify whether pharmaceutical formulations are viable for continuous DC, based on the flow function coefficient of their powders. More than 100 formulations were included, and the particle size and particle shape of the active pharmaceutical ingredients (APIs), the flow function coefficient of the APIs, and the concentrations of the formulation components were used to build the training dataset. The ML models were evaluated using different sampling techniques, such as bootstrap sampling and 10-fold cross-validation, achieving a precision of 90%.
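The two resampling schemes mentioned above, together with the precision metric, can be sketched in a few lines. This is a generic illustration rather than the thesis's code: the function names are hypothetical, and "viable" is an assumed label for the positive class.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k disjoint folds covering every sample once
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test

def bootstrap_indices(n_samples, seed=0):
    """One bootstrap resample: training drawn with replacement, out-of-bag samples for testing."""
    rng = random.Random(seed)
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(train))
    return train, out_of_bag

def precision(y_true, y_pred, positive="viable"):
    """Fraction of samples predicted as `positive` that truly are positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    predicted_pos = sum(1 for p in y_pred if p == positive)
    return true_pos / predicted_pos if predicted_pos else 0.0
```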
Furthermore, Chapter 6 compares two data-driven approaches to predicting powder flow: a Random Forest (RF) model and a Convolutional Neural Network (CNN) model. A total of 98 powders covering a wide range of particle sizes and shapes were assessed using static image analysis. The RF model was trained on tabular data (particle size, aspect ratio, and circularity descriptors), and the CNN model was trained on the composite images; both datasets were extracted from the same characterisation instrument. The data were split into training, testing, and validation sets, and the validation results were used to compare the two approaches. Both algorithms performed similarly: the RF and CNN models each achieved an accuracy of 55%.
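The tabular descriptors used by the RF model are typically derived from each particle's projected outline. The sketch below uses common definitions (circle-equivalent diameter, aspect ratio as width over length, and circularity as 4πA/P²). The thesis does not state the exact definitions used by its characterisation instrument, so treat these, and the helper names, as assumptions.

```python
import math

def ce_diameter(area):
    """Diameter of the circle with the same projected area: 2 * sqrt(A / pi)."""
    return 2.0 * math.sqrt(area / math.pi)

def aspect_ratio(width, length):
    """Width / length: 1.0 for equant particles, approaching 0 for needle-like ones."""
    return width / length

def circularity(area, perimeter):
    """4 * pi * A / P^2: 1.0 for a perfect circle, lower for elongated or rough outlines."""
    return 4.0 * math.pi * area / perimeter ** 2

def descriptor_row(area, perimeter, width, length):
    """Hypothetical per-particle feature vector: [CE diameter, aspect ratio, circularity]."""
    return [ce_diameter(area), aspect_ratio(width, length), circularity(area, perimeter)]
```

Per-particle values like these would then need to be summarised per powder (for example by their median) before being used as one row of the RF training table.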
Finally, Chapter 7 explores other particle and bulk properties, i.e., bulk density, surface area, and surface energy, and their impact on the manufacturability and bioavailability of the drug product. The bulk density models achieved a performance of 82%, the surface area models 80%, and the surface energy models 60%. The results of the models presented in this chapter pave the way for unified guidelines towards end-to-end continuous manufacturing by linking manufacturability requirements with bioavailability requirements.
Solid Phase Extraction Room Temperature Fluorescence Spectroscopy For The Direct Quantification Of Monohydroxy Metabolites Of Polycyclic Aromatic Hydrocarbons In Urine Samples
Polycyclic aromatic hydrocarbons (PAH) are important environmental pollutants generally formed during the incomplete combustion of organic matter containing carbon and hydrogen. After entering the human body through dermal absorption, ingestion, or inhalation, PAH undergo biotransformation processes that lead to the formation of multiple metabolites. Because of their short elimination time from the body, the quantitative determination of monohydroxy-PAH (OH-PAH) in urine samples provides accurate information on recent exposure to environmental PAH. Established methods for the urinary analysis of OH-PAH rely on sample clean-up and pre-concentration followed by chromatographic separation and quantification. Although chromatographic techniques provide reliable results in the analysis of OH-PAH, their experimental procedures are time-consuming and expensive. Additional problems arise when laboratory procedures are scaled up to handle thousands of samples under mass screening conditions. From the perspective of environmental sustainability, the large consumption of organic solvents is one of the main limitations of current chromatographic methodology. In this context, new analytical approaches based on easy-to-use and cost-effective methodology become extremely relevant. This dissertation focuses on the development of screening methodology for the routine analysis of PAH metabolites in numerous samples. It explores the room-temperature fluorescence properties of six metabolites originating from parent PAH included in the Environmental Protection Agency priority pollutants list. 1-Hydroxyfluorene, 1-hydroxypyrene, 6-hydroxychrysene, 9-hydroxyphenanthrene, 3-hydroxybenzo[a]pyrene, and 4-hydroxybenzo[a]pyrene are used as model biomarkers to investigate the analytical potential of new methods based on solid-phase extraction (SPE) and room-temperature fluorescence (RTF) spectroscopy.
Quantitative determination of the metabolites is carried out either in the eluent extract [1, 2] or on the surface of the extraction membranes [3, 4]. The direct determination of the six metabolites, i.e., without chromatographic separation, is based on the collection of excitation-emission matrices and synchronous fluorescence spectra.
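A constant-offset synchronous fluorescence spectrum scans excitation and emission together with a fixed wavelength difference Δλ, so, in principle, it corresponds to a diagonal slice through the excitation-emission matrix. The sketch below extracts that slice. It assumes integer-nanometre wavelength grids, so wavelengths can be matched exactly, and the function name and example values are hypothetical.

```python
def synchronous_spectrum(eem, ex_wavelengths, em_wavelengths, delta):
    """Extract a constant-offset synchronous spectrum from an excitation-emission matrix.

    eem[i][j] is the intensity at excitation ex_wavelengths[i] and emission
    em_wavelengths[j]. Returns (excitation wavelength, intensity) pairs for every
    excitation wavelength whose partner emission wavelength ex + delta is on the grid.
    """
    em_index = {wl: j for j, wl in enumerate(em_wavelengths)}
    spectrum = []
    for i, ex in enumerate(ex_wavelengths):
        j = em_index.get(ex + delta)
        if j is not None:
            spectrum.append((ex, eem[i][j]))
    return spectrum
```

Varying `delta` changes which spectral features are emphasised, which is why the offset is usually chosen per analyte.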
On-line monitoring of aqueous base metal solutions with transmittance spectrophotometry
Transmittance spectrophotometry was used to monitor copper, cobalt, and zinc in solution in laboratory experiments. The samples simulated plant conditions encountered at the Skorpion zinc mine in Namibia and were prepared using a simplex-centroid mixture design. Principal component regression, partial least squares, and support vector regression models were calibrated from visible and near-infrared absorption spectra. All models could accurately estimate the concentrations of all the metals in solution. Although these models were affected by nickel contamination, the Cu models were less sensitive to this contamination than the Co and Zn models. Likewise, elevated temperatures degraded the calibrated models, particularly the Zn models. The effects of these conditions could be visualized in a linear discriminant score plot of the spectral data.
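A simplex-centroid design for q mixture components consists of the 2^q - 1 blends in which every non-empty subset of components is present in equal proportions. The sketch below enumerates it for the three monitored metals. The abstract does not say how these proportions were converted into actual concentrations, so the output here is relative proportions only, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def simplex_centroid(components):
    """Return the 2^q - 1 simplex-centroid blends as {component: proportion} dicts."""
    q = len(components)
    design = []
    for size in range(1, q + 1):
        for subset in combinations(range(q), size):
            # Components in the subset share the mixture equally; the rest are absent.
            design.append({
                name: Fraction(1, size) if i in subset else Fraction(0)
                for i, name in enumerate(components)
            })
    return design

design = simplex_centroid(["Cu", "Co", "Zn"])
```

Exact fractions keep each blend summing to exactly 1; the runs comprise three pure components, three binary 50:50 blends, and the ternary centroid.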
Calibration Methods of Laser-Induced Breakdown Spectroscopy
Laser-induced breakdown spectroscopy (LIBS) has gained great attention over the past two decades due to its many advantages, such as no need for sample preparation, remote measurement capability, and fast simultaneous multielement analysis. However, because of the inherent uncertainty of the plasma, achieving high sensitivity and accurate quantitative analysis remains a major challenge for the LIBS community worldwide. Currently, many chemometric methods have been applied to LIBS calibration, including univariate regression, multivariate regression, principal component regression (PCR), partial least squares regression (PLSR), and so on. In addition, appropriate sample and spectral pretreatment can effectively improve the analytical performance of LIBS, i.e., its limit of detection (LOD), accuracy, and repeatability. In this chapter, we briefly summarize the progress of these calibration methods and their applications to LIBS and provide our recommendations.
- …