432 research outputs found

    Machine learning in analytical chemistry: applying innovative data analysis methods using chromatographic techniques

    Master's dissertation in Chemical Analysis and Characterisation Techniques (Chemical Sciences). Scientific and technological advances have allowed analytical techniques to extract a growing quantity of knowledge from the analysed samples. Over the last few years, the dimensionality of the data produced by the most recent analytical techniques has become so high that its analysis is now called megavariate analysis. Recently, the use of machine learning tools in chemical data analysis has allowed the extraction of relevant information from samples at a level that was previously not possible. The objective of this work is to classify manufacturing conditions of printed circuit boards based on data acquired by SLE-HPLC-ESI-MS (liquid chromatography coupled to mass spectrometry with solid-liquid extraction). The dissertation is divided into two parts: the first summarizes the work undertaken to ensure that the analytical method produces data of adequate quality, and the second describes the development of predictive models using the previously acquired data. In parallel, a data augmentation technique was developed which, to the best of our knowledge, is the first designed for classification problems using chromatographic data. The best models achieve precisions above 94% in predicting all manufacturing conditions. Moreover, the developed data augmentation technique outperforms three other data augmentation techniques. In summary, the results show that, besides distinguishing classes with different chemical compositions, it is possible to obtain information about which chemical compounds differentiate the classes. This information might be of significant importance for areas such as quality control, food chemistry, botany and the pharmaceutical industry.
    Fundação para a Ciência e Tecnologia, through project POCI-01-0145-FEDER-029147 - PTDC/FIS-PAR/29147/2017, funded by: OE/FCT, Lisboa 2020, Compete 2020 POCI, Portugal 2020 FEDE

    A Chemometrics-driven Strategy for the Bioactivity Evaluation of Complex Multicomponent Systems and the Effective Selection of Bioactivity-predictive Chemical Combinations

    Although understanding their chemical composition is vital for accurately predicting the bioactivity of multicomponent drugs, nutraceuticals, and foods, no analytical approach exists to easily predict the bioactivity of multicomponent systems from complex behaviors of multiple coexisting factors. We herein present a metabolic profiling (MP) strategy for evaluating bioactivity in systems containing various small molecules. Composition profiles of diverse bioactive herbal samples from 21 green tea extract (GTE) panels were obtained by a high-throughput, non-targeted analytical procedure. This employed the matrix-assisted laser desorption ionization-mass spectrometry (MALDI-MS) technique, using 1,5-diaminonaphthalene (1,5-DAN) as the optical matrix for detecting GTE-derived components. Multivariate statistical analyses revealed differences among the GTEs in their antioxidant activity, measured as oxygen radical absorbance capacity (ORAC). A reliable bioactivity-prediction model was constructed to predict the ORAC of diverse GTEs from their compositional balance. This chemometric procedure allowed the evaluation of GTE bioactivity by multicomponent rather than single-component information. The bioactivity could be easily evaluated by calculating the summed abundance of a few selected components that contributed most to constructing the prediction model. 1,5-DAN-MALDI-MS-MP, using diverse bioactive sample panels, represents a promising strategy for screening bioactivity-predictive multicomponent factors and selecting effective bioactivity-predictive chemical combinations for crude multicomponent systems.

    Application of AI in Modeling of Real System in Chemistry

    In recent years, the discharge of synthetic dye waste from different industries, which leads to aquatic and environmental pollution, has become a serious global problem. Hence, predicting dye removal plays an important role in wastewater management and nature conservation. Artificial intelligence methods are popular owing to their ease of use and high accuracy. This chapter presents a detailed review of artificial intelligence-based methods for predicting dye removal, particularly multiple linear regression (MLR), artificial neural networks (ANNs), and least squares-support vector machine (LS-SVM). Furthermore, the chapter focuses on ensemble prediction models (EPMs) used for dye removal prediction. EPMs improve prediction accuracy by integrating several prediction models. The principles, advantages, disadvantages, and applications of these artificial intelligence-based methods are explained. Finally, future directions of research on artificial intelligence-based dye removal prediction methods are discussed.
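
The idea of an ensemble prediction model that integrates several prediction models can be sketched as follows. This is only a minimal illustration using simple averaging, which is one of several possible combination rules; the abstract does not say which rule the reviewed EPMs use. The three base models are hypothetical stand-ins for fitted MLR, ANN and LS-SVM models, and the input value is arbitrary.

```python
from statistics import fmean


def ensemble_predict(models, x):
    """Combine base models by averaging their individual predictions."""
    return fmean([model(x) for model in models])


# Hypothetical base models (assumptions for illustration only): each maps
# a single input feature to a predicted dye-removal value.
base_models = [
    lambda x: 2.0 * x,        # stand-in for an MLR model
    lambda x: 2.0 * x + 3.0,  # stand-in for an ANN model
    lambda x: 2.0 * x - 6.0,  # stand-in for an LS-SVM model
]

ensemble_value = ensemble_predict(base_models, 10.0)
print(ensemble_value)
```

Averaging tends to cancel the individual models' errors when those errors are not strongly correlated; weighted averaging or stacking are common alternatives.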

    Signal and data processing for machine olfaction and chemical sensing: A review

    Signal and data processing are essential elements in electronic noses as well as in most chemical sensing instruments. The multivariate responses obtained by chemical sensor arrays require signal and data processing to carry out the fundamental tasks of odor identification (classification), concentration estimation (regression), and grouping of similar odors (clustering). In the last decade, important advances have shown that proper processing can improve the robustness of the instruments against diverse perturbations, namely, environmental variables, background changes, drift, etc. This article reviews the advances made in recent years in signal and data processing for machine olfaction and chemical sensing

    Machine learning models for the prediction of pharmaceutical powder properties

    Error on title page: year of award is 2023. Understanding how particle attributes affect the pharmaceutical manufacturing process performance remains a significant challenge for the industry, adding cost and time to the development of robust products and production routes. Tablet formation can be achieved by several techniques; however, direct compression (DC) and granulation are the most widely used in industrial operations. DC is of particular interest as it offers lower-cost manufacturing and a streamlined process with fewer steps compared with other unit operations. However, achieving the full potential benefits of DC for tablet manufacture places strict demands on material flow properties, blend uniformity, compactability, and lubrication, all of which need to be satisfied. DC is increasingly the preferred technique for pharmaceutical companies for oral solid dose manufacture, consequently making the flow prediction of pharmaceutical materials of increasing importance. Bulk properties are influenced by particle attributes, such as particle size and shape, which are defined during crystallization and/or milling processes. Currently, assessing the suitability of raw materials and/or formulated blends for DC requires detailed characterization of the bulk properties. A key goal of digital design and Industry 4.0 concepts is, through digital transformation of existing development steps, to better predict properties whilst minimizing the amount of material and resources required to inform process selection during early-stage development. The work presented in Chapter 4 focuses on developing machine learning (ML) models to predict powder flow behaviour of routine, widely available pharmaceutical materials. Several datasets comprising powder attributes (particle size, shape, surface area, surface energy, and bulk density) and flow properties (flow function coefficient) have been built, for pure compounds, binary mixtures, and multicomponent formulations. 
Using these datasets, different ML models, including traditional ML (random forest, support vector machines, k nearest neighbour, gradient boosting, AdaBoost, Naïve Bayes, and logistic regression) classification and regression approaches, have been explored for the prediction of flow properties, via the flow function coefficient. The models have been evaluated using multiple sampling methods and validated using external datasets, showing a performance over 80%, which is sufficiently high for their implementation to improve manufacturing efficiency. Finally, interpretability methods, namely SHAP (SHapley Additive exPlanations), have been used to understand the predictions of the machine learning models by determining how much each variable included in the training dataset has contributed to each final prediction. Chapter 5 expands on the work presented in Chapter 4 by demonstrating the applicability of ML models for classifying the viability of pharmaceutical formulations for continuous DC according to their powder flow, via the flow function coefficient. More than 100 formulations were included in this model, and the particle size and particle shape of the active pharmaceutical ingredients (APIs), the flow function coefficient of the APIs, and the concentration of the components of the formulations were used to build the training dataset. The ML models were evaluated using different sampling techniques, such as bootstrap sampling and 10-fold cross-validation, achieving a precision of 90%. Furthermore, Chapter 6 presents the comparison of two data-driven model approaches to predict powder flow: a Random Forest (RF) model and a Convolutional Neural Network (CNN) model. A total of 98 powders covering a wide range of particle sizes and shapes were assessed using static image analysis. The RF model was trained on the tabular data (particle size, aspect ratio, and circularity descriptors), and the CNN model was trained on the composite images. 
Both datasets were extracted from the same characterisation instrument. The data were split into training, testing, and validation sets. The results of the validation were used to compare the performance of the two approaches. The results revealed that both algorithms achieved a similar performance since the RF model and the CNN model achieved the same accuracy of 55%. Finally, other particle and bulk properties, i.e., bulk density, surface area, and surface energy, and their impact on the manufacturability and bioavailability of the drug product are explored in Chapter 7. The bulk density models achieved a high performance of 82%, the surface area models achieved a performance of 80%, and finally, the surface-energy models achieved a performance of 60%. The results of the models presented in this chapter pave the way to unified guidelines moving towards end-to-end continuous manufacturing by linking the manufacturability requirements and the bioavailability requirements.

    Solid Phase Extraction Room Temperature Fluorescence Spectroscopy For The Direct Quantification Of Monohydroxy Metabolites Of Polycyclic Aromatic Hydrocarbons In Urine Samples

    Polycyclic aromatic hydrocarbons (PAH) are important environmental pollutants generally formed during incomplete combustion of organic matter containing carbon and hydrogen. Introduced into the human body by absorption through the skin, ingestion or inhalation, PAH undergo biotransformation processes that lead to the formation of multiple metabolites. Due to their short elimination lifetime from the body, the quantitative determination of monohydroxy-PAH (OH-PAH) in urine samples provides accurate information on recent exposure to environmental PAH. Urine analysis of OH-PAH with established methodology relies on sample clean-up and pre-concentration followed by chromatographic separation and quantification. Although chromatographic techniques provide reliable results in the analysis of OH-PAH, their experimental procedures are time consuming and expensive. Additional problems arise when laboratory procedures are scaled up to handle thousands of samples under mass screening conditions. From the perspective of a sustainable environment, the large usage of organic solvents is one of the main limitations of current chromatographic methodology. It is within this context that new analytical approaches based on easy-to-use and cost-effective methodology become extremely relevant. This dissertation focuses on the development of screening methodology for the routine analysis of PAH metabolites in numerous samples. It explores the room-temperature fluorescence properties of six metabolites originating from parent PAH included in the Environmental Protection Agency priority pollutants list. 1-hydroxyfluorene, 1-hydroxypyrene, 6-hydroxychrysene, 9-hydroxyphenanthrene, 3-hydroxybenzo[a]pyrene and 4-hydroxybenzo[a]pyrene are used as model biomarkers to investigate the analytical potential of new methods based on solid-phase extraction (SPE) and room-temperature fluorescence (RTF) spectroscopy. 
Quantitative determination of metabolites is carried out either in the eluent extract [1, 2] or on the surface of extraction membranes [3, 4]. The direct determination of the six metabolites, i.e., without chromatographic separation, is based on the collection of excitation-emission matrices and synchronous fluorescence spectra.

    On-line monitoring of aqueous base metal solutions with transmittance spectrophotometry

    Transmittance spectrophotometry was used to monitor copper, cobalt and zinc in solution in laboratory experiments. The samples simulated plant conditions encountered on the Skorpion zinc mine in Namibia and were prepared using a simplex centroid mixture design. Principal component, partial least squares and support vector regression models were calibrated from visible and near infrared absorption spectra. All models could accurately estimate the concentrations of all the metals in solution. Although these models were affected by nickel contamination, the Cu models were less sensitive to this contamination than the Co and Zn models. Likewise, elevated temperatures led to degradation of the calibrated models, particularly the Zn models. The effects of these conditions could be visualized by a linear discriminant score plot of the spectral data
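
For readers unfamiliar with the simplex centroid mixture design mentioned above, the following sketch enumerates its blends. It assumes, purely for illustration, that copper, cobalt and zinc are the only three mixture components; the abstract does not give the actual design table, so the function name and component list are hypothetical.

```python
from itertools import combinations


def simplex_centroid(components):
    """Return the 2**q - 1 blends of a simplex-centroid design.

    Each blend mixes the members of one non-empty subset of the
    components in equal proportions; all other components are zero.
    """
    design = []
    for size in range(1, len(components) + 1):
        for subset in combinations(components, size):
            share = 1.0 / size
            design.append(
                {name: (share if name in subset else 0.0) for name in components}
            )
    return design


# Assumed component list (illustrative only).
points = simplex_centroid(["Cu", "Co", "Zn"])
for blend in points:
    print(blend)
```

With three components this yields seven blends: three pure components, three binary 1:1 mixtures, and the overall centroid.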

    Calibration Methods of Laser-Induced Breakdown Spectroscopy

    Laser-induced breakdown spectroscopy (LIBS) has gained great attention over the past two decades due to its many advantages, such as no need for sample preparation, the capability of remote measurement, and fast simultaneous multielement analysis. However, because of the inherent uncertainty of the plasma, achieving high sensitivity and accurate quantitative analysis remains a big challenge for the LIBS community worldwide. Currently, many chemometric methods have been applied to LIBS calibration, including univariate regression, multivariate regression, principal component regression (PCR), and partial least squares regression (PLSR). In addition, appropriate sample and spectral pretreatment can effectively improve the analytical performance of LIBS, i.e., its limit of detection (LOD), accuracy and repeatability. In this chapter, we briefly summarize the progress of these calibration methods and their applications to LIBS and provide our recommendations.
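
As a rough illustration of the simplest method listed above, univariate calibration, the sketch below fits a straight line of signal against concentration by ordinary least squares and derives a limit of detection. The 3σ/slope rule used here is a common convention, not something the abstract specifies, and the concentrations, intensities and blank standard deviation are synthetic placeholder values, not measured data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def limit_of_detection(sigma_blank, slope):
    """LOD under the common 3-sigma convention (an assumption here)."""
    return 3.0 * sigma_blank / slope


# Synthetic calibration standards (placeholders, not real LIBS data).
concentrations = [0.0, 1.0, 2.0, 3.0]
intensities = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(concentrations, intensities)
lod = limit_of_detection(sigma_blank=0.2, slope=slope)
print(slope, intercept, lod)
```

Multivariate methods such as PCR and PLSR follow the same calibrate-then-predict pattern but regress on many spectral channels at once.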