
    Development of predictive models for catalyst development

    Abstract. This work was done as part of the BioSPRINT project, which aims to improve biorefinery operations through process intensification and to replace fossil-based polymers with new bio-based products. The goal was to identify machine learning (ML) models that accelerate catalyst identification with high-throughput (HTP) screening methods, identify non-obvious formulations and allow catalyst tuning for different feedstock compositions. The desired outcome is maximum activity for the conversion of complex sugar mixtures with optimal selectivity towards the key products of interest. The literature part of the thesis covers ML in general, focusing on variable selection methods and data-driven modeling techniques. It also discusses modeling in catalysis with a focus on ML: catalyst screening and selection, descriptor modeling and selection, and predictive modeling. The experimental part focused on developing ML models that predict catalyst performance from relevant descriptors. A dataset for the hydrogenation of 5-ethoxymethylfurfural with simple bimetallic catalysts, comprising main metals and promoters, was used as ML model input, supplemented with catalyst descriptors found in the literature. Four responses were used in the experiments: selectivity and conversion, each with two different solvents. The methods used in the experimental part are discussed in detail, covering data collection, preprocessing, variable selection, modeling and model validation. Reference models without variable selection were identified first. Secondly, regularization algorithms were used to identify models. Finally, models were identified with the variable subsets obtained from the regularization algorithms. The effect of cross-validation was also studied.
In general, good modeling results were obtained with boosted ensemble tree methods, support vector machine (SVM) methods and Gaussian process regression (GPR) methods. Lasso regression turned out to be the best variable selection method. Good results were obtained with the descriptors found in the literature. It was also shown that fairly good results can be obtained with only two variables in the studied case. The variable selection algorithms ranked promoter variables as far less important than main-metal variables. Even though the modeling results were good, the variable selection methods were almost purely data-driven, and the actual relevance of the variables cannot be guaranteed. Future work should study optimization, with the goal of finding catalysts that maximize the performance values predicted by the models. The extrapolation capabilities of the models also need to be studied and improved. The studied methods can easily be applied to other datasets. In the BioSPRINT project, experimental results related to the dehydration reaction of C5 and C6 sugars with simple metal catalysts will be obtained and used with the studied methods.
Ennustavien mallien laatiminen katalyytin valmistuksen tehostamiseksi (Developing predictive models to make catalyst preparation more efficient). Abstract. This work was carried out as part of the BioSPRINT project, whose aim is to develop biorefinery operations by improving their process efficiency and to replace fossil-based polymers with new bio-based products. The aim of the work was to use machine learning to build models that speed up the discovery of optimal catalysts through high-throughput (HTP) screening, help identify hard-to-find catalyst combinations and enable catalyst selection for different feedstock compositions. The goal is to maximize the conversion of complex sugar compounds and the selectivity towards the desired products.
The literature part of the work examined machine learning at a general level, with the main emphasis on variable selection methods and data-driven modeling methods. It also examined the use of modeling in catalysis, with the main emphasis on the use of machine learning. The work also considered catalyst screening and selection, the definition and selection of computational variables (descriptors), and the use of predictive modeling in catalysis. In the experimental part, the focus was on building machine learning models that predict catalyst performance with relevant descriptors. The dataset consisted of results from the hydrogenation reaction of 5-ethoxymethylfurfural over simple two-component metal catalysts containing a main metal and a promoter. The dataset was supplemented with catalyst descriptors found in the literature and used as input to the machine learning models. Four response variables were used: selectivity and conversion with two different solvents. The methods used in the experimental part were covered thoroughly, taking into account data collection, preprocessing, variable selection, modeling and model validation. First, the reference models were identified. Next, modeling was carried out with regularization algorithms. Finally, modeling was carried out using variable sets selected with the regularization algorithms. The effect of cross-validation was also studied. In general, good modeling results were achieved with the boosted ensemble tree technique, the support vector machine and Gaussian process regression. The Lasso method was found to be the best variable selection algorithm. Good results were achieved with the descriptors found in the literature. The studies also showed that, in this case, good modeling results can be achieved with as few as two variables.
The variables describing the main metals were found to be much more significant than the corresponding promoter variables. When examining the modeling results, it must be noted that variable selection was almost entirely data-driven and the actual significance of the variables cannot be guaranteed. In the future, the model predictions can be used in optimization, where the aim is to find the catalyst combination that maximizes catalyst performance. The extrapolation capability of the model must also be studied and developed. The studied methods are easily applicable to other similar datasets. In the future, the BioSPRINT project will provide a dataset based on the dehydration of five- and six-carbon sugars with simple metal catalysts, which will be used in follow-up studies.
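
The abstract above reports that Lasso regression worked best for variable selection and that two variables were enough in the studied case. The mechanism behind such a selection can be sketched as follows. This is a minimal from-scratch illustration on synthetic data, not the thesis's code: the abstract does not state which software was used, and the data, the penalty strength `alpha` and all function names here are assumptions chosen for illustration. The L1 penalty drives the coefficients of uninformative descriptors to exactly zero, and the descriptors with nonzero coefficients form the selected subset.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardize(column):
    """Return the column shifted to zero mean and scaled to unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is a list of rows with standardized columns; y is centered.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes j's own contribution
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic stand-in data (hypothetical): 5 descriptors, only the first two drive the response.
rng = random.Random(0)
n_samples, n_features = 60, 5
columns = [
    standardize([rng.gauss(0.0, 1.0) for _ in range(n_samples)])
    for _ in range(n_features)
]
X = [[columns[j][i] for j in range(n_features)] for i in range(n_samples)]
y_raw = [3.0 * row[0] - 2.0 * row[1] + 0.1 * rng.gauss(0.0, 1.0) for row in X]
y_mean = sum(y_raw) / n_samples
y = [v - y_mean for v in y_raw]

w = lasso_coordinate_descent(X, y, alpha=0.3)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", [round(wj, 3) for wj in w])
print("selected descriptor indices:", selected)
```

A second model (for example a tree ensemble, SVM or GPR, as named in the abstract) would then be fitted on the `selected` columns only. As the abstract cautions, a subset chosen this way is purely data-driven and says nothing by itself about chemical relevance.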

    Interpretable machine learning to model biomass and waste gasification

    Machine learning has been regarded as a promising method to better model thermochemical processes such as gasification. However, the black-box nature of machine learning models can limit how much one can trust and learn from them. Here, seven different machine learning methods have been adopted to model the gasification of biomass and waste across a wide range of operating conditions. Gradient boosting regression has been found to outperform the other model types, with a coefficient of determination (R²) of 0.90 when averaged across ten key gasification outputs. Global and local model interpretability methods have been used to illuminate the developed black-box models. The studied models were most strongly influenced by the feedstock’s particle size and the type of gasifying agent employed. By combining global and local interpretability methods, the understanding of black-box models has been improved. This allows policy makers and investors to make more educated decisions about gasification process design.
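
The abstract above does not name the interpretability methods it used. As one common example of a global method, permutation importance can be sketched as follows: shuffle one input column at a time and measure how much the model's error grows. The toy model, the data and the function names are all assumptions made for illustration, not the authors' setup.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared difference between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(n_features):
        shuffled_col = [r[j] for r in rows]
        rng.shuffle(shuffled_col)
        # Rebuild the rows with only column j replaced by its shuffled values
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, shuffled_col)]
        importances.append(mean_squared_error(model, permuted, targets) - baseline)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted black-box model.

    It relies strongly on feature 0, weakly on feature 1, and ignores feature 2.
    """
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(1)
rows = [[data_rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(200)]
targets = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, targets, n_features=3)
for j, score in enumerate(importances):
    print(f"feature {j}: importance {score:.4f}")
```

A feature the model ignores gets an importance of zero, while features it depends on are ranked by how much shuffling them degrades the predictions. Local methods, by contrast, explain individual predictions rather than the model as a whole.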

    Artificial Intelligence in Process Engineering

    In recent years, the field of Artificial Intelligence (AI) has experienced a boom, driven by breakthroughs in computing power, AI techniques, and software architectures. Among the many fields affected by this paradigm shift, process engineering has benefited from AI. However, the published methods and applications in process engineering are diverse, and there is still much unexploited potential. This work aims to provide a systematic overview of the current state of AI and its applications in process engineering. Current applications are described and classified according to a broader scheme. Current techniques and types of AI, as well as pre- and postprocessing, are examined similarly and assigned to the previously discussed applications. Given the importance of mechanistic models in process engineering, as opposed to the pure black-box nature of most AI, reverse engineering strategies and hybrid modeling are highlighted. Furthermore, a holistic strategy is formulated for applying the current state of AI in process engineering.

    Scanning electron microscopy image representativeness: morphological data on nanoparticles.

    A sample of a nanomaterial contains a distribution of nanoparticles of various shapes and/or sizes. A scanning electron microscopy image of such a sample often captures only a fragment of the morphological variety present in the sample. In order to quantitatively analyse the sample using scanning electron microscope digital images, and, in particular, to derive numerical representations of the sample morphology, image content has to be assessed. In this work, we present a framework for extracting morphological information contained in scanning electron microscopy images using computer vision algorithms, and for converting them into numerical particle descriptors. We explore the concept of image representativeness and provide a set of protocols for selecting optimal scanning electron microscopy images as well as determining the smallest representative image set for each of the morphological features. We demonstrate the practical aspects of our methodology by investigating tricalcium phosphate, Ca₃(PO₄)₂, and calcium hydroxyphosphate, Ca₅(PO₄)₃(OH), both naturally occurring minerals with a wide range of biomedical applications.

    Machine and deep learning meet genome-scale metabolic modeling

    Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool to investigate large-scale relationships between genotype, phenotype, and environment. The development and application of these methodological frameworks have occurred independently for the most part, whereas the potential of their integration for biological, biomedical, and biotechnological research is less known. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We overlap systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach merging experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information in an otherwise biologically-agnostic learning process.

    Multiagent System for Image Mining

    The excessive growth of, wide availability of, and demand for remote sensing databases, combined with human limits in analyzing such huge datasets, create a need to investigate tools, techniques, methodologies, and theories capable of assisting humans in extracting knowledge. Image mining arises as a solution for extracting implicit knowledge, or other patterns not explicitly stored in huge image databases, intelligently and semiautomatically. However, spatial databases are among the fastest growing because of the volume of spatial information produced many times a day, which demands the investigation of other means of knowledge extraction. Multiagent systems are composed of multiple computing elements, known as agents, that interact to pursue their goals. Agents have been used to explore information on distributed, open, large, and heterogeneous platforms. Agent mining is a promising technology that studies ways of interaction and integration between data mining and agents. This area has brought advances to the technologies involved, such as theories, methodologies, and solutions, to solve relevant issues more precisely, accurately, and quickly. AgentGeo, a multiagent system for satellite image mining, is evidence of this: it advances the state of the art of agent mining, since it offers relevant functions for extracting knowledge from spatial databases.

    Biomass Gasification and Applied Intelligent Retrieval in Modeling

    Gasification technology often requires modeling approaches that incorporate several intermediate reactions of a complex nature. These traditional models are occasionally impractical, and it is often challenging to establish reliable relations between performance parameters with them. Hence, this study outlines solutions to overcome the challenges in modeling approaches. Machine learning (ML) methods are an essential and promising addition that brings intelligent retrieval to traditional modeling approaches for gasification technology. To this end, this study charts applied ML-based artificial intelligence in the field of gasification research. It includes a summary of applied ML algorithms, including neural networks, support vector machines, decision trees, random forests, and gradient boosting, and their performance evaluations for gasification technologies.

    Machine Learning Approach to Simulate Soil CO₂ Fluxes under Cropping Systems

    With the growing number of datasets describing greenhouse gas (GHG) emissions, there is an opportunity to develop novel predictive models that avoid the expense and time required for direct field measurements. This study evaluates the potential of machine learning (ML) approaches to predict soil GHG emissions without the biogeochemical expertise required by many current models for simulating soil GHGs. Ample data from field measurements are now publicly available to test new modeling approaches. The objective of this paper was to develop and evaluate ML models that simulate soil CO₂ fluxes under different fertilization methods, using field data (soil temperature, soil moisture, soil classification, crop type, fertilization type, and air temperature) available in the Greenhouse gas Reduction through Agricultural Carbon Enhancement network (GRACEnet) database. Four machine learning algorithms were used to develop the models: K-nearest neighbor (KNN) regression, support vector regression (SVR), random forest (RF) regression, and gradient boosted (GB) regression. The GB regression model outperformed all the other models on the training dataset, with R² = 0.88, MAE = 2177.89 g C ha⁻¹ day⁻¹, and RMSE = 4405.43 g C ha⁻¹ day⁻¹. However, the RF and GB regression models both performed best on the unseen test dataset, with R² = 0.82. Machine learning tools were useful for developing predictors based on soil classification, soil temperature and air temperature when a large database like GRACEnet is available, but these were not highly predictive variables in correlation analysis. This study demonstrates the suitability of tree-based ML algorithms for predictive modeling of CO₂ fluxes, but no biogeochemical processes can be described with such models.
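
The three metrics reported in the abstract above (R², MAE and RMSE) can be computed as in the short sketch below. The numbers are made up for illustration and are not taken from the GRACEnet data; the function name is likewise an assumption. RMSE is never smaller than MAE, and a gap as large as the one reported (an RMSE roughly twice the MAE) may suggest that a few predictions carry much larger errors than the rest, since squaring weights large errors more heavily.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Illustrative values only: three exact predictions and one large miss.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Here the single miss of 2 units gives an MAE of 0.5 but an RMSE of 1.0, which shows how one outlying error widens the gap between the two.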