    Causal AI Modelling of Chemical Manufacturing Plants

    The concept of “Industry 5.0” is driving significant changes in the production of chemical products and energy, promoting a shift towards a decarbonized and circular economy. Digitalization, robotics, communications, and artificial intelligence (AI) play crucial roles in fostering the necessary technological innovations and in enhancing intelligent process control. Machine learning (ML), including deep learning, yields robust, domain-independent solutions for regression and prediction tasks, but it has limited capacity to address novel questions involving causation and counterfactual analysis. This paper proposes the application of Bayesian networks (BN) to structural causal modeling (SCM) of manufacturing plants. A critical feature of SCM is its capacity to integrate extensive prior structural knowledge, derived from fundamental chemical engineering principles, with structures inferred from experimental data obtained from manufacturing plants. The resulting SCM supports the forecasting of causal relationships, the simulation of intervention strategies and the generation of counterfactual responses, all of which are essential for process innovation and intelligent process management. The SCM is presented as a tool for examining causality and control in the complex Tennessee-Eastman process.
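The difference between prediction and intervention can be made concrete with a three-variable linear SCM. The sketch below is a minimal illustration that assumes a hypothetical confounder Z (for instance a feed disturbance) driving both a manipulated variable X and a plant output Y. All coefficients are invented and are not taken from the Tennessee-Eastman study. The observational regression slope of Y on X mixes the causal effect with confounding. Simulating do(X = x), by cutting the Z → X equation and setting X directly, recovers the structural coefficient.

```python
import random

# Hypothetical structural coefficients (illustrative only, not from the paper).
A_ZX = 0.8   # Z -> X
B_XY = 1.5   # X -> Y: the causal effect we want to recover
C_ZY = 2.0   # Z -> Y: opens a confounding back-door path X <- Z -> Y


def sample(n, do_x=None, seed=0):
    """Draw n (x, y) pairs from the SCM; if do_x is set, X is fixed by intervention."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        if do_x is None:
            x = A_ZX * z + rng.gauss(0.0, 1.0)   # observational mechanism for X
        else:
            x = do_x                              # graph surgery: the Z -> X edge is cut
        y = B_XY * x + C_ZY * z + rng.gauss(0.0, 1.0)
        rows.append((x, y))
    return rows


def ols_slope(rows):
    """Slope of the least-squares regression of y on x."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    return sxy / sxx


def interventional_effect(n=20000):
    """Estimate E[Y | do(X=1)] - E[Y | do(X=0)] by simulating both interventions."""
    y1 = sum(y for _, y in sample(n, do_x=1.0, seed=1)) / n
    y0 = sum(y for _, y in sample(n, do_x=0.0, seed=2)) / n
    return y1 - y0


print(f"observational slope:   {ols_slope(sample(20000)):.2f}")   # inflated by Z
print(f"interventional effect: {interventional_effect():.2f}")    # close to B_XY
```

With these coefficients, Var(X) = 0.8² + 1 = 1.64 and Cov(X, Y) = 1.5 · 1.64 + 2.0 · 0.8 = 4.06. The observational slope therefore tends to 4.06 / 1.64 ≈ 2.48, while the interventional contrast stays near 1.5. The gap is the confounding bias that the SCM is meant to remove.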

    Causal Models for Food Quality Data Analysis Using Artificial Intelligence

    Research background. The aim of this study is to emphasize the importance of artificial intelligence (AI) and causal modelling for the analysis of food quality with ‘big data’. AI with structural causal modelling (SCM), based on Bayesian networks and deep learning, enables the integration of theoretical field knowledge in food technology with process production data, physicochemical analytics and consumer organoleptic assessments. Food products are complex in nature, and the data are high-dimensional, with intricate interrelations (correlations) that are difficult to relate to consumers' sensory perception of food quality. Standard regression modelling techniques such as multiple ordinary least squares (OLS) and partial least squares (PLS) regression are effective for prediction by linear interpolation of observed data under stationary, cross-sectional conditions. Extending linear regression models with machine learning (ML) accounts for nonlinear relations and reveals functional patterns, but it is prone to confounding and to failed predictions under unobserved, nonstationary conditions. Confounding among data variables is the main obstacle to applying regression models to food innovation under conditions not covered by the training data. Hence, this manuscript focuses on applying causal graphical models with Bayesian networks to infer causal relationships and intervention effects between process variables and consumer sensory assessment of food quality. Experimental approach. This study is based on data available in the literature on wheat bread baking quality, consumer sensory quality assessments of fermented milk products, and professional wine tasting. The wheat baking quality data were regularized by the least absolute shrinkage and selection operator (LASSO, elastic net). Bayesian statistics were applied to evaluate the joint probability function of the model in order to infer the network structure and parameters.
The obtained SCMs are presented as directed acyclic graphs (DAG). D-separation criteria were applied to block confounding effects when estimating the direct and total causal effects of process variables and consumer perception on food quality. Probability distributions of the causal effects on quality of interventions on individual process variables are presented as partial dependence plots determined by Bayesian neural networks. For wine quality, the total causal effects determined by the SCMs were confirmed by the double machine learning (DML) algorithm. Results and conclusions. A dataset of 45 continuous variables, corresponding to different chemical, physical and biochemical properties of wheat from seven Croatian cultivars grown over two years under controlled cultivation, was analysed. LASSO regularization of the dataset yielded ten key predictors, accounting for 98 % of the variance in the baking quality data. Based on the key variables, a random forest model predicting quality with 75 % cross-validation accuracy was derived. Causal analysis between quality and the key predictors was based on the Bayesian model shown as a DAG. Protein content has the largest direct causal effect, with a path coefficient of 0.71; THMM (total high-molecular-mass glutenin subunits) content is an indirect cause, with a path coefficient of 0.42; and the total average causal effect (ACE) of protein is 0.65. The large dataset on the quality of fermented milk products included binary consumer sensory data (taste, odour, turbidity), continuous physical variables (temperature, fat, pH, colour) and three grade classes of products from consumer quality assessment. A random forest model was derived to predict the quality classification, with an out-of-bag (OOB) error of 0.28 %.
The Bayesian network model predicts that the direct causes of the taste classification are temperature, colour and fat content, while the direct causes of the quality classification are temperature, turbidity, odour and fat content. The key ACEs on quality grade were estimated as −0.04 grade/°C for temperature and 0.3 grade/fat content for fat. The ACE of temperature is nonlinear, showing negative saturation with a breakpoint at 60 °C, whereas the ACE of fat has a positive linear trend. Causal analysis of red and white wine quality was based on a large dataset of eleven continuous variables of physical and chemical properties, with quality assessments classified into ten classes, from 1 to 10. Each classification was obtained in triplicate by a panel of professional wine tasters. A non-structural double machine learning (DML) algorithm was applied to estimate the total ACE on quality. The alcohol content of red and white wine had the key positive ACE, with a relative factor of 0.35 quality/alcohol, while volatile acidity had the key negative ACE of −0.2 quality/acidity. The ACE predictions obtained by the unstructured DML algorithm are in close agreement with the ACE obtained by the structural SCM. Novelty and scientific contribution. Novel methodologies and results are presented for the application of causal artificial intelligence models to the analysis of consumer assessment of food product quality. Bayesian network structural causal models (SCM) enable d-separation to block the pronounced confounding between parameters that affects noncausal regression models. Based on the SCM, inference of ACE provides substantiated and validated research hypotheses for new products and supports decisions on potential interventions to improve product design, new process introduction, process control, management and marketing.
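The DML estimator used above to validate the wine ACEs can be illustrated with the partially linear model Y = θT + g(W) + ε. The following is a minimal sketch of the cross-fitted partialling-out estimator. It assumes simple one-covariate least-squares fits as stand-ins for the flexible ML learners a real DML analysis would use. The data are simulated, with a hypothetical treatment T and a single confounder W, and have no relation to the wine dataset.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs (stand-in nuisance learner)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def dml_partialling_out(w, t, y, n_folds=2):
    """Cross-fitted partialling-out estimate of theta in Y = theta*T + g(W) + e."""
    n = len(y)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    num = den = 0.0
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
        # Nuisance models E[Y|W] and E[T|W] are fitted on the other folds only.
        ay, by = fit_line([w[i] for i in train_idx], [y[i] for i in train_idx])
        at, bt = fit_line([w[i] for i in train_idx], [t[i] for i in train_idx])
        for i in test_idx:
            ry = y[i] - (ay + by * w[i])   # outcome residual
            rt = t[i] - (at + bt * w[i])   # treatment residual
            num += rt * ry
            den += rt * rt
    # Final stage: regress outcome residuals on treatment residuals.
    return num / den


rng = random.Random(42)
THETA = 0.5  # hypothetical true effect of T on Y
w = [rng.gauss(0, 1) for _ in range(5000)]
t = [0.7 * wi + rng.gauss(0, 1) for wi in w]
y = [THETA * ti + 1.2 * wi + rng.gauss(0, 1) for ti, wi in zip(t, w)]

print(f"DML estimate of theta: {dml_partialling_out(w, t, y):.3f}")
```

A naive regression of Y on T alone would absorb the W → Y path and overestimate θ. Residualizing both T and Y on W removes that path. In a real analysis, the two line fits would be replaced by random forests, boosted trees or similar learners, and more folds would typically be used.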

    Wine quality analysis by the structural causal model (SCM)

    Bayesian network modelling is applied to the structural causal analysis of the relationship between wine physicochemical data and quantitative blind human quality assessments. The large dataset of white and red "Vinho Verde" wine samples from Portugal, available from the open machine learning data repository of the University of California, Irvine, was analysed. The dataset contains 4898 white and 1599 red samples, each evaluated in blind tastings by at least 3 sensory assessors and described by 12 physicochemical properties. The causal effects of the wine analytical data on human quality evaluations were estimated numerically with Bayesian neural networks as marginal distributions over adjustment sets of covariates and are presented graphically as partial dependence plots. Structural causal analysis revealed important differences between the variables most important for quality prediction and the individual causal effects. Bayesian neural network models of the partial dependencies show more pronounced nonlinear effects for red wine than for white wine quality. For untrained wine samples, the artificial intelligence models based on boosted random forests of decision trees yield a 5 % relative standard error of prediction, compared with 12 % for linear models estimated by ordinary least squares. For red wine, the most important direct causal effects on quality are those of alcohol, volatile acidity and sulphates. Alcohol improves quality up to a plateau at 14 %, while volatile acidity has a strong, proportional negative effect. The effect of sulphates is highly nonlinear, with the maximum positive effect at a concentration of 1 g/L (as K₂SO₄). For the white wine samples, the causal effects are linear, with positive effects of alcohol and negative effects of volatile and fixed acidity. The developed structural causal model enables the evaluation of targeted wine production interventions, the so-called "doing x", do(x), models, as restructured adjusted Bayesian networks. This opens potential applications of artificial intelligence in wine production technology and process quality control.
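The partial dependence plots described above can be read as interventional curves when the averaged covariates form a valid back-door adjustment set. In that case, E[Y | do(X = x)] is approximated by fixing X = x in every observed row, keeping that row's adjustment covariates, and averaging the model's predictions. The sketch below is a minimal illustration. It assumes a hypothetical fitted model `predict_quality(alcohol, volatile_acidity)` with an invented saturating form and an invented covariate sample. In the study, the outcome model is a Bayesian neural network and the adjustment set comes from the DAG.

```python
import math
from statistics import mean


def predict_quality(alcohol, volatile_acidity):
    """Hypothetical fitted outcome model (illustrative form, not the paper's)."""
    return 3.0 + 2.5 * (1.0 - math.exp(-(alcohol - 8.0) / 3.0)) - 1.5 * volatile_acidity


def interventional_curve(model, grid, adjustment_rows):
    """Back-door adjusted mean outcome E[Y | do(X=x)] for each x in grid.

    adjustment_rows holds the observed values of the adjustment covariate
    (here only volatile acidity); X is overwritten with x in every row.
    """
    return [mean(model(x, z) for z in adjustment_rows) for x in grid]


observed_acidity = [0.3, 0.4, 0.5, 0.6, 0.7]   # invented covariate sample
grid = [9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
for x, q in zip(grid, interventional_curve(predict_quality, grid, observed_acidity)):
    print(f"do(alcohol={x:4.1f}) -> mean quality {q:.2f}")
```

The difference from a plain prediction is that the covariates are not conditioned on X. They keep their observed (marginal) distribution while X is set externally, which is what the adjustment formula requires.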

    Optimisation of the Daily Nutrient Composition of Daily Intakes During Gestation

    An appropriate lifestyle and diet for pregnant women during prenatal development contribute to the proper development of the foetus. From the third month of pregnancy onward, physical activity should be matched to metabolic needs. In this paper, linear programming was applied to meal planning according to the guidelines recommended for women aged 19 to 30, with emphasis on nutrient intake during all nine months of pregnancy. The nutritional composition data are based on a seven-day menu in which each day consists of four meals: breakfast, lunch, dinner and a snack. Linear optimization was carried out with the LINDO program. The model included 28 variables and 20 constraints: energy, water, proteins, fats, carbohydrates, cholesterol, dietary fibres, the fat-soluble vitamins A and D, the water-soluble vitamins B1, B2, niacin, B6, folic acid, B12 and C, and the minerals calcium, iron, magnesium and sodium. The results show that a well-balanced, diverse and regular diet, providing adequate amounts of nutrients without additional supplements, can be planned for pregnant women according to the prescribed guidelines. The sensitivity analysis indicated that the menu plan has some limitations with respect to the foods chosen for the weekly menu. Especially in the third trimester, it is important to include foods rich in folic acid, magnesium and iron.
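Diet planning of this kind is a linear program: choose food amounts that optimise a linear objective subject to minimum nutrient intakes. The toy sketch below uses two hypothetical foods, two of the nutrient constraints listed above with invented values, and an invented cost objective. It is not the paper's 28-variable LINDO model. It solves the two-variable LP by enumerating the vertices of the feasible region, where a linear objective that is bounded below attains its minimum. Realistic models are solved with the simplex method or an interior-point solver instead.

```python
from itertools import combinations

# Hypothetical nutrient content per portion of foods A (x) and B (y).
# Each constraint is a*x + b*y >= rhs; non-negativity is included explicitly.
constraints = [
    (10.0, 2.0, 20.0),   # protein (g): 10x + 2y >= 20
    (1.0, 3.0, 9.0),     # iron (mg):    x + 3y >= 9
    (1.0, 0.0, 0.0),     # x >= 0
    (0.0, 1.0, 0.0),     # y >= 0
]
cost = (2.0, 1.0)        # hypothetical cost per portion of A and B


def feasible(x, y, tol=1e-9):
    """True if (x, y) satisfies every constraint within a small tolerance."""
    return all(a * x + b * y >= rhs - tol for a, b, rhs in constraints)


def solve_2d_lp():
    """Minimise cost over the vertices formed by pairs of constraint boundaries."""
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue                      # parallel boundaries: no vertex
        x = (r1 * b2 - r2 * b1) / det     # Cramer's rule for the 2x2 system
        y = (a1 * r2 - a2 * r1) / det
        if feasible(x, y):
            value = cost[0] * x + cost[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best


value, x, y = solve_2d_lp()
print(f"portions A={x:.2f}, B={y:.2f}, cost={value:.2f}")
```

Here the optimum lies where both nutrient constraints are binding (1.5 portions of A, 2.5 of B). The sensitivity analysis mentioned in the abstract studies how such a vertex shifts when coefficients or limits change.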

    Mathematical Modelling of Gene Regulatory Networks

    Causal AI Model and Sustainability Optimisation of Concrete Mix Composition

    This paper presents a causal analysis of the effect of environmentally sustainable cement mixtures on the compressive strength of concrete and on the reduction of CO2 emissions. A Bayesian causal model, decision tree ensembles and deep neural networks were applied. The model is based on a large dataset with n = 1030 samples and p = 9 variables (cement, slag, fly ash, water, plasticiser, coarse aggregate, fine aggregate, time, and concrete compressive strength). The model is a directed acyclic graph (DAG) determined by heuristic optimisation of the Bayesian information criterion (BIC). The machine-learning AI model predicts concrete compressive strength with a mean absolute error of 3 MPa (4.3 %), compared with an error of 10 MPa for the multiple linear model. To eliminate confounding effects among the variables, the directional separation (d-separation) criterion was applied to determine the causal effects of individual variables on compressive strength. The individual effects are expressed as average treatment effects (ATE). The causal effect of time shows two-phase, zero-order kinetics. During the first phase of the process, the largest ATE values (MPa/(kg m⁻³)) are for coarse aggregate (0.53), plasticiser (0.35) and fine aggregate (0.19), while water has the largest negative effect (−0.3). In the second phase, plasticiser shows the largest positive ATE (0.5) and coarse aggregate the largest negative one (−0.23). Because of the complex interactions among the variables and the complex process dynamics, a genetic algorithm is proposed for optimising the mix composition. The AI model predicts a substantial potential reduction in CO2 emissions through the use of fly ash and slag.
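Score-based structure learning of the kind described above compares candidate parent sets with the BIC, which decomposes into one local score per node. The following is a minimal sketch of that local score for a linear-Gaussian Bayesian network, assuming least-squares fits and a Gaussian likelihood. The variable names (`water`, `irrelevant`, `strength`) and the simulated data are hypothetical. A full heuristic search, such as hill climbing, would evaluate many such local scores while adding, removing and reversing edges.

```python
import math
import random


def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def bic_local(data, child, parents):
    """Local BIC of `child` given `parents` (higher is better), linear-Gaussian."""
    n = len(data[child])
    cols = [[1.0] * n] + [data[p] for p in parents]          # intercept + parents
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, data[child])) for ci in cols]
    beta = solve(xtx, xty)                                   # normal equations
    rss = sum((data[child][i] - sum(b * col[i] for b, col in zip(beta, cols))) ** 2
              for i in range(n))
    sigma2 = rss / n                                          # ML variance estimate
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    k = len(cols) + 1                                         # coefficients + variance
    return loglik - 0.5 * k * math.log(n)


rng = random.Random(7)
n = 500
water = [rng.gauss(0, 1) for _ in range(n)]
irrelevant = [rng.gauss(0, 1) for _ in range(n)]
strength = [-0.8 * wv + rng.gauss(0, 0.5) for wv in water]    # hypothetical mechanism
data = {"water": water, "irrelevant": irrelevant, "strength": strength}

for parents in ([], ["water"], ["water", "irrelevant"]):
    print(parents, round(bic_local(data, "strength", parents), 1))
```

The log(n) penalty is what stops the search from adding every available variable as a parent. A spurious parent must improve the log-likelihood by more than half of log(n) to be accepted.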

    The Causal Ecological Model Based on EU Project Data “LTER Northern Adriatic Sea”

    The aim of this work was to show the possibilities of applying artificial intelligence methodologies and structural causal modelling (SCM) to contribute scientifically to the determination of functional causal dependencies in complex ecological systems. In this work, SCM was applied to determine the dependence of chlorophyll concentration on physical and chemical parameters in the northern Adriatic Sea over the period 1965 to 2015.
The experimental data are the outcome of long-term, extensive research within the EU project “LTER Northern Adriatic Sea” and are freely available under the EU Open Science policy. The data form a large ("Big Data") database with 108 687 samples and 43 descriptors. The proposed mathematical model is a Bayesian network (BN) represented as a directed acyclic graph (DAG). The model structure was determined by a Hilbert-Schmidt conditional independence test with a significance level of α = 0.05. The SCM shows that the direct causes of chlorophyll concentration are temperature, salinity, pH, and the concentrations of nitrogen, phosphorus and silica. The BN model was adjusted using d-separation to block confounding back-door paths. The causal dependency functions were determined as marginal distributions using Bayesian neural networks with a single hidden layer for interpolation. The most important causal effect was due to temperature (−0.07 μg chlorophyll A/°C). The model predicted a reversed positive causal dependence, with dissolved oxygen depending on chlorophyll concentration (0.2 mg DO2/μg chlorophyll A). A nonparametric comparative analysis of chlorophyll and abiotic parameters was also carried out between the Croatian Adriatic and the rest of the northern Adriatic Sea (Slovenia and Italy). The comparison was based on medians to avoid the pronounced influence of outliers caused by hydrodynamic effects. The median dissolved oxygen concentration was 5.8 mg O2/l in Croatian waters and 5.5 mg O2/l in Slovenian and Italian waters, and the median temperature was T = 14.6 °C compared with T = 15.1 °C. There is a significant difference in dinoflagellate abundance: the median is 3 cells/l in Croatian waters and 5 cells/l in Slovenian and Italian waters. The difference is more pronounced in the number and magnitude of “hot spot” outliers.
The difference between median chlorophyll concentrations is not significant (0.65 and 0.90 μg l⁻¹); however, the distribution of outliers differs significantly, with more frequent and larger outliers in the Italian and Slovenian Adriatic. A significant difference was also observed in the SiO4 distribution, with higher concentrations in the western Adriatic. Random forest (RF) decision tree models were developed to predict biological parameters from abiotic data. The RF models were validated by 5-fold cross-validation. On held-out data, the models have mean relative prediction errors of 6.5 % for chlorophyll, 17.4 % for phaeopigment, 18.8 % for diatoms, 17.4 % for dinoflagellates and 12.1 % for coccolithophores. For each predictive model, the five most important predictors, accounting for 95 % of the importance, were identified.
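Kernel independence tests of the kind used here for structure learning build on the Hilbert-Schmidt independence criterion (HSIC). The following is a minimal sketch under stated assumptions: Gaussian kernels with a fixed bandwidth, and simulated data. It computes the biased HSIC statistic trace(KHLH)/n² and a permutation p-value for the unconditional test of whether X and Y are independent. The conditional variant used to build the DAG (independence of X and Y given Z) is more involved and is not reproduced here. The example also shows why a kernel test is useful: a purely quadratic dependence has zero population correlation but is detected by HSIC.

```python
import math
import random


def gaussian_gram(xs, sigma):
    """Gram matrix of the Gaussian kernel exp(-(a - b)^2 / (2 sigma^2))."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]


def center(k):
    """Double-centre a Gram matrix, i.e. compute H K H with H = I - 11^T/n."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]


def hsic_permutation_test(xs, ys, sigma=1.0, n_perm=200, seed=0):
    """Return (HSIC statistic, permutation p-value) for H0: X independent of Y."""
    n = len(xs)
    kc = center(gaussian_gram(xs, sigma))
    l = gaussian_gram(ys, sigma)

    def stat(perm):
        # trace(HKH L_perm) / n^2, where L_perm is L with ys permuted by perm
        return sum(kc[i][j] * l[perm[j]][perm[i]]
                   for i in range(n) for j in range(n)) / n ** 2

    rng = random.Random(seed)
    identity = list(range(n))
    observed = stat(identity)
    perm = identity[:]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if stat(perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(50)]
y_dep = [xi ** 2 + 0.1 * rng.gauss(0, 1) for xi in x]   # nonlinear dependence
y_ind = [rng.gauss(0, 1) for _ in range(50)]            # independent of x

for name, yv in (("y = x^2 + noise", y_dep), ("independent y", y_ind)):
    s, p = hsic_permutation_test(x, yv)
    print(f"{name}: HSIC={s:.4f}, permutation p={p:.3f}")
```

Permuting y breaks any dependence on x while keeping both marginal distributions, so the permuted statistics approximate the null distribution. The fixed bandwidth is a simplification; a median-distance heuristic is a common alternative.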

    Process Innovations for the Circular Economy Using Artificial Intelligence (AI)

    The development and application of artificial intelligence (AI) are systemically present in all forms of social life, technology and science. Despite concerns, AI is expected to have its most important impact in solving global problems in the development of the circular economy. The motivation of this paper is to demonstrate the potential of AI models based on knowledge of causal relationships, using examples of new technologies for the extraction of biologically active molecules, environmental protection and the development of new materials. Results are presented for AI models built on the principles of the structural causal model (SCM) with Bayesian networks (BN), fusing fundamental knowledge with databases. Causal analysis of the consequences of possible interventions in a technological process is carried out by transforming the BN using d-separation. Results of causal AI models are presented for improving “green” biomolecule extraction technology, biorefineries, the degradation of textile wastewater and the recovery of construction waste.
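A d-separation query can be checked mechanically, for example to decide whether a chosen set of measured variables blocks all confounding paths. The following is a minimal sketch of such a test. It uses the standard moralisation criterion: X and Y are d-separated by Z exactly when they are disconnected in the moralised graph of the ancestors of X ∪ Y ∪ Z after the nodes in Z are removed. The process graph (`feed`, `temperature`, `yield`, `purity`, `cost`) is hypothetical and is not taken from the paper.

```python
from itertools import combinations


def ancestors(dag, nodes):
    """Return `nodes` plus all their ancestors; dag maps each node to its parents."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(dag.get(v, ()))
    return seen


def d_separated(dag, xs, ys, zs):
    """True if xs and ys are d-separated by zs (moralised ancestral graph test)."""
    keep = ancestors(dag, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}
    for child in keep:
        parents = [p for p in dag.get(child, ()) if p in keep]
        for p in parents:                      # keep edges, drop their direction
            adj[p].add(child)
            adj[child].add(p)
        for a, b in combinations(parents, 2):  # "marry" parents of a common child
            adj[a].add(b)
            adj[b].add(a)
    blocked = set(zs)
    frontier = [x for x in xs if x not in blocked]
    reached = set(frontier)
    while frontier:                            # search the graph with zs removed
        v = frontier.pop()
        if v in ys:
            return False
        for nb in adj[v] - blocked - reached:
            reached.add(nb)
            frontier.append(nb)
    return True


# Hypothetical process graph: node -> list of parents.
dag = {
    "temperature": ["feed"],
    "yield": ["feed", "temperature"],
    "purity": ["temperature"],
    "cost": ["yield", "purity"],   # collider of yield and purity
}
print(d_separated(dag, ["yield"], ["purity"], ["temperature"]))          # blocked
print(d_separated(dag, ["yield"], ["purity"], ["temperature", "cost"]))  # collider opened
```

The second query illustrates why adjustment sets must be chosen from the graph rather than by "controlling for everything". Conditioning on the collider `cost` opens a path between `yield` and `purity` that was previously blocked.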