
    A probabilistic evaluation procedure for process model matching techniques

    Process model matching refers to the automatic identification of corresponding activities between two process models. It represents the basis for many advanced process model analysis techniques such as the identification of similar process parts or process model search. A central problem is how to evaluate the performance of process model matching techniques. Current evaluation methods require a binary gold standard that clearly defines which correspondences are correct. The problem is that often not even humans can agree on a set of correct correspondences. Hence, evaluating the performance of matching techniques based on a binary gold standard does not take the true complexity of the matching problem into account and does not fairly assess the capabilities of a matching technique. In this paper, we propose a novel evaluation procedure for process model matching techniques. In particular, we build on the assessments of multiple annotators to define the notion of a non-binary gold standard. In this way, we avoid the problem of agreeing on a single set of correct correspondences. Based on this non-binary gold standard, we introduce probabilistic versions of precision, recall, and F-measure, as well as a distance-based performance measure. We use a dataset from the Process Model Matching Contest 2015 and a total of 16 matching systems to assess and compare the insights that can be obtained by using our evaluation procedure. We find that our probabilistic evaluation procedure allows us to gain more detailed insights into the performance of matching systems than a traditional evaluation based on a binary gold standard.
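    The abstract does not spell out the probabilistic measures, so the following is only an illustrative sketch: it assumes a non-binary gold standard encoded as the fraction of annotators who accepted each correspondence, and weights precision and recall by that support (the paper's exact definitions may differ).

```python
# Illustrative sketch, not the paper's exact formulation: each gold-standard
# correspondence carries the fraction of annotators who accepted it, and
# precision/recall are weighted by that support.

def probabilistic_scores(proposed, gold_support):
    """proposed: set of (activity_a, activity_b) pairs output by a matcher.
    gold_support: dict mapping pairs to annotator support in [0, 1]."""
    if not proposed:
        return 0.0, 0.0, 0.0
    # Precision: average annotator support of the proposed correspondences.
    precision = sum(gold_support.get(p, 0.0) for p in proposed) / len(proposed)
    # Recall: recovered support mass relative to the total support mass.
    total = sum(gold_support.values())
    recall = sum(gold_support[p] for p in proposed if p in gold_support) / total if total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("Check order", "Verify order"): 1.0,   # all annotators accept this pair
        ("Ship goods", "Send package"): 0.5}    # only half of the annotators accept it
proposed = {("Check order", "Verify order"), ("Ship goods", "Cancel order")}
print(probabilistic_scores(proposed, gold))
```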

    Model-driven engineering techniques and tools for machine learning-enabled IoT applications: A scoping review

    This paper reviews the literature on model-driven engineering (MDE) tools and languages for the Internet of Things (IoT). Due to the abundance of big data in the IoT, data analytics and machine learning (DAML) techniques play a key role in providing smart IoT applications. In particular, since a significant portion of IoT data is sequential time series data, such as sensor data, time series analysis techniques are required. Therefore, IoT modeling languages and tools are expected to support DAML methods, including time series analysis techniques, out of the box. In this paper, we study and classify prior work in the literature through this lens, following the scoping review approach. Hence, the key underlying research questions are which MDE approaches, tools, and languages have been proposed, and which of them support DAML techniques at the modeling level and in the scope of smart IoT services.

    CONDA-PM -- A Systematic Review and Framework for Concept Drift Analysis in Process Mining

    Business processes evolve over time to adapt to changing business environments. This requires continuous monitoring of business processes to gain insights into whether they conform to the intended design or deviate from it. The situation in which a business process changes while being analysed is denoted as concept drift. Its analysis is concerned with studying how a business process changes, in terms of detecting and localising changes and studying their effects. Concept drift analysis is crucial to enable early detection and management of changes, that is, deciding whether to promote a change to become part of an improved process, or to reject the change and take decisions to mitigate its effects. Despite its importance, there exists no comprehensive framework for analysing concept drift types, affected process perspectives, and granularity levels of a business process. This article proposes the CONcept Drift Analysis in Process Mining (CONDA-PM) framework, which describes the phases and requirements of a concept drift analysis approach. CONDA-PM was derived from a Systematic Literature Review (SLR) of current approaches to analysing concept drift. We apply the CONDA-PM framework to current approaches to concept drift analysis and evaluate their maturity. Applying the CONDA-PM framework highlights areas where research is needed to complement existing efforts.
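    CONDA-PM itself is a framework and review rather than an algorithm; as a hedged illustration of the kind of technique it surveys, the sketch below compares directly-follows behaviour in consecutive windows of an event log and flags window pairs whose distance exceeds an assumed threshold (the log, window size, and threshold are invented for the example).

```python
# Illustrative sketch only: many drift-detection approaches surveyed by CONDA-PM
# compare behaviour between consecutive windows of traces. Here we compare
# directly-follows frequency distributions with a simple L1 distance.
from collections import Counter

def directly_follows(traces):
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    total = sum(counts.values()) or 1
    return {pair: c / total for pair, c in counts.items()}

def drift_score(window_a, window_b):
    """L1 distance between directly-follows distributions of two trace windows."""
    a, b = directly_follows(window_a), directly_follows(window_b)
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in set(a) | set(b))

log = [["register", "check", "approve"]] * 50 + [["register", "approve", "check"]] * 50
window = 25
for i in range(0, len(log) - 2 * window + 1, window):
    score = drift_score(log[i:i + window], log[i + window:i + 2 * window])
    if score > 0.3:  # assumed threshold for reporting a candidate drift point
        print(f"possible drift between traces {i} and {i + 2 * window}, score={score:.2f}")
```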

    Periodic assessment of Endothelin-1 (ET-1) and Nitric Oxide (NO) in hypertensive disorders of pregnancy (HDP)

    INTRODUCTION: Hypertensive Disorders of Pregnancy (HDP) are an independent risk factor for cardiovascular (CVS) disease. Endothelin-1 (ET-1), a potent vasoconstrictor, has been identified as a pivotal mediator in both essential hypertension and HDP. Disturbances in Nitric Oxide (NO) bioavailability found in endothelial dysfunction may increase susceptibility to cardiovascular diseases. METHODOLOGY: Thirty-six pregnant women at 30-36 weeks of gestation were recruited from the following categories: (i) pregnancy-induced hypertension (PIH), (ii) chronic hypertension during pregnancy (CH), and (iii) normal pregnant women (control). Measurements of blood pressure indices and sample collection were performed antepartum (30-36 weeks) and postpartum (8 weeks and 12 weeks). Serum ET-1 and NO were measured using Human ET-1 (Endothelin-1) and NO ELISA kits. RESULTS: All blood pressure indices were significantly higher in HDP patients compared to controls during the antenatal and postpartum periods. Serum ET-1 was significantly higher in patients with HDP compared to controls from the antenatal period until 3 months postpartum. This was accompanied by significantly lower levels of serum NO in HDP patients. CONCLUSION: Persistently high levels of ET-1 and low levels of NO up to 3 months postpartum in patients with a history of HDP indicate persistent endothelial dysfunction despite BP normalisation in PIH patients. A long-term NO/ET-1 imbalance may account for the increased CVS disease risk.

    Semantic Mediation of Environmental Observation Datasets through Sensor Observation Services

    A large volume of environmental observation data is being generated as a result of the observation of many properties at the Earth's surface. In parallel, there is a clear interest in accessing data related to the same property from different data providers in order to solve concrete problems. Consequently, there is also an increasing interest in publishing such data through open interfaces within Spatial Data Infrastructures. There have been important advances in the definition of open standards by the Open Geospatial Consortium (OGC) that enable interoperable access to sensor data. Among the proposed interfaces, the Sensor Observation Service (SOS) is having an important impact. We have realized that there is currently no available solution that provides integrated access to various data sources through a SOS interface. This problem has two main facets. On the one hand, the heterogeneity among different data sources has to be resolved. On the other hand, semantic conflicts that arise during the integration process must also be resolved with the help of relevant domain expert knowledge. To address these problems, the main goal of this thesis is to design and develop a semantic data mediation framework to access any kind of environmental observation dataset, including both relational data sources and multidimensional arrays.
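    As an illustration of the interoperable access this work builds on (not of the mediation framework itself), the sketch below issues a KVP GetObservation request against an SOS 2.0 endpoint; the endpoint URL and the offering/property identifiers are hypothetical placeholders.

```python
# Sketch of a KVP GetObservation request to an OGC Sensor Observation Service.
# The endpoint and identifiers below are placeholders, not a real deployment.
import requests

params = {
    "service": "SOS",
    "version": "2.0.0",
    "request": "GetObservation",
    "offering": "http://example.org/offering/air_temperature",      # assumed id
    "observedProperty": "http://example.org/property/temperature",  # assumed id
    "temporalFilter": "om:phenomenonTime,2020-01-01T00:00:00Z/2020-01-02T00:00:00Z",
    "responseFormat": "http://www.opengis.net/om/2.0",
}
response = requests.get("https://sos.example.org/service", params=params, timeout=30)
response.raise_for_status()
print(response.text[:500])  # O&M XML observations returned by the service
```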

    Metadata-driven data integration

    Cotutelle: Universitat Politècnica de Catalunya and Université Libre de Bruxelles, IT4BI-DC programme for the joint Ph.D. degree in computer science. Data has an undoubtable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we are now witnessing a shift towards huge and heterogeneous amounts of data. Indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry out these data exploitation tasks, organizations must first perform data integration, combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings. This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate for an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From a stewardship perspective, these activities are the deployment of a data integration architecture, followed by the population of the shared metadata repository. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities. We begin by proposing a software reference architecture for semantic-aware data-intensive systems. This architecture serves as a blueprint for deploying a stack of systems, with the metadata repository at its core. Next, we propose a graph-based metadata model as a formalism for metadata management, focusing on support for schema and data source evolution, a predominant factor in the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the proposed metadata model and are able to automatically resolve semantic heterogeneities in the data sources. Finally, the thesis focuses on the materialized integration activity and proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis contribute to the field of data integration in contemporary data-intensive ecosystems.
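    The thesis' graph-based metadata model and rewriting algorithms are considerably richer than what can be shown here; the sketch below is only a toy illustration of the idea of virtual integration, where a shared metadata graph maps attributes of a unified view to source-specific names and a global query is rewritten per source (all names are invented for the example).

```python
# Toy sketch, not the thesis' formalism: a shared metadata "graph" mapping
# attributes of a unified view onto the names used by heterogeneous sources,
# plus a naive virtual-integration step that rewrites a global query per source.

metadata_graph = {
    # global attribute -> {source: source-specific attribute}
    "customer_id": {"crm_db": "cust_id", "web_logs": "userId"},
    "country":     {"crm_db": "country_code", "web_logs": "geo.country"},
}

def rewrite(global_attrs, source):
    """Rewrite a list of global attributes into the names used by one source."""
    return [metadata_graph[a][source] for a in global_attrs
            if source in metadata_graph.get(a, {})]

query = ["customer_id", "country"]
for source in ("crm_db", "web_logs"):
    print(source, "->", rewrite(query, source))
```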

    Accurate and efficient automated discovery of process models from event logs

    Every day, companies' employees perform activities with the goal of providing services (or products) to their customers. A sequence of such activities is known as a business process. The quality and efficiency of a business process directly influence the customer experience. In a competitive business environment, achieving a great customer experience is fundamental to being a successful company. For this reason, companies are interested in identifying their business processes in order to analyse and improve them. To analyse and improve a business process, it is generally useful to first write it down in the form of a graphical representation, namely a business process model. Drawing such process models manually is time-consuming because of the time it takes to collect detailed information about the execution of the process. Also, manually drawn process models are often incomplete, because it is difficult to uncover every possible execution path in the process via manual data collection. Automated process discovery allows business analysts to exploit process execution data to automatically discover process models. Discovering high-quality process models is extremely important to reduce the time spent enhancing them and to avoid mistakes during process analysis. The quality of an automatically discovered process model depends on both the input data and the automated process discovery application that is used. In this thesis, we provide an overview of the available algorithms to perform automated process discovery. We identify deficiencies in existing algorithms, and we propose a new algorithm, called Split Miner, which is faster and consistently discovers more accurate process models than existing algorithms. We also propose a new approach to measure the accuracy of automatically discovered process models in a fine-grained manner, and we use this new measurement approach to optimize the accuracy of automatically discovered process models.
    https://www.ester.ee/record=b530061
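    Split Miner's filtering and split/join discovery steps are not reproduced here; as a hedged sketch of the first step it shares with many automated discovery algorithms, the code below builds a directly-follows graph (activity pairs with their frequencies) from an event log given as lists of activity labels.

```python
# Sketch of the first step shared by many automated process discovery algorithms,
# including Split Miner: mining a directly-follows graph from an event log.
# Split Miner's later filtering and split/join discovery steps are not shown.
from collections import Counter

def directly_follows_graph(event_log):
    """event_log: list of traces, each trace a list of activity labels."""
    arcs = Counter()
    for trace in event_log:
        for a, b in zip(trace, trace[1:]):
            arcs[(a, b)] += 1
    return arcs

log = [
    ["register", "check credit", "approve", "notify"],
    ["register", "check credit", "reject", "notify"],
    ["register", "approve", "notify"],
]
for (a, b), freq in directly_follows_graph(log).items():
    print(f"{a} -> {b}: {freq}")
```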

    Explainable predictive monitoring of temporal measures of business processes

    Modern enterprise systems collect detailed data about the execution of the business processes they support. The widespread availability of such data in companies, coupled with advances in machine learning, has led to the emergence of data-driven, predictive approaches to monitor the performance of business processes. By using such predictive process monitoring approaches, potential performance issues can be anticipated and proactively mitigated. Various approaches have been proposed to address typical predictive process monitoring questions, such as what the most likely continuation of an ongoing process instance is, or when it will finish. However, most existing approaches prioritize accuracy over explainability. Yet in practice, explainability is a critical property of predictive methods. It is not enough to accurately predict that a running process instance will end up in an undesired outcome. It is also important for users to understand why this prediction is made and what can be done to prevent the undesired outcome. This thesis proposes two methods for building predictive models that monitor business processes in an explainable manner. This is achieved by decomposing a prediction into its elementary components. For example, to explain that the remaining execution time of a process execution is predicted to be 20 hours, we decompose this prediction into the predicted execution time of each activity that has not yet been executed. We evaluate the proposed methods against each other and various state-of-the-art baselines using a range of business processes from multiple domains. The evaluation reaffirms a fundamental trade-off between the explainability and the accuracy of predictions. The research contributions of the thesis have been consolidated into an open-source tool for predictive business process monitoring, namely Nirdizati. It can be used to train predictive models using the methods described in this thesis, as well as third-party methods. These models are then used to make predictions for ongoing process instances; thus, the tool can also support users at runtime.
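    The following toy sketch illustrates the decomposition idea described in the abstract, not Nirdizati's actual models or API: the remaining time of a running case is predicted as the sum of assumed per-activity duration estimates, so each term doubles as an explanation of the total.

```python
# Toy sketch of explainable remaining-time prediction by decomposition
# (not Nirdizati's API): the remaining time of a running case is the sum of
# predicted durations of the activities that have not been executed yet.

# Assumed per-activity duration estimates (e.g., learned from historical logs), in hours.
predicted_duration = {"assess claim": 6.0, "approve payment": 4.0, "notify customer": 10.0}

def predict_remaining_time(remaining_activities):
    contributions = {a: predicted_duration[a] for a in remaining_activities}
    return sum(contributions.values()), contributions

total, parts = predict_remaining_time(["assess claim", "approve payment", "notify customer"])
print(f"predicted remaining time: {total} hours")
for activity, hours in parts.items():
    print(f"  {activity}: {hours} hours")  # per-activity explanation of the prediction
```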
