Advancements and Challenges in Object-Centric Process Mining: A Systematic Literature Review

Authors: Berti Alessandro; Montali Marco; van der Aalst Wil M. P.
Publication date: 15/11/2023

Recent years have seen the emergence of object-centric process mining techniques. Born as a response to the limitations of traditional process mining in analyzing event data from prevalent information systems like CRM and ERP, these techniques aim to tackle the deficiency, convergence, and divergence issues seen in traditional event logs. Despite this promise, their adoption in real-world process mining analyses remains limited. This paper presents a comprehensive literature review of object-centric process mining, providing insights into the current status of the discipline and its historical trajectory.

arXiv.org e-Print Archive

A schema framework for graph event data

Author: Esser S.
Publication date: 19/02/2020

Pure OAI Repository

Survey of time series database technology

Authors: McBride Brian; Reynolds Dave
Publication venue: UK Centre for Ecology & Hydrology
Publication date: 30/03/2020

This report has been prepared by Epimorphics Ltd. as part of the ENTRAIN project (NERC grant number NE/S016244/1), a feasibility project within the “NERC Constructing a Digital Environment Strategic Priorities Fund Programme”. The Centre for Ecology and Hydrology (CEH) is a research organisation focusing on land and freshwater ecosystems and their interaction with the atmosphere. The organisation manages a number of sensor networks to monitor the environment, and also handles large databases of third-party data (e.g. river flows measured by the Environment Agency and its equivalents in Scotland and Wales). Data from these networks is stored and made available to users both internally (through direct query of databases) and externally (via web services). The ENTRAIN project aims to address a number of issues in sensor data storage and integration, using several hydrological datasets to help define use cases: COSMOS-UK (a network of ~50 sites measuring soil moisture and meteorological variables at 1-30 minute resolutions); the CEH Greenhouse Gas (GHG) network (~15 sites measuring sub-second fluxes of gases and moisture, subsequently processed up to 30-minute aggregations); and the Thames Initiative (a database of weekly and hourly water quality samples from sites around the Thames basin). In addition, this report considers the UK National River Flow Archive, a database of daily river flows and catchment rainfall derived by regional environmental agencies from 15-minute measurements of river levels and flows. CEH commissioned this report to survey alternative technologies for storing sensor data that scale better, can manage larger data volumes more easily and less expensively, and can be readily deployed on different infrastructures.

NERC Open Research Archive

Implementation of a Process Mining Tool on top of an Event Graph Database

Author: Hernandez Siles Valdemar
Publication date: 04/09/2021

Pure OAI Repository

Bench-Ranking: A Prescriptive Analysis Method for Querying Large Knowledge Graphs

Author: Ragab Mohamed
Publication date: 21/12/2022

Leveraging relational Big Data (BD) processing frameworks to process large knowledge graphs has generated great interest in optimizing query performance. Modern BD systems are nevertheless complicated data systems whose configurations notably affect performance. Benchmarking different frameworks and configurations provides the community with best practices for better performance. However, most of these benchmarking efforts can be classified only as descriptive and diagnostic analytics. Moreover, there is no standard for comparing these benchmarks based on quantitative ranking techniques. In addition, designing mature pipelines for processing big graphs entails further design decisions that emerge with the non-native (relational) graph processing paradigm. Those decisions cannot be made automatically, e.g., the choice of relational schema, partitioning technique, and storage format. In this thesis, we discuss how our work fills this research gap. We first show the impact of those design decisions' trade-offs on the replicability of BD systems' performance when querying large knowledge graphs. We also show the limitations of descriptive and diagnostic analyses of BD frameworks' performance for querying large graphs. We then investigate how to enable prescriptive analytics via ranking functions and multi-dimensional optimization techniques (called "Bench-Ranking"). This approach abstracts away the complexity of descriptive performance analysis, guiding the practitioner directly to actionable, informed decisions.

https://www.ester.ee/record=b553332

DSpace at Tartu University Library

OC-PM: Analyzing Object-Centric Event Logs and Process Models

Authors: Berti Alessandro; van der Aalst Wil
Publication venue: Springer Science and Business Media LLC
Publication date: 20/09/2022

Object-centric process mining is a novel branch of process mining that aims to analyze event data from mainstream information systems (such as SAP) more naturally, without being forced to form mutually exclusive groups of events by specifying a case notion. The development of object-centric process mining is related to exploiting object-centric event logs, which includes exploring and filtering the behavior contained in the logs and constructing process models that encode the behavior of different classes of objects and their interactions (which can be discovered from object-centric event logs). This paper aims to provide a broad look at the exploration and processing of object-centric event logs to discover information related to the lifecycle of the different objects composing the event log. The paper also describes comprehensive tool support (OC-PM) implementing the proposed techniques.

arXiv.org e-Print Archive
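
The abstract above contrasts object-centric event logs with logs that force a single case notion. As a minimal sketch of that contrast (not the OC-PM tool's API; the toy log, its field names, and the `flatten` helper are all hypothetical), the following Python snippet flattens a small object-centric log on two different object types:

```python
from collections import defaultdict

# Hypothetical toy object-centric log: each event may reference
# objects of several types (field names are assumptions).
events = [
    {"id": "e1", "activity": "place order",
     "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
    {"id": "e2", "activity": "pick item",
     "objects": {"order": ["o1"], "item": ["i1"]}},
    {"id": "e3", "activity": "pick item",
     "objects": {"order": ["o1"], "item": ["i2"]}},
    {"id": "e4", "activity": "ship order",
     "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
]


def flatten(events, object_type):
    """Build a traditional log by using one object type as the case notion."""
    cases = defaultdict(list)
    for event in events:
        # An event is copied into the trace of every object of the chosen type.
        for obj in event["objects"].get(object_type, []):
            cases[obj].append(event["activity"])
    return dict(cases)


by_item = flatten(events, "item")    # one trace per item
by_order = flatten(events, "order")  # one trace per order

# Number of event copies in the item-based log (the original log has 4 events).
flattened_event_count = sum(len(trace) for trace in by_item.values())
```

In this sketch, flattening on `item` copies "place order" and "ship order" into both item traces, so the flattened log holds more event copies than the original log has events. Flattening on `order` puts both "pick item" events into one trace, where the items they belong to can no longer be told apart.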

Digital content popularity counting with Amazon Web Services

Author: Sarapalo Joonas
Publication venue: Helsingfors universitet
Publication date: 01/01/2020

The page hit counter system processes, counts and stores page hit counts gathered from page hit events from a news media company’s websites and mobile applications. The system serves a public application interface which can be queried over the internet for page hit count information. In this thesis I describe the process of replacing a legacy page hit counter system with a modern implementation in the Amazon Web Services ecosystem utilizing serverless technologies. The description covers the background information, the project requirements, the design and comparison of different options, the implementation details and the results. Finally, I show that the new system, implemented with Amazon Kinesis, AWS Lambda and Amazon DynamoDB, has running costs that are less than half those of the old one.

Helsingin yliopiston digitaalinen arkisto
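
The counting step described in the abstract above can be illustrated with a small, library-free sketch. It is not the thesis's implementation: the event shape, the `page_id` field, and both helper functions are assumptions, and a plain dictionary stands in for the persistent store.

```python
from collections import Counter


def count_hits(batch):
    """Aggregate a batch of page-hit events into per-page increments."""
    # Assumption: each event is a dict carrying a "page_id" key.
    return Counter(event["page_id"] for event in batch)


def apply_increments(store, increments):
    """Add the per-page increments to the stored totals (dict as stand-in store)."""
    for page_id, hits in increments.items():
        store[page_id] = store.get(page_id, 0) + hits
    return store


store = {}
first_batch = [{"page_id": "a"}, {"page_id": "b"}, {"page_id": "a"}]
second_batch = [{"page_id": "a"}]

apply_increments(store, count_hits(first_batch))
apply_increments(store, count_hits(second_batch))
```

Aggregating each batch before touching the store means one write per distinct page in the batch rather than one write per hit, which is one plausible way to keep the number of storage operations low in a pipeline of this kind.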