3,676 research outputs found
A Comprehensive Scalable Framework for Cloud-Native Pattern Detection with Enhanced Expressiveness
Detecting complex patterns in large volumes of event logs has diverse
applications in various domains, such as business processes and fraud
detection. Existing systems such as ELK are commonly used for this task,
but their performance deteriorates for large patterns, and they are limited
in both expressiveness and their ability to explain their responses. In this
work, we propose a solution that integrates a Complex Event Processing (CEP)
engine into a broader query processor on top of a decoupled storage
infrastructure containing inverted indices of log events.
The results demonstrate that our system excels in scalability and robustness,
particularly in handling complex queries. Notably, our proposed system delivers
responses for large complex patterns within seconds, while ELK experiences
timeouts after 10 minutes. It also significantly outperforms solutions relying
on FlinkCEP and executing MATCH_RECOGNIZE SQL queries.
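The abstract above does not describe the index layout or the query algorithm, so the following is only a minimal sketch of the general idea of answering a sequence pattern from an inverted index of log events. The function names (`build_index`, `find_sequence`), the in-memory dictionary layout, and the greedy earliest-match strategy are assumptions made for illustration, not the authors' system.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(log):
    """Build an inverted index: event type -> trace id -> sorted positions.

    `log` maps a trace id to its ordered list of event types
    (a hypothetical in-memory stand-in for a real storage layer).
    """
    index = defaultdict(lambda: defaultdict(list))
    for trace_id, events in log.items():
        for pos, event in enumerate(events):
            index[event][trace_id].append(pos)
    return index


def find_sequence(index, pattern):
    """Return {trace_id: positions} for traces that contain `pattern`
    as an ordered (not necessarily contiguous) subsequence."""
    if not pattern:
        return {}
    # Only traces that contain every event type of the pattern can match.
    candidates = set(index.get(pattern[0], {}))
    for event in pattern[1:]:
        candidates &= set(index.get(event, {}))

    matches = {}
    for trace_id in sorted(candidates):
        last = -1
        positions = []
        for event in pattern:
            occurrences = index[event][trace_id]
            # First occurrence strictly after the previously matched position.
            i = bisect_right(occurrences, last)
            if i == len(occurrences):
                break
            last = occurrences[i]
            positions.append(last)
        else:
            matches[trace_id] = positions
    return matches
```

Because the index is consulted per event type, traces that lack any event of the pattern are discarded before any positions are compared, which is one plausible reason an index-based design can stay responsive as patterns grow.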
Effective Removal of Operational Log Messages: an Application to Model Inference
Model inference aims to extract accurate models from the execution logs of
software systems. However, in reality, logs may contain some "noise" that could
deteriorate the performance of model inference. One form of noise can commonly
be found in system logs that contain not only transactional messages---logging
the functional behavior of the system---but also operational
messages---recording the operational state of the system (e.g., a periodic
heartbeat to keep track of the memory usage). In low-quality logs,
transactional and operational messages are randomly interleaved, leading to the
erroneous inclusion of operational behaviors in a system model that ideally
should reflect only the functional behavior of the system. It is therefore
important to remove operational messages in the logs before inferring models.
In this paper, we propose LogCleaner, a novel technique for removing
operational log messages. LogCleaner first performs a periodicity analysis to
filter out periodic messages, and then it performs a dependency analysis to
calculate the degree of dependency for all log messages and to remove
operational messages based on their dependencies. The experimental results on
two proprietary and 11 publicly available log datasets show that LogCleaner, on
average, can accurately remove 98% of the operational messages and preserve 81%
of the transactional messages. Furthermore, using logs pre-processed with
LogCleaner decreases the execution time of model inference (with a speed-up
ranging from 1.5 to 946.7 depending on the characteristics of the system) and
significantly improves the accuracy of the inferred models, by increasing their
ability to accept correct system behaviors (+43.8 pp on average, with
pp=percentage points) and to reject incorrect system behaviors (+15.0 pp on
average).
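The abstract names a periodicity analysis as LogCleaner's first step but does not say how it is computed. As a rough illustration of what such a filter could look like, the sketch below flags a message template as periodic when the gaps between its occurrences are nearly constant. The coefficient-of-variation criterion, the thresholds `max_cv` and `min_count`, and the function names are all assumptions for this example, not the technique used in the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def periodic_templates(entries, max_cv=0.1, min_count=4):
    """Return the set of message templates whose inter-arrival times
    are nearly constant.

    `entries` is a list of (timestamp, template) pairs. A template is
    flagged when it occurs at least `min_count` times and the coefficient
    of variation of its gaps is at most `max_cv` (both assumed thresholds).
    """
    times = defaultdict(list)
    for timestamp, template in entries:
        times[template].append(timestamp)

    periodic = set()
    for template, stamps in times.items():
        if len(stamps) < min_count:
            continue
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        average_gap = mean(gaps)
        if average_gap > 0 and pstdev(gaps) / average_gap <= max_cv:
            periodic.add(template)
    return periodic


def remove_periodic(entries, **kwargs):
    """Drop every entry whose template was flagged as periodic."""
    drop = periodic_templates(entries, **kwargs)
    return [(ts, template) for ts, template in entries if template not in drop]
```

A filter of this kind only covers the periodic part of the problem; the dependency analysis mentioned in the abstract is a separate step and is not sketched here.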
Process Mining Handbook
This is an open access book. It comprises all the individual courses given as part of the First Summer School on Process Mining, PMSS 2022, which was held in Aachen, Germany, during July 4-8, 2022. The volume contains 17 chapters organized into the following topical sections: introduction; process discovery; conformance checking; data preprocessing; process enhancement and monitoring; assorted process mining topics; industrial perspective and applications; and closing.
The sequential analysis of transgressors’ accounts of breaking environmental laws
Three hundred and twenty written accounts of environmental transgressors were assessed by sequential analysis to reveal their argument streams. The accounts were obtained from the written statements that transgressors are allowed to give during the Spanish administrative process and which were included in files handled by four environmental law enforcement agencies. These agencies are distributed across national, regional, island and municipality jurisdictions. The setting for the study is a highly protected environment in which environmental laws have high salience. Results reveal that transgressors use simple argument streams that are consistently more defensive than conciliatory and that question the perceived legitimacy of environmental law. It was also seen that the empirical functioning of the explanations related to pursuing emotional/prosocial objectives differs from what was expected from the traditional conceptual definition. Results are discussed in terms of how the assessment of the internal dynamic of the accounts would provide valuable information on transgressors’ reasoning in relation to environmental laws.
Tailoring Machine Learning for Process Mining
Machine learning models are routinely integrated into process mining
pipelines to carry out tasks like data transformation, noise reduction, anomaly
detection, classification, and prediction. Often, the design of such models is
based on some ad-hoc assumptions about the corresponding data distributions,
which are not necessarily in accordance with the non-parametric distributions
typically observed with process data. Moreover, the learning procedure they
follow ignores the constraints that concurrency imposes on process data. Data
encoding is a key element in smoothing the mismatch between these assumptions
and the data, but its potential is poorly exploited. In this paper, we argue that a deeper
insight into the issues raised by training machine learning models with process
data is crucial to ground a sound integration of process mining and machine
learning. Our analysis of these issues lays the foundation for a
methodology that correctly aligns machine learning with process mining
requirements and aims to stimulate further research in this direction.
Comment: 16 pages
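To make the point about data encoding concrete, here is a small sketch of one common way to turn traces into feature vectors, a frequency (bag-of-activities) encoding. It is not taken from the paper; the function name `frequency_encode` and the toy activity labels are assumptions. The example shows how such an encoding discards ordering, so traces that differ only in the order of their activities become indistinguishable to a downstream model.

```python
from collections import Counter


def frequency_encode(traces):
    """Encode each trace as a vector of activity counts.

    Returns the sorted activity vocabulary and one count vector per trace.
    Ordering information between activities is lost by construction.
    """
    vocabulary = sorted({activity for trace in traces for activity in trace})
    vectors = []
    for trace in traces:
        counts = Counter(trace)
        vectors.append([counts[activity] for activity in vocabulary])
    return vocabulary, vectors


# Toy traces: the first two differ only in the order of "b" and "c".
traces = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "b"]]
vocabulary, vectors = frequency_encode(traces)
print(vocabulary)
print(vectors)
```

Whether losing order is acceptable depends on the task: if "b" and "c" are concurrent, treating the first two traces as equal may be desirable, whereas for strictly sequential behavior it hides a real difference.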