3,676 research outputs found

    A Comprehensive Scalable Framework for Cloud-Native Pattern Detection with Enhanced Expressiveness

    Detecting complex patterns in large volumes of event logs has diverse applications in domains such as business processes and fraud detection. Existing systems like ELK are commonly used to tackle this challenge, but their performance deteriorates for large patterns, and they are limited in expressiveness and in their ability to explain their responses. In this work, we propose a solution that integrates a Complex Event Processing (CEP) engine into a broader query processor on top of a decoupled storage infrastructure containing inverted indices of log events. The results demonstrate that our system excels in scalability and robustness, particularly in handling complex queries. Notably, our proposed system delivers responses for large complex patterns within seconds, while ELK experiences timeouts after 10 minutes. It also significantly outperforms solutions relying on FlinkCEP and executing MATCH_RECOGNIZE SQL queries.
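
As a rough illustration of querying an inverted index of log events for a sequential pattern, the sketch below maps each event type to the positions where it occurs in each trace and then looks for the pattern as an ordered subsequence. It is not the system described in the abstract: the names `build_index` and `match_sequence`, the `(trace_id, position, event_type)` log layout, and the "gaps allowed, earliest match" semantics are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(log):
    """Build an inverted index: event_type -> trace_id -> sorted positions.

    `log` is an iterable of (trace_id, position, event_type) tuples
    (a hypothetical layout chosen for this sketch).
    """
    index = defaultdict(lambda: defaultdict(list))
    for trace_id, position, event_type in log:
        index[event_type][trace_id].append(position)
    for traces in index.values():
        for positions in traces.values():
            positions.sort()
    return index


def match_sequence(index, pattern):
    """Return {trace_id: positions} for traces that contain `pattern`
    as an ordered subsequence (other events may occur in between).
    Only the earliest match per trace is reported.
    """
    if not pattern:
        return {}
    # Prune early: a trace must contain every event type in the pattern.
    candidates = set(index.get(pattern[0], {}))
    for event_type in pattern[1:]:
        candidates &= set(index.get(event_type, {}))

    matches = {}
    for trace_id in candidates:
        last = float("-inf")
        chosen = []
        for event_type in pattern:
            positions = index[event_type][trace_id]
            # First occurrence strictly after the previously matched event.
            i = bisect_right(positions, last)
            if i == len(positions):
                break
            last = positions[i]
            chosen.append(last)
        else:
            matches[trace_id] = chosen
    return matches


log = [
    ("t1", 0, "login"), ("t1", 1, "view"), ("t1", 2, "pay"),
    ("t2", 0, "pay"), ("t2", 1, "login"), ("t2", 2, "view"),
]
index = build_index(log)
result = match_sequence(index, ["login", "pay"])
```

Intersecting the per-type trace sets before scanning positions is what lets an index-based approach skip most traces without reading them; a real CEP engine would add time windows, negation, and richer operators on top.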

    Effective Removal of Operational Log Messages: an Application to Model Inference

    Model inference aims to extract accurate models from the execution logs of software systems. In practice, however, logs may contain "noise" that degrades the performance of model inference. One common form of noise arises in system logs that contain not only transactional messages, which log the functional behavior of the system, but also operational messages, which record the operational state of the system (e.g., a periodic heartbeat to keep track of the memory usage). In low-quality logs, transactional and operational messages are randomly interleaved, leading to the erroneous inclusion of operational behaviors in a system model, which ideally should reflect only the functional behavior of the system. It is therefore important to remove operational messages from the logs before inferring models. In this paper, we propose LogCleaner, a novel technique for removing operational log messages. LogCleaner first performs a periodicity analysis to filter out periodic messages, and then performs a dependency analysis to calculate the degree of dependency for all log messages and to remove operational messages based on their dependencies. The experimental results on two proprietary and 11 publicly available log datasets show that LogCleaner, on average, can accurately remove 98% of the operational messages and preserve 81% of the transactional messages. Furthermore, using logs pre-processed with LogCleaner decreases the execution time of model inference (with a speed-up ranging from 1.5 to 946.7 depending on the characteristics of the system) and significantly improves the accuracy of the inferred models, by increasing their ability to accept correct system behaviors (+43.8 pp on average, with pp = percentage points) and to reject incorrect system behaviors (+15.0 pp on average).
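
To make the idea of a periodicity filter concrete, here is a minimal sketch that flags message templates whose inter-arrival times are nearly constant and drops them from a log. It is not LogCleaner's algorithm, which the abstract does not specify, and it omits the dependency analysis entirely. The function names, the `(timestamp, template)` event layout, and the thresholds `max_cv` and `min_count` are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def periodic_templates(events, max_cv=0.1, min_count=4):
    """Return the templates whose inter-arrival gaps are nearly constant.

    `events` is an iterable of (timestamp_seconds, template) pairs.
    A template counts as periodic when the coefficient of variation
    (standard deviation / mean) of its gaps is at most `max_cv`.
    Both thresholds are illustrative, not values from the paper.
    """
    stamps_by_template = defaultdict(list)
    for timestamp, template in events:
        stamps_by_template[template].append(timestamp)

    periodic = set()
    for template, stamps in stamps_by_template.items():
        if len(stamps) < min_count:
            continue  # too few occurrences to judge periodicity
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        average_gap = mean(gaps)
        if average_gap > 0 and pstdev(gaps) / average_gap <= max_cv:
            periodic.add(template)
    return periodic


def drop_periodic(events, **thresholds):
    """Return the events whose template was not flagged as periodic."""
    events = list(events)
    periodic = periodic_templates(events, **thresholds)
    return [(ts, tpl) for ts, tpl in events if tpl not in periodic]


events = [
    (0, "heartbeat"), (3, "login"), (4, "query"), (10, "heartbeat"),
    (20, "heartbeat"), (27, "logout"), (30, "heartbeat"),
]
cleaned = drop_periodic(events)
```

In this toy log the heartbeat fires every 10 seconds, so it is the only template removed; the transactional messages are kept in their original order.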

    Process Mining Handbook

    This is an open access book. It brings together all the courses given at the First Summer School on Process Mining, PMSS 2022, which was held in Aachen, Germany, during July 4-8, 2022. The volume contains 17 chapters organized into the following topical sections: introduction; process discovery; conformance checking; data preprocessing; process enhancement and monitoring; assorted process mining topics; industrial perspective and applications; and closing.

    The sequential analysis of transgressors’ accounts of breaking environmental laws

    Three hundred and twenty written accounts of environmental transgressors were assessed by sequential analysis to reveal their argument streams. The accounts were obtained from the written statements that transgressors are allowed to give during the Spanish administrative process and which were included in files handled by four environmental law enforcement agencies. These agencies are distributed across national, regional, island and municipality jurisdictions. The setting for the study is a highly protected environment in which environmental laws have high salience. Results reveal that transgressors use simple argument streams, consistently more defensive than conciliatory, and questioning the perceived legitimacy of environmental law. It was also seen that the empirical functioning of the explanations related to pursuing emotional/prosocial objectives differs from what was expected from the traditional conceptual definition. Results are discussed in terms of how the assessment of the internal dynamics of the accounts would provide valuable information on transgressors’ reasoning in relation to environmental laws.

    Tailoring Machine Learning for Process Mining

    Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance with the non-parametric distributions typically observed with process data. Moreover, the learning procedure they follow ignores the constraints that concurrency imposes on process data. Data encoding is a key element in smoothing the mismatch between these assumptions and the data, but its potential is poorly exploited. In this paper, we argue that a deeper insight into the issues raised by training machine learning models with process data is crucial to ground a sound integration of process mining and machine learning. Our analysis of such issues is intended to lay the foundation for a methodology that correctly aligns machine learning with process mining requirements and to stimulate further research in this direction.