
    Decomposing conformance checking on Petri nets with data

    Process mining techniques relate observed behavior to modeled behavior, e.g., the automatic discovery of a Petri net based on an event log. Process mining is not limited to process discovery and also includes conformance checking. Conformance checking techniques are used to evaluate the quality of discovered process models and to diagnose deviations from some normative model (e.g., to check compliance). Existing conformance checking approaches typically focus on the control flow and are thus unable to diagnose deviations concerning data. This paper proposes a technique to check the conformance of data-aware process models. We use so-called "data Petri nets" to model data variables, guards, and read/write actions. Additional perspectives such as resource allocation and time constraints can be encoded in terms of variables. The data-aware conformance checking problem may be very time consuming and sometimes even intractable when there are many transitions and data variables. Therefore, we propose a technique to decompose large data-aware conformance checking problems into smaller problems that can be solved more efficiently. We provide a general correctness result showing that decomposition does not influence the outcome of conformance checking. Moreover, two decomposition strategies are presented. The approach is supported through ProM plug-ins, and experimental results show that significant performance improvements are indeed possible.
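    The decomposition idea can be illustrated with a minimal toy sketch. This is not the paper's ProM implementation: the functions, the "subsequence" conformance check, and the list-based sub-models are all simplifying assumptions standing in for real data Petri nets and alignment computations. The sketch only shows the structure of the argument, namely that a trace conforms overall if and only if each projected fragment conforms.

    ```python
    # Toy sketch (hypothetical, not the paper's code): check a trace against
    # several sub-models by projecting it onto each sub-model's activities.

    def project(trace, activities):
        """Keep only the events of `trace` whose activity is in `activities`."""
        return [event for event in trace if event in activities]

    def conforms(sub_model, sub_trace):
        """Toy check: the projected trace must be a subsequence of the
        sub-model's allowed activity sequence (a stand-in for computing a
        real alignment on a data Petri net fragment)."""
        it = iter(sub_model)
        return all(any(a == m for m in it) for a in sub_trace)

    def decomposed_check(trace, sub_models):
        """The overall trace conforms iff every fragment check passes,
        mirroring the paper's correctness result that decomposition does
        not change the outcome of conformance checking."""
        return all(conforms(model, project(trace, set(model)))
                   for model in sub_models)

    trace = ["register", "check", "pay", "archive"]
    sub_models = [["register", "check"], ["pay", "archive"]]
    print(decomposed_check(trace, sub_models))  # True: each fragment conforms
    ```

    Each fragment check only sees the transitions and variables of its own sub-model, which is where the performance gain comes from: the cost of alignment-style checks grows quickly with model size, so many small checks are cheaper than one large one.
    
    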

    Data mining based cyber-attack detection


    CASE ID DETECTION IN UNLABELLED EVENT LOGS FOR PROCESS MINING

    In the realm of data science, event logs serve as valuable sources of information, capturing sequences of events or activities in various processes. However, when dealing with unlabelled event logs, the absence of a designated Case ID column poses a critical challenge, hindering the understanding of relationships and dependencies among events within a case or process. Motivated by the increasing adoption of data-driven decision-making and the need for efficient data analysis techniques, this master’s project presents the "Case ID Column Identification Library". This library aims to streamline data preprocessing and enhance the efficiency of subsequent data analysis tasks by automatically identifying the Case ID column in unlabelled event logs. The project’s objective is to develop a versatile and user-friendly library that incorporates multiple methods, including a Convolutional Neural Network (CNN) and a parameterizable heuristic approach, to accurately identify the Case ID column. The library offers users the flexibility to choose individual methods or a combination of methods based on their specific requirements, and to adjust the heuristic formula's coefficients and settings to fine-tune the identification process. This report presents a comprehensive exploration of related work, methodology, data understanding, methods for Case ID column identification, software library development, and experimental results. The results demonstrate the effectiveness of the proposed methods and their implications for decision support systems.
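    The flavor of a parameterizable heuristic can be sketched as follows. The scoring formula, its coefficients, and the function names here are illustrative assumptions, not the report's actual method: the intuition encoded is only that a Case ID column tends to have moderately repeated values, since each case spans several events, unlike an all-unique timestamp or a low-cardinality activity column.

    ```python
    # Hypothetical heuristic sketch: score each column of an unlabelled event
    # log as a Case ID candidate; the formula and weights are assumptions.

    def case_id_score(column, w_spread=0.5, w_repeat=0.5):
        """Higher score = more plausible Case ID. `w_spread` and `w_repeat`
        are illustrative coefficients a user could tune."""
        n = len(column)
        ratio = len(set(column)) / n    # fraction of unique values
        spread = 1 - ratio              # reward repetition (cases span events)
        repeat = 1 - abs(ratio - 0.3)   # assume ~30% unique values is typical
        return w_spread * spread + w_repeat * repeat

    def identify_case_id(log_rows, header):
        """Pick the column name with the highest heuristic score."""
        columns = list(zip(*log_rows))
        scores = {name: case_id_score(col)
                  for name, col in zip(header, columns)}
        return max(scores, key=scores.get)

    header = ["case", "activity", "timestamp"]
    rows = [("c1", "register", "t1"), ("c1", "check", "t2"),
            ("c2", "register", "t3"), ("c2", "pay", "t4")]
    print(identify_case_id(rows, header))  # "case"
    ```

    Exposing the weights as parameters matches the report's design choice of letting users fine-tune the heuristic for their own logs rather than hard-coding one formula.
    
    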

    Towards an evaluation framework for process mining algorithms

    Although there has been a lot of progress in developing process mining algorithms in recent years, no effort has been put into developing a common means of assessing the quality of the models discovered by these algorithms. In this paper, we outline elements of an evaluation framework that is intended to enable (a) process mining researchers to compare the performance of their algorithms, and (b) end users to evaluate the validity of their process mining results. Furthermore, we describe two possible approaches to evaluate a discovered model: (i) using existing comparison metrics that have been developed by the process mining research community, and (ii) using the so-called k-fold cross-validation known from the machine learning community. To illustrate the application of these two approaches, we compared a set of models discovered by different algorithms based on a simple example log.
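    The second approach, k-fold cross-validation adapted to process mining, can be sketched roughly as follows. The `discover` and `fitness` functions below are deliberately trivial placeholders for a real mining algorithm and quality metric; only the fold structure reflects the technique the paper names: split the traces of a log into k folds, discover a model on k-1 folds, and evaluate it on the held-out fold.

    ```python
    # Sketch of k-fold cross-validation over an event log's traces.
    # `discover` and `fitness` are placeholders, not real mining components.

    def k_fold_splits(traces, k):
        """Yield (train, test) partitions of the trace list."""
        folds = [traces[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [t for j, fold in enumerate(folds) if j != i for t in fold]
            yield train, test

    def discover(train):
        """Placeholder miner: the 'model' is simply the set of seen traces."""
        return set(map(tuple, train))

    def fitness(model, test):
        """Placeholder metric: fraction of held-out traces the model replays."""
        return sum(tuple(t) in model for t in test) / len(test)

    log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"], ["a", "b", "c"]]
    scores = [fitness(discover(train), test)
              for train, test in k_fold_splits(log, 2)]
    print(sum(scores) / len(scores))  # mean fitness across the folds
    ```

    Averaging the per-fold scores estimates how well a discovery algorithm generalizes to unseen behavior, which is what makes the machine-learning-style validation applicable to comparing mining algorithms.
    
    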

    Security Analytics: Using Deep Learning to Detect Cyber Attacks

    Security attacks are becoming more prevalent as cyber attackers exploit system vulnerabilities for financial gain. The resulting loss of revenue and reputation can have deleterious effects on governments and businesses alike. Signature recognition and anomaly detection are the most common security detection techniques in use today. These techniques provide a strong defense; however, they fall short of detecting complicated or sophisticated attacks. Recent literature suggests using security analytics to differentiate between normal and malicious user activities. The goal of this research is to develop a repeatable process to detect cyber attacks that is fast, accurate, comprehensive, and scalable. A model was developed and evaluated using several production log files provided by the University of North Florida Information Technology Security department. This model uses security analytics to complement existing security controls to detect suspicious user activity occurring in real time by applying machine learning algorithms to multiple heterogeneous server-side log files. The process is linearly scalable and comprehensive; as such it can be applied to any enterprise environment. The process is composed of three steps. The first step is data collection and transformation, which involves identifying the source log files and selecting a feature set from those files. The resulting feature set is then transformed into a time series dataset using a sliding time window representation. Each instance of the dataset is labeled as green, yellow, or red using three different unsupervised learning methods, one of which is Partitioning Around Medoids (PAM). The final step uses deep learning to train and evaluate the model that will be used for detecting abnormal or suspicious activities. Experiments using datasets of varying time granularities yielded very high accuracy and performance. The time required to train and test the model was surprisingly fast even for large datasets. This is the first research paper that develops a model to detect cyber attacks using security analytics; hence this research builds a foundation on which to expand for future research in this subject area.
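    The sliding-time-window transformation from the first step can be illustrated with a small sketch. The function name, window parameters, and event features below are assumptions for illustration, not the paper's code: the sketch only shows how per-event records become one aggregated feature vector (here, event counts) per window position.

    ```python
    # Hypothetical sketch: turn timestamped log events into a time-series
    # dataset by counting event types over a sliding time window.

    def sliding_window_features(events, window, step):
        """events: list of (timestamp, feature) pairs. Returns one
        feature-count dict per window position from the first to the
        last observed timestamp."""
        if not events:
            return []
        start = min(t for t, _ in events)
        end = max(t for t, _ in events)
        dataset = []
        t = start
        while t <= end:
            counts = {}
            for ts, feature in events:
                if t <= ts < t + window:  # event falls inside this window
                    counts[feature] = counts.get(feature, 0) + 1
            dataset.append(counts)
            t += step
        return dataset

    events = [(0, "login_fail"), (1, "login_fail"), (2, "login_ok"),
              (10, "login_fail")]
    print(sliding_window_features(events, window=5, step=5))
    # [{'login_fail': 2, 'login_ok': 1}, {}, {'login_fail': 1}]
    ```

    Each resulting instance is what the unsupervised labeling step (e.g., PAM) would then assign a green, yellow, or red label to before the deep learning model is trained on the labeled series.
    
    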