Decomposing conformance checking on Petri nets with data
Process mining techniques relate observed behavior to modeled behavior, e.g., the automatic discovery of a Petri net based on an event log. Process mining is not limited to process discovery and also includes conformance checking. Conformance checking techniques are used to evaluate the quality of discovered process models and to diagnose deviations from some normative model (e.g., to check compliance). Existing conformance checking approaches typically focus on the control flow and are therefore unable to diagnose deviations concerning data. This paper proposes a technique to check the conformance of data-aware process models. We use so-called "data Petri nets" to model data variables, guards, and read/write actions. Additional perspectives such as resource allocation and time constraints can be encoded in terms of variables. Data-aware conformance checking may be very time-consuming and sometimes even intractable when there are many transitions and data variables. Therefore, we propose a technique to decompose large data-aware conformance checking problems into smaller problems that can be solved more efficiently. We provide a general correctness result showing that decomposition does not influence the outcome of conformance checking. Moreover, two decomposition strategies are presented. The approach is supported through ProM plug-ins, and experimental results show that significant performance improvements are indeed possible.
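To make the role of guards and write actions concrete, the following is a minimal sketch of a data-aware check on a single trace. It is not the paper's alignment-based or decomposed technique: the function `check_trace`, the `guards` mapping, and the choice to evaluate each guard after the event's writes are applied are all assumptions made for illustration.

```python
def check_trace(trace, guards):
    """Replay a trace and report events whose guard is violated.

    trace:  list of (activity, writes) pairs, where writes is a dict of
            variable assignments performed by that event.
    guards: dict mapping an activity name to a predicate over the current
            variable state (hypothetical representation of a guard).
    """
    state = {}
    deviations = []
    for position, (activity, writes) in enumerate(trace):
        # Apply the event's write actions before evaluating the guard
        # (a simplifying assumption of this sketch).
        state.update(writes)
        guard = guards.get(activity)
        if guard is not None and not guard(state):
            deviations.append((position, activity))
    return deviations


# Hypothetical guard: approval is only allowed for amounts up to 1000.
guards = {"approve": lambda s: s.get("amount", 0) <= 1000}

bad_trace = [("request", {"amount": 5000}), ("approve", {})]
good_trace = [("request", {"amount": 500}), ("approve", {})]

print(check_trace(bad_trace, guards))
print(check_trace(good_trace, guards))
```

A control-flow-only check would accept both traces, since each follows the same activity order; only the data perspective separates them.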
CASE ID DETECTION IN UNLABELLED EVENT LOGS FOR PROCESS MINING
In the realm of data science, event logs serve as valuable sources of information,
capturing sequences of events or activities in various processes. However, when
dealing with unlabelled event logs, the absence of a designated Case ID column poses
a critical challenge, hindering the understanding of relationships and dependencies
among events within a case or process.
Motivated by the increasing adoption of data-driven decision-making and the
need for efficient data analysis techniques, this master’s project presents the "Case
ID Column Identification Library" project. This library aims to streamline data
preprocessing and enhance the efficiency of subsequent data analysis tasks by
automatically identifying the Case ID column in unlabelled event logs.
The project’s objective is to develop a versatile and user-friendly library that
incorporates multiple methods, including a Convolutional Neural Network (CNN)
and a parameterizable heuristic approach, to accurately identify the Case ID column.
Users can choose individual methods or a combination of methods based on
their specific requirements, and can adjust the coefficients and settings of the
heuristic formula to fine-tune the identification process.
This report presents a comprehensive exploration of related work, methodology,
data understanding, methods for Case ID column identification, software library
development, and experimental results. The results demonstrate the effectiveness of
the proposed methods and their implications for decision support systems.
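As an illustration of what a parameterizable heuristic for this task might look like, the sketch below scores every column of a small event log and picks the highest-scoring one. The abstract does not state the library's actual formula, so the function `score_columns`, the features it uses, and the coefficients `target_ratio`, `w_ratio`, and `w_repeat` are assumptions made for this example.

```python
from collections import Counter


def score_columns(rows, target_ratio=0.5, w_ratio=1.0, w_repeat=1.0):
    """Score each column as a Case ID candidate (hypothetical heuristic).

    rows: list of dicts that all share the same keys (one dict per event).
    A case ID is expected to repeat across events, but to have more
    distinct values than a low-cardinality column such as the activity.
    """
    n_rows = len(rows)
    scores = {}
    for column in rows[0]:
        counts = Counter(row[column] for row in rows)
        distinct_ratio = len(counts) / n_rows
        # Share of distinct values that occur in more than one event.
        repeated_share = sum(1 for c in counts.values() if c > 1) / len(counts)
        # Closeness of the distinct-value ratio to the tunable target.
        closeness = 1.0 - abs(distinct_ratio - target_ratio)
        scores[column] = w_ratio * closeness + w_repeat * repeated_share
    return scores


log = [
    {"event_id": "e1", "case": "c1", "activity": "A"},
    {"event_id": "e2", "case": "c1", "activity": "B"},
    {"event_id": "e3", "case": "c2", "activity": "A"},
    {"event_id": "e4", "case": "c2", "activity": "B"},
    {"event_id": "e5", "case": "c3", "activity": "A"},
    {"event_id": "e6", "case": "c3", "activity": "B"},
]

scores = score_columns(log)
best = max(scores, key=scores.get)
print(best, scores)
```

On real logs the target ratio and weights would need tuning, which is presumably why the project exposes such coefficients to the user.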
Towards an evaluation framework for process mining algorithms
Although there has been a lot of progress in developing process mining algorithms in recent years, no effort has been put into developing a common means of assessing the quality of the models discovered by these algorithms. In this paper, we outline elements of an evaluation framework that is intended to enable (a) process mining researchers to compare the performance of their algorithms, and (b) end users to evaluate the validity of their process mining results. Furthermore, we describe two possible approaches to evaluating a discovered model: (i) using existing comparison metrics that have been developed by the process mining research community, and (ii) using the so-called k-fold cross-validation known from the machine learning community. To illustrate the application of these two approaches, we compared a set of models discovered by different algorithms based on a simple example log.
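The k-fold cross-validation idea can be sketched on an event log as follows: the traces are split into k folds, a model is built from k-1 folds, and it is evaluated on the held-out fold. The "model" used here (the set of trace variants seen in training) and the score `variant_fitness` are deliberately trivial stand-ins chosen for this example; they are not the metrics or discovery algorithms evaluated in the paper.

```python
def k_fold_splits(traces, k):
    """Yield (train, test) pairs, using each of the k folds once as test set."""
    folds = [traces[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, test


def variant_fitness(train, test):
    """Fraction of test traces whose exact variant was seen in training."""
    known_variants = {tuple(trace) for trace in train}
    return sum(tuple(trace) in known_variants for trace in test) / len(test)


# Small example log: each trace is a list of activity names.
log = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["a", "d"],
]

scores = [variant_fitness(train, test) for train, test in k_fold_splits(log, 3)]
print(scores)
```

Replacing the variant set with a discovered process model, and `variant_fitness` with a conformance metric, gives the general shape of such an evaluation.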
Security Analytics: Using Deep Learning to Detect Cyber Attacks
Security attacks are becoming more prevalent as cyber attackers exploit system vulnerabilities for financial gain. The resulting loss of revenue and reputation can have deleterious effects on governments and businesses alike. Signature recognition and anomaly detection are the most common security detection techniques in use today. These techniques provide a strong defense. However, they fall short of detecting complicated or sophisticated attacks. Recent literature suggests using security analytics to differentiate between normal and malicious user activities.
The goal of this research is to develop a repeatable process to detect cyber attacks that is fast, accurate, comprehensive, and scalable. A model was developed and evaluated using several production log files provided by the University of North Florida Information Technology Security department. This model uses security analytics to complement existing security controls and to detect suspicious user activity occurring in real time by applying machine learning algorithms to multiple heterogeneous server-side log files. The process is linearly scalable and comprehensive; as such, it can be applied to any enterprise environment. The process is composed of three steps. The first step is data collection and transformation, which involves identifying the source log files and selecting a feature set from those files. The resulting feature set is then transformed into a time series dataset using a sliding time window representation. In the second step, each instance of the dataset is labeled as green, yellow, or red using three different unsupervised learning methods, one of which is Partitioning Around Medoids (PAM). The final step uses Deep Learning to train and evaluate the model that will be used for detecting abnormal or suspicious activities. Experiments using datasets of varying sizes of time granularity resulted in very high accuracy and performance. Training and testing the model was surprisingly fast, even for large datasets. This is the first research paper that develops a model to detect cyber attacks using security analytics; hence, this research builds a foundation for future research in this subject area.
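The sliding time window representation can be sketched as below: log events are grouped into windows of a fixed width, and each window becomes one row of per-feature counts. The event names, the window parameters, and the function `sliding_window_features` are assumptions chosen for illustration, not the feature set used in the study; the labeling and Deep Learning steps are not shown.

```python
from collections import Counter


def sliding_window_features(events, width, step, feature_names):
    """Turn (timestamp, event_name) pairs into one count vector per window.

    Each window covers the half-open interval [t, t + width); consecutive
    windows start `step` time units apart, so step < width gives overlap.
    """
    if not events:
        return []
    events = sorted(events)
    first, last = events[0][0], events[-1][0]
    rows = []
    t = first
    while t <= last:
        counts = Counter(name for ts, name in events if t <= ts < t + width)
        rows.append([counts.get(name, 0) for name in feature_names])
        t += step
    return rows


# Hypothetical server-side events: (seconds since start, event name).
events = [
    (0, "login_fail"),
    (5, "login_ok"),
    (12, "login_fail"),
    (14, "login_fail"),
    (31, "login_ok"),
]

rows = sliding_window_features(events, width=10, step=10,
                               feature_names=["login_fail", "login_ok"])
print(rows)
```

Each resulting row is one instance of the time series dataset, which could then be passed to a clustering method for green, yellow, or red labeling.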