8 research outputs found

    Reductions for Frequency-Based Data Mining Problems

    Full text link
    Studying the computational complexity of problems is one of the - if not the - fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.Comment: This is an extended version of a paper of the same title to appear in the Proceedings of the 17th IEEE International Conference on Data Mining (ICDM'17

    Discovering Hierarchical Process Models: an Approach Based on Events Clustering

    Full text link
    Process mining is a field of computer science that deals with discovery and analysis of process models based on automatically generated event logs. Currently, many companies use this technology for optimization and improving their processes. However, a discovered process model may be too detailed, sophisticated and difficult for experts to understand. In this paper, we consider the problem of discovering a hierarchical business process model from a low-level event log, i.e., the problem of automatic synthesis of more readable and understandable process models based on information stored in event logs of information systems. Discovery of better structured and more readable process models is intensively studied in the frame of process mining research from different perspectives. In this paper, we present an algorithm for discovering hierarchical process models represented as two-level workflow nets. The algorithm is based on predefined event ilustering so that the cluster defines a sub-process corresponding to a high-level transition at the top level of the net. Unlike existing solutions, our algorithm does not impose restrictions on the process control flow and allows for concurrency and iteration

    Bridging abstraction layers in process mining

    Get PDF
    While the maturity of process mining algorithms increases and more process mining tools enter the market, process mining projects still face the problem of different levels of abstraction when comparing events with modeled business activities. Current approaches for event log abstraction try to abstract from the events in an automated way that does not capture the required domain knowledge to fit business activities. This can lead to misinterpretation of discovered process models. We developed an approach that aims to abstract an event log to the same abstraction level that is needed by the business. We use domain knowledge extracted from existing process documentation to semi-automatically match events and activities. Our abstraction approach is able to deal with n:m relations between events and activities and also supports concurrency. We evaluated our approach in two case studies with a German IT outsourcing company

    Efficient Process Model Discovery Using Maximal Pattern Mining

    Get PDF
    In recent years, process mining has become one of the most important and promising areas of research in the field of business process management as it helps businesses understand, analyze, and improve their business processes. In particular, several proposed techniques and algorithms have been proposed to discover and construct process models from workflow execution logs (i.e., event logs). With the existing techniques, mined models can be built based on analyzing the relationship between any two events seen in event logs. Being restricted by that, they can only handle special cases of routing constructs and often produce unsound models that do not cover all of the traces seen in the log. In this paper, we propose a novel technique for process discovery using Maximal Pattern Mining (MPM) where we construct patterns based on the whole sequence of events seen on the traces—ensuring the soundness of the mined models. Our MPM technique can handle loops (of any length), duplicate tasks, non-free choice constructs, and long distance dependencies. Our evaluation shows that it consistently achieves better precision, replay fitness and efficiency than the existing techniques

    Mining complete, precise and simple process models

    Get PDF
    Process discovery algorithms are generally used to discover the underlying process that has been followed to achieve an objective. In general, these algorithms do not take into account any domain knowledge to derive process models, allowing to apply them in a general manner. However, depending on the selected approach, a different kind of process models can be discovered, as each technique has its strengths and weaknesses, e.g., the expressiveness of the used notation. Hence, it is important to take into account the requirements of the domain when deciding which algorithm to use, as the correct assumptions can lead to richer process models. For instance, among the different domains of application of process mining we can identify several fields that share an interesting requirement about the discovered process models. In security audits, discovered processes have to fulfill strict requisites. This means that the process model should reproduce as much behavior as possible; otherwise some violations may go undetected (replay fitness). On the other hand, in order to avoid false positives, process models should reproduce only the recorded behavior (precision). Finally, process models should be easily readable to better detect deviations (simplicity). Another clear example concerns the educational domain, as in order to be of value for both teachers and learners, a discovered learning process should satisfy the aforementioned requirements. That is, to guarantee feasible and correct evaluations, teachers need to access to all the activities performed by learners, thereby the learning process should be able to reproduce as much behavior as possible (replay fitness). Furthermore, the learning process should focus on the recorded behavior seen in the event log (precision), i.e., show only what the students did, and not what they might have done, while being easily interpretable by the teachers (simplicity). One of the previous requirements is related to the readability of process models: simplicity. In process mining, one of the identified challenges is the appropriate visualization of process models, i.e., to present the results of process discovery in such a way that people actually gain insights about the process. Process models that are unnecessary complex can hinder the real behavior of the process rather than to provide an intuition of what is really happening in an organization. However, achieving a good level of readability is not always straightforward, for instance, due the used representation. Within the different approaches focused to reduce the complexity of a process model, the interest in this PhD Thesis relies on two techniques. On the one hand, to improve the readability of an already discovered process model through the inclusion of duplicate labels. On the other hand, the hierarchization of a process model, i.e., to provide a well known structure to the process model. However, regarding the latter, this technique requires to take into account domain knowledge, as different domains may rely on different requirements when improving the readability of the process model. In other words, in order to improve the interpretability and understandability of a process model, the hierarchization has to be driven by the domain. To sum up, concerning the aim of this PhD Thesis, we can identify two main topics of interest. On the one hand, we are interested in retrieving process models that reproduce as much behavior recorded in the log as possible, without introducing unseen behavior. On the other hand, we try to reduce the complexity of the mined models in order to improve their readability. Hence, the aim of this PhD Thesis is to discover process models considering replay fitness, precision and simplicity, while paying special attention in retrieving highly interpretable process models

    Graph-based Pattern Matching and Discovery for Process-centric Service Architecture Design and Integration

    Get PDF
    Process automation and applications integration initiatives are often complex and involve significant resources in large organisations. The increasing adoption of service-based architectures to solve integration problems and the widely accepted practice of utilising patterns as a medium to reuse design knowledge motivated the definition of this work. In this work a pattern-based framework and techniques providing automation and structure to address the process and application integration problem are proposed. The framework is a layered architecture providing modelling and traceability support to different abstraction layers of the integration problem. To define new services - building blocks of the integration solution - the framework includes techniques to identify process patterns in concrete process models. Graphs and graph morphisms provide a formal basis to represent patterns and their relation to models. A family of graph-based algorithms support automation during matching and discovery of patterns in layered process service models. The framework and techniques are demonstrated in a case study. The algorithms implementing the pattern matching and discovery techniques are investigated through a set of experiments from an empirical evaluation. Observations from conducted interviews to practitioners provide suggestions to enhance the proposed techniques and direct future work regarding analysis tasks in process integration initiatives

    A framework for the analysis and comparison of process mining algorithms

    Get PDF
    Process mining algorithms use event logs to learn and reason about business processes. Although process mining is essentially a machine learning task, little work has been done on systematically analysing algorithms to understand their fundamental properties, such as how much data is needed for confidence in mining. Nor does any rigorous basis exist on which to choose between algorithms and representations, or compare results. We propose a framework for analysing process mining algorithms. Processes are viewed as distributions over traces of activities and mining algorithms as learning these distributions. We use probabilistic automata as a unifying representation to which other representation languages can be converted. To validate the theory we present analyses of the Alpha and Heuristics Miner algorithms under the framework, and two practical applications. We propose a model of noise in process mining and extend the framework to mining from ‘noisy’ event logs. From the probabilities and sub-structures in a model, bounds can be given for the amount of data needed for mining. We also consider mining in non-stationary environments, and a method for recovery of the sequence of changed models over time. We conclude by critically evaluating this framework and suggesting directions for future research
    corecore