
    Enhancing precision in process conformance: stability, confidence and severity

    Process conformance is becoming a crucial area due to the changing nature of processes within an information system. By confronting specifications with system executions (the main problem tackled in process conformance), both system bugs and obsolete or incorrect specifications can be revealed. This paper presents novel techniques to enrich process conformance analysis along the precision dimension. The new features of the metric proposed in this paper provide a complete view of the precision between a log and a model. The techniques have been implemented as a plug-in in an open-source process mining platform, and experimental results supporting both the theory and the goals of this work are presented.
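
    As a rough illustration of the precision dimension discussed above, the sketch below computes an escaping-edges style precision (a classic approach in this line of work, not necessarily the exact metric of the paper): at every prefix observed in the log, the activities the log actually continues with are compared against those the model would allow. The `model_enabled` callable is a hypothetical stand-in for querying the model.

```python
from collections import defaultdict

def log_precision(log, model_enabled):
    """Escaping-edges style precision sketch. `model_enabled` is a
    hypothetical callable mapping a prefix (tuple of activities) to the
    set of activities the model enables after that prefix."""
    observed = defaultdict(set)   # prefix -> activities the log continues with
    weight = defaultdict(int)     # prefix -> how often the prefix occurs
    for trace in log:
        for i, act in enumerate(trace):
            prefix = tuple(trace[:i])
            observed[prefix].add(act)
            weight[prefix] += 1

    num = den = 0
    for prefix, seen in observed.items():
        allowed = model_enabled(prefix)
        if not allowed:
            continue              # non-fitting prefix, ignored in this sketch
        num += weight[prefix] * len(seen & allowed)
        den += weight[prefix] * len(allowed)
    return num / den if den else 1.0

# Toy model that always allows 'a', 'b' and 'c'; the log only ever uses two
log = [("a", "b"), ("a", "c")]
print(log_precision(log, lambda prefix: {"a", "b", "c"}))  # 0.5
```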

    All That Glitters Is Not Gold: Towards Process Discovery Techniques with Guarantees

    The aim of a process discovery algorithm is to construct from event data a process model that describes the underlying, real-world process well. Intuitively, the better the quality of the event data, the better the quality of the model that is discovered. However, existing process discovery algorithms do not guarantee this relationship. We demonstrate this by using a range of quality measures for both event data and discovered process models. This paper is a call to the community of IS engineers to complement their process discovery algorithms with properties that relate qualities of their inputs to those of their outputs. To this end, we distinguish four incremental stages for the development of such algorithms, along with concrete guidelines for the formulation of relevant properties and experimental validation. We will also use these stages to reflect on the state of the art, which shows the need to move forward in our thinking about algorithmic process discovery.
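
    As a hedged sketch of the kind of input-output property the authors argue for, the check below asserts that a discovery algorithm should not produce a better-fitting model from a degraded log than from the clean one. The `discover` and `fitness` callables are assumptions standing in for any concrete discovery algorithm and fitness measure.

```python
def satisfies_monotonicity(discover, fitness, clean_log, noisy_log):
    """Hypothetical property check: better input quality should yield a
    model that fits the clean log at least as well. `discover` maps a log
    to a model; `fitness` scores a model against a log in [0, 1]."""
    model_clean = discover(clean_log)
    model_noisy = discover(noisy_log)
    return fitness(model_clean, clean_log) >= fitness(model_noisy, clean_log)
```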

    Monotone Precision and Recall Measures for Comparing Executions and Specifications of Dynamic Systems

    The behavioural comparison of systems is an important concern of software engineering research. For example, the areas of specification discovery and specification mining are concerned with measuring the consistency between a collection of execution traces and a program specification. This problem is also tackled in process mining with the help of measures that describe the quality of a process specification automatically discovered from execution logs. Though various measures have been proposed, it was recently demonstrated that they neither fulfil essential properties, such as monotonicity, nor can they handle infinite behaviour. In this paper, we address this research problem by introducing a new framework for the definition of behavioural quotients. We prove that the corresponding quotients guarantee desired properties that existing measures fail to support. We demonstrate the application of the quotients for capturing precision and recall measures between a collection of recorded executions and a system specification. We use a prototypical implementation of these measures to contrast their monotonic assessment with measures that have been defined in prior research.
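
    The quotient construction itself is the paper's contribution; as a rough illustration of the precision/recall duality it supports, the sketch below truncates the (possibly infinite) specification behaviour at length k. The `spec_step` callable and the prefix-set treatment of the log are assumptions made for the example, not the paper's definitions.

```python
def bounded_language(step, k):
    """All traces of length <= k generated by a system, where `step` is a
    hypothetical callable mapping a prefix to its enabled continuations."""
    frontier = [()]
    language = set(frontier)
    for _ in range(k):
        frontier = [p + (a,) for p in frontier for a in step(p)]
        language.update(frontier)
    return language

def precision_recall(spec_step, log_traces, k):
    """k-bounded sketch: recall is the share of recorded behaviour the
    specification can reproduce; precision is the share of specified
    behaviour that was actually recorded."""
    spec = bounded_language(spec_step, k)
    log = {t[:i] for t in log_traces for i in range(min(len(t), k) + 1)}
    common = len(log & spec)
    return common / len(spec), common / len(log)

# Specification allowing up to two 'a' steps; the log also contains a 'b'
spec_step = lambda p: {"a"} if len(p) < 2 else set()
print(precision_recall(spec_step, [("a",), ("b",)], k=2))  # (2/3, 2/3)
```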

    An A*-algorithm for computing discounted anti-alignments in process mining

    Process mining techniques aim at analyzing and monitoring processes through event data. Formal models like Petri nets serve as an effective representation of the processes. A central question in the field is to assess the conformance of a process model with respect to the real process executions. The notion of anti-alignment, which represents a model run that is as distant as possible from the process executions, has been demonstrated to be crucial for measuring the precision of models. However, the only known algorithm for computing anti-alignments has a high complexity, which prevents it from being applied to real-life problem instances. In this paper we propose a novel algorithm for computing anti-alignments, based on the well-known graph-based A* scheme. By introducing a discount factor in the edit distance used for the search of anti-alignments, we obtain the first efficient algorithm to approximate them. We show that this approximation is quite accurate in practice by comparing it with the optimal results for small instances, where the optimal algorithm can also compute anti-alignments. Finally, we compare the obtained precision metric with state-of-the-art metrics in the literature on real-life examples.
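
    The sketch below illustrates the two ingredients named above under simplifying assumptions: a discounted edit distance in which a deviation at position i is weighted by theta**i, and a best-first search in the spirit of A* that looks for a model run maximizing the minimal discounted distance to the log. The `step` callable standing in for the model and the deliberately loose optimistic bound are assumptions, not the paper's exact construction.

```python
import heapq

def discounted_edit_distance(u, v, theta=0.9):
    """Levenshtein variant where an edit at position i costs theta**i,
    so early deviations weigh more than late ones."""
    n, m = len(u), len(v)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + theta ** i
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + theta ** j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if u[i - 1] == v[j - 1] else theta ** max(i, j)
            D[i][j] = min(D[i - 1][j] + theta ** i,
                          D[i][j - 1] + theta ** j,
                          D[i - 1][j - 1] + sub)
    return D[n][m]

def anti_alignment(step, log, max_len, theta=0.9):
    """Best-first search for a model run of length <= max_len maximizing
    the minimal discounted distance to the log. `step` maps a run prefix
    to the activities the model enables; the bound optimistically assumes
    every remaining position can still be a mismatch."""
    score = lambda run: min(discounted_edit_distance(run, t, theta) for t in log)
    bound = lambda run: score(run) + sum(theta ** i
                                         for i in range(len(run) + 1, max_len + 1))
    best = ((), score(()))
    heap = [(-bound(()), ())]
    while heap:
        neg_b, run = heapq.heappop(heap)
        if -neg_b <= best[1]:
            break                        # no open node can beat the incumbent
        for act in step(run):
            child = run + (act,)
            if len(child) <= max_len:
                if score(child) > best[1]:
                    best = (child, score(child))
                heapq.heappush(heap, (-bound(child), child))
    return best

# Model enabling 'a' or 'b' at every step; the log only ever performs 'a'
run, dist = anti_alignment(lambda r: {"a", "b"}, [("a",)], max_len=2)
print(run, round(dist, 2))  # ('b', 'b') 1.71 with theta = 0.9
```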

    The connection between process complexity of event sequences and models discovered by process mining

    Process mining is a research area focusing on the design of algorithms that can automatically provide insights into business processes. Among the most popular algorithms are those for automated process discovery, whose ultimate goal is to generate a process model that summarizes the behavior recorded in an event log. Past research aimed to improve process discovery algorithms irrespective of the characteristics of the input log. In this paper, we take a step back and investigate the connection between measures capturing characteristics of the input event log and the quality of the discovered process models. To this end, we review the state-of-the-art process complexity measures, propose a new process complexity measure based on graph entropy, and analyze this set of complexity measures on an extensive collection of event logs and corresponding automatically discovered process models. Our analysis shows that many process complexity measures correlate with the quality of the discovered process models, demonstrating the potential of using complexity measures as predictors of process model quality. This finding is important for process mining research, as it highlights that not only algorithms, but also connections between input data and output quality, should be studied.
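
    As a rough illustration of an entropy-based complexity measure (the paper defines its measure on its own graph abstraction; this only shows the general idea), the sketch below computes the Shannon entropy of the edge-frequency distribution of the log's directly-follows graph.

```python
import math
from collections import Counter

def dfg_entropy(log):
    """Shannon entropy of the directly-follows edge frequencies of a log.
    Higher values indicate more varied direct-succession behaviour."""
    edges = Counter((a, b) for trace in log for a, b in zip(trace, trace[1:]))
    total = sum(edges.values())
    return -sum((c / total) * math.log2(c / total) for c in edges.values())

log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c")]
print(dfg_entropy(log))  # ~1.92 bits over four distinct edges
```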

    Improving reference mining in patents with BERT

    In this paper we address the challenge of extracting scientific references from patents. We approach the problem as a sequence labelling task and investigate the merits of BERT models for extracting these long sequences. References in patents to scientific literature are relevant for studying the connection between science and industry. Most prior work only uses the front-page citations for this analysis, which are provided in the metadata of patent archives. In this paper we build on prior work using Conditional Random Fields (CRF) and Flair for reference extraction. We improve the quality of the training data and train three BERT-based models on the labelled data (BERT, bioBERT, sciBERT). We find that the improved training data leads to a large improvement in the quality of the trained models. In addition, the BERT models beat CRF and Flair, with recall scores around 97% obtained with cross-validation. With the best model we label a large collection of 33 thousand patents, extract the citations, and match them to publications in the Web of Science database. We extract 50% more references than with the old training data and methods: 735 thousand references in total. With these patent-publication links, follow-up research will further analyze which types of scientific work lead to inventions.
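
    A minimal sketch of the sequence-labelling setup using the Hugging Face `transformers` pipeline. The checkpoint below is a public NER model used as a stand-in: the paper fine-tunes BERT, bioBERT and sciBERT on its own labelled patent data, so the labels and scores here are purely illustrative.

```python
from transformers import pipeline

# Stand-in checkpoint; a reference tagger would be BERT/bioBERT/sciBERT
# fine-tuned on labelled patent text rather than a generic NER model.
tagger = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",  # merge word pieces into word-level spans
)

patent_text = "As shown by Smith et al. in Nature 521 (2015), deep learning ..."
for span in tagger(patent_text):
    print(span["entity_group"], span["word"], round(span["score"], 2))
```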

    Intelligent data leak detection through behavioural analysis

    In this paper we discuss a solution to detect data leaks in an intelligent and covert way through real-time analysis of the user's behaviour while handling classified information. The work is based on experience with real-world use cases, and a variety of data preparation and data analysis techniques have been tried. Results show the feasibility of the approach, but also the need to correlate findings with other security events to improve precision.
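
    The abstract does not spell out the detector used; as a hedged illustration of behaviour-based anomaly detection, the sketch below scores user sessions with scikit-learn's IsolationForest over hypothetical features. The feature choice and numbers are assumptions for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-session features for users handling classified documents:
# [documents opened, bytes copied to removable media, off-hours activity ratio]
baseline = np.array([
    [12, 0.0e6, 0.05],
    [9,  0.1e6, 0.10],
    [15, 0.0e6, 0.02],
    [11, 0.2e6, 0.08],
    [10, 0.0e6, 0.06],
])
detector = IsolationForest(contamination=0.1, random_state=0).fit(baseline)

session = np.array([[40, 250e6, 0.90]])  # bulk copy, mostly off-hours
print(detector.predict(session))         # -1 flags the session as anomalous
```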