Enhancing precision in process conformance: stability, confidence and severity
Process conformance is becoming a crucial area due to the changing nature of processes within an information system. By confronting specifications with system executions (the main problem tackled in process conformance), both system bugs and obsolete or incorrect specifications can be revealed. This paper presents novel techniques to enrich process conformance analysis for the precision dimension. The new features of the metric proposed in this paper provide a complete view of the precision between a log and a model. The techniques have been implemented as a plug-in for an open-source process mining platform, and experimental results supporting both the theory and the goals of this work are presented.
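To make the precision dimension concrete, here is a minimal sketch under simplifying assumptions: both log and model are treated as finite sets of traces, and precision is the fraction of model behaviour actually observed in the log. This is only an illustration of the dimension, not the metric proposed in the paper.

```python
# Hypothetical, simplified precision between an event log and a process
# model, with both represented as finite collections of traces (tuples of
# activity names). Real metrics handle models with infinite languages.

def precision(log_traces, model_traces):
    """Fraction of the model's behaviour that is backed by the log."""
    log = set(log_traces)
    model = set(model_traces)
    if not model:
        return 1.0  # an empty model allows nothing, so nothing is unobserved
    return len(model & log) / len(model)

log = [("a", "b", "c"), ("a", "c", "b")]
model = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "b", "c")]
print(precision(log, model))  # 2 of 3 model runs are observed -> ~0.667
```

A low score signals that the model allows much more behaviour than was ever recorded, which is exactly what the precision dimension is meant to expose.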
All That Glitters Is Not Gold: Towards Process Discovery Techniques with Guarantees
The aim of a process discovery algorithm is to construct from event data a
process model that describes the underlying, real-world process well.
Intuitively, the better the quality of the event data, the better the quality
of the model that is discovered. However, existing process discovery algorithms
do not guarantee this relationship. We demonstrate this by using a range of
quality measures for both event data and discovered process models. This paper
is a call to the community of IS engineers to complement their process
discovery algorithms with properties that relate qualities of their inputs to
those of their outputs. To this end, we distinguish four incremental stages for
the development of such algorithms, along with concrete guidelines for the
formulation of relevant properties and experimental validation. We will also
use these stages to reflect on the state of the art, which shows the need to
move forward in our thinking about algorithmic process discovery.Comment: 13 pages, 4 figures. Submitted to the International Conference on
Advanced Information Systems Engineering, 202
Monotone Precision and Recall Measures for Comparing Executions and Specifications of Dynamic Systems
The behavioural comparison of systems is an important concern of software
engineering research. For example, the areas of specification discovery and
specification mining are concerned with measuring the consistency between a
collection of execution traces and a program specification. This problem is
also tackled in process mining with the help of measures that describe the
quality of a process specification automatically discovered from execution
logs. Though various measures have been proposed, it was recently demonstrated
that they neither fulfil essential properties, such as monotonicity, nor can
they handle infinite behaviour. In this paper, we address this research problem
by introducing a new framework for the definition of behavioural quotients. We
prove that the corresponding quotients guarantee desired properties that existing
measures have failed to support. We demonstrate the application of the
quotients for capturing precision and recall measures between a collection of
recorded executions and a system specification. We use a prototypical
implementation of these measures to contrast their monotonic assessment with
measures that have been defined in prior research.
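As a toy illustration of the kind of monotonicity property at stake (a sketch over finite trace sets, not the quotient framework itself, which also covers infinite behaviour): a set-based recall should never decrease when the model is extended to replay more of the log.

```python
# Toy monotonicity check for a naive set-based recall over finite trace
# sets. This is an illustration only; the paper's quotient-based measures
# are defined so that such properties hold in general.

def recall(log_traces, model_traces):
    """Fraction of logged behaviour that the model can replay."""
    log = set(log_traces)
    if not log:
        return 1.0
    return len(log & set(model_traces)) / len(log)

log = {("a", "b"), ("a", "c"), ("b", "c")}
model = {("a", "b")}
extended = model | {("a", "c")}  # extend the model with more observed behaviour

r1, r2 = recall(log, model), recall(log, extended)
assert r2 >= r1  # monotone: a larger model never replays less of the log
print(r1, r2)
```

The point of the paper is that several published measures violate exactly this kind of intuition, which the simple set view above trivially satisfies but cannot generalise to infinite languages.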
An A*-algorithm for computing discounted anti-alignments in process mining
Process mining techniques aim at analyzing and monitoring processes through event data. Formal models like Petri nets serve as an effective representation of the processes. A central question in the field is to assess the conformance of a process model with respect to the real process executions. The notion of anti-alignment, which represents a model run that is as distant as possible from the process executions, has been demonstrated to be crucial for measuring the precision of models. However, the only known algorithm for computing anti-alignments has a high complexity, which prevents it from being applied to real-life problem instances. In this paper we propose a novel algorithm for computing anti-alignments, based on the well-known graph-based A* scheme. By introducing a discount factor in the edit distance used for the search of anti-alignments, we obtain the first efficient algorithm to approximate them. We show that this approximation is quite accurate in practice by comparing it with the optimal results for small instances where the optimal algorithm can also compute anti-alignments. Finally, we compare the obtained precision metric with the state-of-the-art metrics in the literature on real-life examples.
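The discounting idea can be sketched with a dynamic-programming edit distance in which edits at later positions carry geometrically smaller weight. The recurrence below is an assumption for illustration; the exact cost function the paper plugs into its A* search may differ.

```python
# Hypothetical discounted edit distance between two traces: each edit at
# position i is weighted by discount**(i-1), so deep deviations contribute
# less, which is what lets a search bound and prune long branches.

def discounted_edit_distance(s, t, discount=0.9):
    n, m = len(s), len(t)
    # dp[i][j] = discounted cost of editing s[:i] into t[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + discount ** (i - 1)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + discount ** (j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = discount ** (max(i, j) - 1)        # later edits cost less
            sub = 0.0 if s[i - 1] == t[j - 1] else w
            dp[i][j] = min(dp[i - 1][j] + w,       # delete from s
                           dp[i][j - 1] + w,       # insert from t
                           dp[i - 1][j - 1] + sub)  # match / substitute
    return dp[n][m]

print(discounted_edit_distance("abc", "abc"))  # identical traces -> 0.0
```

With `discount=1.0` this degenerates to the plain Levenshtein distance, so the discount factor can be read as a knob trading exactness for search efficiency.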
The connection between process complexity of event sequences and models discovered by process mining
Process mining is a research area focusing on the design of algorithms that can automatically provide insights into business processes. Among the most popular algorithms are those for automated process discovery, which have the ultimate goal to generate a process model that summarizes the behavior recorded in an event log. Past research aimed to improve process discovery algorithms irrespective of the characteristics of the input log. In this paper, we take a step back and investigate the connection between measures capturing characteristics of the input event log and the quality of the discovered process models. To this end, we review the state-of-the-art process complexity measures, propose a new process complexity measure based on graph entropy, and analyze this set of complexity measures on an extensive collection of event logs and corresponding automatically discovered process models. Our analysis shows that many process complexity measures correlate with the quality of the discovered process models, demonstrating the potential of using complexity measures as predictors of process model quality. This finding is important for process mining research, as it highlights that not only algorithms, but also connections between input data and output quality should be studied.
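An entropy-based complexity measure can be sketched as follows. The definition below, Shannon entropy over the edge-frequency distribution of the log's directly-follows graph, is an illustrative stand-in; the paper's graph-entropy measure may be defined differently.

```python
import math
from collections import Counter

# Illustrative log-complexity measure: build the directly-follows graph of
# an event log (edge (a, b) counted once per direct succession) and take
# the Shannon entropy of its edge-frequency distribution. More varied
# behaviour spreads frequency over more edges and yields higher entropy.

def dfg_entropy(log):
    edges = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    total = sum(edges.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in edges.values())

simple_log = [("a", "b", "c")] * 10                  # one behaviour
varied_log = [("a", "b", "c"), ("a", "c", "b"),
              ("b", "a", "c"), ("c", "b", "a")]      # mixed behaviour
print(dfg_entropy(simple_log), dfg_entropy(varied_log))
```

Under this reading, the paper's correlation finding says that logs scoring high on such measures tend to yield lower-quality discovered models, making the measure usable as a cheap quality predictor.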
Improving reference mining in patents with BERT
In this paper we address the challenge of extracting scientific references
from patents. We approach the problem as a sequence labelling task and
investigate the merits of BERT models for extracting these long
sequences. References in patents to scientific literature are relevant for studying
the connection between science and industry. Most prior work only uses the
front-page citations for this analysis, which are provided in the metadata of
patent archives. In this paper we build on prior work using Conditional Random
Fields (CRF) and Flair for reference extraction. We improve the quality of the
training data and train three BERT-based models on the labelled data (BERT,
bioBERT, sciBERT). We find that the improved training data leads to a large
improvement in the quality of the trained models. In addition, the BERT models
beat CRF and Flair, with recall scores around 97% obtained with cross
validation. With the best model we label a large collection of 33 thousand
patents, extract the citations, and match them to publications in the Web of
Science database. We extract 50% more references than with the old training
data and methods: 735 thousand references in total. With these
patent-publication links, follow-up research will further analyze which types
of scientific work lead to inventions.
Comment: 10 pages, 3 figures
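In the sequence-labelling framing, the model tags each token and the tagged sequence is then decoded into reference strings. As a minimal illustration, independent of BERT, CRF, or Flair, here is a decoder for the standard BIO scheme; the tag names `B-REF`/`I-REF` are an assumption, not necessarily those of the paper's training data.

```python
# Decode BIO-tagged tokens back into reference spans. "B-REF" starts a
# reference, "I-REF" continues one, and "O" (or a stray continuation tag)
# closes any open span.

def bio_to_spans(tokens, tags):
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-REF":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I-REF" and current:
            current.append(tok)
        else:  # "O", or an I-REF with no open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["see", "Smith", "et", "al.", "2001", ",", "figure", "2"]
tags   = ["O", "B-REF", "I-REF", "I-REF", "I-REF", "O", "O", "O"]
print(bio_to_spans(tokens, tags))  # ['Smith et al. 2001']
```

The extracted strings are what would then be matched against bibliographic records, as the paper does against the Web of Science database.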
Intelligent data leak detection through behavioural analysis
In this paper we discuss a solution to detect data leaks in an intelligent and unobtrusive way through real-time analysis of the user's behaviour while handling classified information. The approach is grounded in experiences with real-world use cases, and a variety of data preparation and data analysis techniques have been tried. Results show the feasibility of the approach, but also the necessity of correlating with other security events to improve precision.
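One simple form of behavioural analysis, offered here only as a toy sketch and not as the paper's system, scores a user's current activity volume against that user's own history and flags large deviations for correlation with other security events.

```python
import statistics

# Toy behavioural baseline: z-score of today's classified-document access
# count against the user's historical daily counts. A large positive score
# marks a spike worth correlating with other security signals.

def anomaly_score(history, today):
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history) or 1.0  # guard against zero variance
    return (today - mean) / stdev

accesses_per_day = [3, 4, 2, 5, 3, 4, 3]
print(anomaly_score(accesses_per_day, 40))  # large spike -> high score
print(anomaly_score(accesses_per_day, 4))   # normal volume -> near zero
```

On its own such a score produces false positives (e.g. legitimate deadline crunches), which is consistent with the paper's observation that correlation with other security events is needed for acceptable precision.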