14 research outputs found

    A recommender system for process discovery

    Over the last decade, several algorithms for process discovery and process conformance have been proposed. Still, it is well accepted that no single algorithm dominates in either of these two disciplines, and it is therefore often difficult to apply them successfully. Most of these algorithms require close-to-expert knowledge in order to be applied satisfactorily. In this paper, we present a recommender system that uses portfolio-based algorithm selection strategies to address two problems: finding the best discovery algorithm for the data at hand, and bridging the gap between general users and process mining algorithms. Experiments performed with the developed tool demonstrate the usefulness of the approach for a variety of instances.
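
    Portfolio-based algorithm selection can be read as a supervised learning problem: characterize each event log by a vector of features, then learn which discovery algorithm tends to score best on logs with similar features. Below is a minimal sketch along those lines using scikit-learn; the features, labels, and learner are illustrative assumptions, not the paper's actual tool.

        # Sketch of portfolio-based algorithm selection for process discovery.
        # Features and labels below are hypothetical placeholders.
        from sklearn.ensemble import RandomForestClassifier

        # One row per training log: [distinct activities, mean trace length,
        # ratio of distinct trace variants]; label = best-performing miner.
        X_train = [[12, 3.4, 0.8],
                   [45, 9.1, 0.2],
                   [7, 2.0, 0.9]]
        y_train = ["inductive", "heuristics", "alpha"]

        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X_train, y_train)

        # Recommend a discovery algorithm for a new, unseen log.
        print(clf.predict([[20, 5.5, 0.5]]))  # e.g. ['inductive']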

    Conformance checking using activity and trace embeddings

    Conformance checking describes process mining techniques used to compare an event log and a corresponding process model. In this paper, we propose an entirely new approach to conformance checking based on neural network-based embeddings. These embeddings are vector representations of every activity/task present in the model and log, obtained via act2vec, a Word2vec-based model. Our novel conformance checking approach applies the Word Mover’s Distance to the activity embeddings of traces in order to measure fitness and precision. In addition, we investigate a more efficiently calculated lower bound of the former metric, the Iterative Constrained Transfers measure. An alternative method using trace2vec, a Doc2vec-based model, to train and compare vector representations of the process instances themselves is also introduced. These methods are tested in different settings and compared to other conformance checking techniques, showing promising results.
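
    The embedding idea can be approximated with off-the-shelf tooling: treat each trace as a sentence whose words are activity labels, train Word2vec embeddings, and compare traces with the Word Mover's Distance. A minimal sketch with gensim, standing in for the authors' act2vec setup (traces and parameters are illustrative):

        # Activity embeddings via gensim's Word2Vec, standing in for act2vec;
        # traces are "sentences" whose "words" are activity labels.
        from gensim.models import Word2Vec

        log_traces = [["register", "check", "decide", "notify"],
                      ["register", "decide", "check", "notify"]]
        model_traces = [["register", "check", "decide", "notify"]]

        w2v = Word2Vec(sentences=log_traces + model_traces,
                       vector_size=16, window=3, min_count=1, sg=1, seed=0)

        # Word Mover's Distance between a log trace and a model trace;
        # a lower distance suggests better fitness. (Needs the POT package.)
        print(w2v.wv.wmdistance(log_traces[1], model_traces[0]))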

    A method for assessing parameter impact on control-flow discovery algorithms

    Given an event log L, a control-flow discovery algorithm f, and a quality metric m, this paper faces the following problem: which parameters of f most influence its application in terms of m when applied to L? This paper proposes a method to solve this problem based on sensitivity analysis, a theory that has been successfully applied in other areas. Clearly, a satisfactory solution to this problem will be crucial to bridge the gap between process discovery algorithms and final users. Additionally, recommendation techniques and meta-techniques, such as determining the representational bias of an algorithm, may benefit from solutions to the problem considered in this paper. The method has been evaluated over a set of logs and the Flexible Heuristics Miner, and the preliminary results demonstrate the applicability of the general framework described in this paper.
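
    In its simplest, one-at-a-time form, the question can be probed by varying a single parameter while the others stay fixed and recording the spread of the metric. The sketch below shows only that elementary variant (the paper's sensitivity analysis machinery is more general); run_metric is a hypothetical stand-in for discovering a model from L with f under the given parameters and scoring it with m.

        # One-at-a-time sensitivity sketch for discovery parameters.
        # run_metric is a toy surrogate; in practice it would call the
        # miner f on log L and evaluate the result with metric m.
        def run_metric(dependency=0.9, relative_to_best=0.05):
            return 0.6 + 0.3 * dependency - 0.1 * relative_to_best

        def sensitivity(param, values, **fixed):
            """Vary one parameter, keep the rest fixed, and report the
            spread of the metric as a crude influence score."""
            scores = [run_metric(**{**fixed, param: v}) for v in values]
            return max(scores) - min(scores)

        print(sensitivity("dependency", [0.1, 0.5, 0.9]))          # 0.24
        print(sensitivity("relative_to_best", [0.01, 0.05, 0.1]))  # 0.009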

    Conformance Checking of Mixed-paradigm Process Models

    Mixed-paradigm process models integrate strengths of procedural and declarative representations such as Petri nets and Declare. They are particularly interesting for process mining because they allow capturing complex behaviour in a compact way. A key research challenge for the proliferation of mixed-paradigm models in process mining is the lack of corresponding conformance checking techniques. In this paper, we address this problem by devising the first approach that works with the intertwined state spaces of mixed-paradigm models. More specifically, our approach uses an alignment-based replay to explore the state space and compute trace fitness in a procedural way. In every state, the declarative constraints are separately updated, such that violations disable the corresponding activities. Our technique provides an efficient replay towards an optimal alignment by respecting all orthogonal Declare constraints. We have implemented our technique in ProM and demonstrate its performance in an evaluation with real-world event logs.
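
    The key mechanism, declarative constraints disabling activities during a procedural replay, can be shown with a deliberately simplified toy: each constraint inspects the replayed prefix and blocks candidate activities that would violate it. This is a hedged illustration with two hand-simplified constraint semantics, not the paper's automata-based treatment.

        # Toy sketch: Declare-style constraints disable activities during
        # replay. Constraint semantics are simplified illustrations.
        def enabled(prefix, candidates, constraints):
            """Return the candidate activities not blocked by any
            constraint, given the trace prefix replayed so far."""
            allowed = set(candidates)
            for kind, a, b in constraints:
                if kind == "precedence" and a not in prefix:
                    allowed.discard(b)   # b requires an earlier a
                elif kind == "not_coexistence" and a in prefix:
                    allowed.discard(b)   # a and b may not both occur
            return allowed

        constraints = [("precedence", "check", "decide"),
                       ("not_coexistence", "reject", "accept")]
        print(enabled(["register"], {"check", "decide", "accept"}, constraints))
        # {'check', 'accept'} -- 'decide' stays disabled until 'check' occurs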

    Applying Process Mining Algorithms in the Context of Data Collection Scenarios

    Despite technological progress, paper-based questionnaires are still widely used to collect data in many application domains such as education, healthcare, or psychology. To facilitate the enormous amount of work involved in collecting, evaluating and analyzing this data, a system enabling process-driven data collection was developed. Based on generic tools, a process-driven approach for creating, processing and analyzing questionnaires was realized, in which a questionnaire is defined in terms of a process model. Due to this characteristic, process mining algorithms may be applied to event logs created during the execution of questionnaires. Moreover, new data that might not have been used in the context of questionnaires before may be collected and analyzed to provide new insights regarding both the participant and the questionnaire. This thesis shows that process mining algorithms may be applied successfully to process-oriented questionnaires. Algorithms from the three process mining forms of process discovery, conformance checking and enhancement are applied and used for various analyses. The analysis of certain properties of discovered process models leads to new ways of generating information from questionnaires. Different techniques for conformance checking and their applicability in the context of questionnaires are evaluated. Furthermore, new data that cannot be collected from paper-based questionnaires is used to enhance questionnaires to reveal new and meaningful relationships.

    Monotone Precision and Recall Measures for Comparing Executions and Specifications of Dynamic Systems

    The behavioural comparison of systems is an important concern of software engineering research. For example, the areas of specification discovery and specification mining are concerned with measuring the consistency between a collection of execution traces and a program specification. This problem is also tackled in process mining with the help of measures that describe the quality of a process specification automatically discovered from execution logs. Though various measures have been proposed, it was recently demonstrated that they neither fulfil essential properties, such as monotonicity, nor can they handle infinite behaviour. In this paper, we address this research problem by introducing a new framework for the definition of behavioural quotients. We prove that the corresponding quotients guarantee desired properties that existing measures have failed to support. We demonstrate the application of the quotients for capturing precision and recall measures between a collection of recorded executions and a system specification. We use a prototypical implementation of these measures to contrast their monotonic assessment with measures that have been defined in prior research.
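
    To fix intuitions: in the special case of finite languages, with L the set of recorded execution traces and S the trace language of the specification, precision and recall can be written as the quotients below. This is a simplified finite-language reading for orientation only; the paper's framework is built to also cover infinite behaviour.

        \mathit{recall}(L, S)    = \frac{|L \cap S|}{|L|}, \qquad
        \mathit{precision}(L, S) = \frac{|L \cap S|}{|S|}

    Under this reading, enlarging S with a trace of L never decreases recall, which is the flavour of monotonicity the paper demands of a well-behaved measure.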

    Accurate and efficient automated discovery of process models from event logs

    Every day, companies' employees perform activities with the goal of providing services (or products) to their customers. A sequence of such activities is known as a business process. The quality and the efficiency of a business process directly influence the customer experience. In a competitive business environment, achieving a great customer experience is fundamental to being a successful company. For this reason, companies are interested in identifying their business processes in order to analyse and improve them. To analyse and improve a business process, it is generally useful to first write it down in the form of a graphical representation, namely a business process model. Drawing such process models manually is time-consuming because of the time it takes to collect detailed information about the execution of the process. Also, manually drawn process models are often incomplete, because it is difficult to uncover every possible execution path in the process via manual data collection. Automated process discovery allows business analysts to exploit process execution data to automatically discover process models. Discovering high-quality process models is extremely important to reduce the time spent enhancing them and to avoid mistakes during process analysis. The quality of an automatically discovered process model depends on both the input data and the automated process discovery application that is used. In this thesis, we provide an overview of the available algorithms to perform automated process discovery. We identify deficiencies in existing algorithms, and we propose a new algorithm, called Split Miner, which is faster and consistently discovers more accurate process models than existing algorithms. We also propose a new approach to measure the accuracy of automatically discovered process models in a fine-grained manner, and we use this new measurement approach to optimize the accuracy of automatically discovered process models. https://www.ester.ee/record=b530061
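
    Discovery-plus-accuracy workflows of the kind the thesis studies can be tried with open-source tooling. A minimal sketch with pm4py, using its inductive miner and token-based replay metrics as stand-ins (Split Miner itself and the thesis's fine-grained accuracy measure are not part of pm4py's core API; the log file name is hypothetical):

        # Sketch: automated process discovery from an event log with pm4py.
        # The inductive miner stands in here; Split Miner is a separate tool.
        import pm4py

        log = pm4py.read_xes("log.xes")  # hypothetical event log file

        # Discover a Petri net (model, initial marking, final marking).
        net, im, fm = pm4py.discover_petri_net_inductive(log)

        # Accuracy of the discovered model against the log.
        fitness = pm4py.fitness_token_based_replay(log, net, im, fm)
        precision = pm4py.precision_token_based_replay(log, net, im, fm)
        print(fitness["log_fitness"], precision)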

    A Time-based Alpha Miner Algorithm for Modeling Business Processes and Optimization Using a Flexible Manufacturing System at a Container Terminal

    Business process management is used to produce the desired output at optimal time and cost. Optimal time can be achieved by adding resources, whereas optimal cost can be achieved by reducing resources; a system that can optimize business processes is therefore needed. However, optimization only works if a business process model is available, so a system that can discover the business process is needed as well. Business processes can be obtained using process discovery techniques that mine relations from an event log: sequence and parallel relations (XOR, OR, and AND). Existing process discovery techniques can mine parallel relations (XOR, OR, and AND), sequence, loops, non-free choice, and invisible tasks, and most of them use a single timestamp to discover the process model. In previous studies, activities were parallelized by finding reciprocal relationships between activities in the event log (e.g., activity A has a sequence relation to activity B and vice versa, as in traces AB and BA), but without stating the conditions required to parallelize the two activities. To parallelize a business process, independent activities that can be executed concurrently must first be identified; the highest level of parallelism is achieved when the number of activities identified as independent is maximized. Traditionally, this identification is based on the time and location of activities, allowing parallelization only if activities on the same simulation entity are executed in timestamp order. To increase the level of parallelism, we propose a novel approach investigating another criterion for independence: if two activities on the same simulation entity do not access the same data items in a conflicting manner, they can also be executed in parallel. This thesis proposes the conditions necessary to parallelize two activities in an event log and develops a system that models business processes automatically and optimizes cost and time at once. Cost and time optimization considers resources (machines), using a Flexible Manufacturing System (FMS) to obtain the total number of machines that can change function in one day and one month, and Goal Programming to optimize the cost and time of each department, yielding optimal time and cost values. The experimental results show that the Modified Time-based Alpha Miner algorithm can discover process models correctly, including the parallel relations AND, OR, and XOR, while the original Alpha Miner algorithm can only categorize parallel relations into AND and XOR. Once the process model is discovered, using the Flexible Manufacturing System, the Behandle and Quarantine departments can be executed in parallel by changing the function of the RTGC machine to an HT Truck. Using the LEKIN tool and the First Come First Serve (FCFS) method, the scheduling result at the container terminal can be determined by container type. The Goal Programming optimization then minimizes the maximum time of each activity to the average execution duration per activity, together with the cost that can be saved from that maximum time.
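
    The reciprocal-relation rule that the thesis builds on (and extends with timestamp and data-access conditions) is easy to state concretely: A and B are taken as parallel when the log contains both A directly followed by B and B directly followed by A. A minimal sketch of just that baseline rule on a toy log (the time-based and data-conflict refinements are not shown):

        # Alpha-style detection of parallel relations from an event log:
        # A || B when both A>B and B>A appear as directly-follows pairs.
        log = [["A", "B", "C"], ["B", "A", "C"], ["A", "C"]]

        directly_follows = {pair
                            for trace in log
                            for pair in zip(trace, trace[1:])}

        parallel = {(a, b) for (a, b) in directly_follows
                    if (b, a) in directly_follows and a < b}
        print(parallel)  # {('A', 'B')}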

    Conformance checking and diagnosis in process mining

    In the last decades, the capability of information systems to generate and record overwhelming amounts of event data has experienced exponential growth in several domains, and in particular in industrial scenarios. Devices connected to the internet (Internet of Things), social interaction, mobile computing, and cloud computing provide new sources of event data, and this trend will continue in the next decades. The omnipresence of large amounts of event data stored in logs is an important enabler for process mining, a novel discipline for addressing challenges related to business process management, process modeling, and business intelligence. Process mining techniques can be used to discover, analyze and improve real processes by extracting models from observed behavior. The capability of these models to represent the reality determines the quality of the results obtained from them, conditioning their usefulness. Conformance checking is the aim of this thesis, where modeled and observed behavior are analyzed to determine if a model defines a faithful representation of the behavior observed in the log. Most of the efforts in conformance checking have focused on measuring and ensuring that models capture all the behavior in the log, i.e., fitness. Other properties, such as ensuring a precise model (not including unnecessary behavior), have been disregarded. The first part of the thesis focuses on analyzing and measuring the precision dimension of conformance, where models describing the reality precisely are preferred to overly general models. The thesis includes a novel technique based on detecting escaping arcs, i.e., points where the modeled behavior deviates from the one reflected in the log. The detected escaping arcs are used to determine, in terms of a metric, the precision between log and model, and to locate possible actuation points in order to achieve a more precise model. The thesis also presents a confidence interval on the provided precision metric, and a multi-factor measure to assess the severity of the detected imprecisions. Checking conformance can be time-consuming for real-life scenarios, and understanding the reasons behind conformance mismatches can be an effort-demanding task. The second part of the thesis changes the focus from the precision dimension to the fitness dimension, and proposes the use of decomposition techniques to aid in checking and diagnosing fitness. The proposed approach is based on decomposing the model into single-entry single-exit (SESE) components. The resulting fragments represent subprocesses within the main process with a simple interface to the rest of the model. Fitness checking per component provides well-localized conformance information, aiding in the diagnosis of the causes behind the problems. Moreover, the relations between components can be exploited to improve the diagnosis capabilities of the analysis, identifying areas with a high degree of mismatches, or providing a hierarchy for a zoom-in zoom-out analysis. Finally, the thesis proposes two main applications of the decomposition approach. First, the proposed theory is extended to incorporate data information for fitness checking in a decomposed manner. Second, a real-time event-based framework is presented for monitoring fitness.
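
    A simplified reading of the escaping-arcs precision metric helps fix the idea (the thesis's exact weighting and confidence machinery are richer than this). Replaying log L on model M yields a set of states S; at each state s, let mod(s) be the activities the model allows, obs(s) those actually observed in the log, and esc(s) = mod(s) \ obs(s) the escaping arcs. With w(s) weighting states by how often the log visits them:

        \mathit{precision}(L, M) = 1 -
            \frac{\sum_{s \in S} w(s)\,\lvert \mathit{esc}(s) \rvert}
                 {\sum_{s \in S} w(s)\,\lvert \mathit{mod}(s) \rvert}

    A model that allows exactly the observed behavior has no escaping arcs and precision 1; the more behavior the model permits beyond the log, the lower the value.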

    Aligning observed and modeled behavior
