5 research outputs found

    A MULTI-FUNCTIONAL PROVENANCE ARCHITECTURE: CHALLENGES AND SOLUTIONS

    In service-oriented environments, services are put together in the form of a workflow with the aim of distributed problem solving. Capturing the execution details of the services' transformations is a significant advantage of using workflows. These execution details, referred to as provenance information, are usually traced automatically and stored in provenance stores. Provenance data comprises the data recorded by a workflow engine during a workflow execution. It identifies what data is passed between services, which services are involved, and how results are eventually generated for particular sets of input values. Provenance information is of great importance and has found its way into areas of computer science such as bioinformatics, databases, social networks, and sensor networks. Current exploitation and application of provenance data is very limited, as provenance systems were initially developed for specific applications. Applying learning and knowledge discovery methods to provenance data can therefore provide rich and useful information on workflows and services. In this work, the challenges with workflows and services are studied to discover the possibilities and benefits of providing solutions based on provenance data. A multi-functional architecture is presented which addresses workflow and service issues by exploiting provenance data. These challenges include workflow composition, abstract workflow selection, refinement, evaluation, and graph model extraction. The specific contribution of the proposed architecture is its novelty in providing a basis for exploiting the previous execution details of services and workflows, together with artificial intelligence and knowledge management techniques, to resolve the major challenges regarding workflows. The presented architecture is application-independent and could be deployed in any area. The requirements for such an architecture and its building components are discussed. Furthermore, the responsibilities of the components, related work, and the implementation details of the architecture and each of its components are presented.
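To make the notion of a provenance store concrete, the following is a minimal sketch of what one record of such execution details might look like. The field names and example values are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProvenanceRecord:
    """One entry in a provenance store: the execution details of a single
    service invocation within a workflow run (field names are hypothetical)."""
    workflow_id: str                  # which workflow execution this belongs to
    service: str                      # which service was invoked
    inputs: dict                      # data passed into the service
    outputs: dict                     # data the service produced
    started: datetime
    finished: datetime
    upstream: list = field(default_factory=list)  # services whose outputs fed this one

# A store of such records answers the questions listed above: which services
# were involved, what data was passed between them, and how a result was
# generated for a particular set of input values.
store = [
    ProvenanceRecord("run-42", "align", {"seq": "ACGT"}, {"hits": 3},
                     datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 5)),
    ProvenanceRecord("run-42", "annotate", {"hits": 3}, {"labels": ["gene"]},
                     datetime(2024, 1, 1, 9, 5), datetime(2024, 1, 1, 9, 7),
                     upstream=["align"]),
]
```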

    Process mining: conformance and extension

    Today’s business processes are realized by a complex sequence of tasks that are performed throughout an organization, often involving people from different departments and multiple IT systems. For example, an insurance company has a process to handle insurance claims for their clients, and a hospital has processes to diagnose and treat patients. Because there are many activities performed by different people throughout the organization, there is a lack of transparency about how exactly these processes are executed. However, understanding the process reality (the "as is" process) is the first necessary step to save cost, increase quality, or ensure compliance. The field of process mining aims to assist in creating process transparency by automatically analyzing processes based on existing IT data. Most processes are supported by IT systems nowadays. For example, Enterprise Resource Planning (ERP) systems such as SAP log all transaction information, and Customer Relationship Management (CRM) systems are used to keep track of all interactions with customers. Process mining techniques use this low-level log data (so-called event logs) to automatically generate process maps that visualize the process reality from different perspectives. For example, it is possible to automatically create process models that describe the causal dependencies between activities in the process. So far, process mining research has mostly focused on the discovery aspect (i.e., the extraction of models from event logs). This dissertation broadens the field of process mining to include the aspects of conformance and extension. Conformance aims at the detection of deviations from documented procedures by comparing the real process (as recorded in the event log) with an existing model that describes the assumed or intended process. Conformance is relevant for two reasons: 1. Most organizations document their processes in some form. For example, process models are created manually to understand and improve the process, comply with regulations, or for certification purposes. In the presence of existing models, it is often more important to point out the deviations from these existing models than to discover completely new models. Discrepancies emerge because business processes change, or because the models did not accurately reflect the real process in the first place (due to the manual and subjective creation of these models). If the existing models do not correspond to the actual processes, then they have little value. 2. Automatically discovered process models typically do not completely "fit" the event logs from which they were created. These discrepancies are due to noise and/or limitations of the discovery techniques used. Furthermore, in the context of complex and diverse process environments, the discovered models often need to be simplified to obtain useful insights. Therefore, it is crucial to be able to check how much a discovered process model actually represents the real process. Conformance techniques can be used to quantify the representativeness of a mined model before drawing further conclusions. They thus constitute an important quality measurement for effectively using process discovery techniques in a practical setting. Once one is confident in the quality of an existing or discovered model, extension aims at the enrichment of these models by the integration of additional characteristics such as time, cost, or resource utilization.
By extracting additional information from an event log and projecting it onto an existing model, bottlenecks can be highlighted and correlations with other process perspectives can be identified. Such an integrated view on the process is needed to understand root causes for potential problems and actually make process improvements. Furthermore, extension techniques can be used to create integrated simulation models from event logs that resemble the real process more closely than manually created simulation models. In Part II of this thesis, we provide a comprehensive framework for the conformance checking of process models. First, we identify the evaluation dimensions fitness, decision/generalization, and structure as the relevant conformance dimensions. We develop several Petri-net based approaches to measure conformance in these dimensions and describe five case studies in which we successfully applied these conformance checking techniques to real and artificial examples. Furthermore, we provide a detailed literature review of related conformance measurement approaches (Chapter 4). Then, we study existing model evaluation approaches from the field of data mining. We develop three data mining-inspired evaluation approaches for discovered process models, one based on Cross Validation (CV), one based on the Minimal Description Length (MDL) principle, and one using methods based on Hidden Markov Models (HMMs). We conclude that process model evaluation faces similar yet different challenges compared to traditional data mining. Additional challenges emerge from the sequential nature of the data and the higher-level process models, which include concurrent dynamic behavior (Chapter 5). Finally, we point out current shortcomings and identify general challenges for conformance checking techniques. These challenges relate to the applicability of the conformance metric, the metric quality, and the bridging of different process modeling languages. We develop a flexible, language-independent conformance checking approach that provides a starting point to effectively address these challenges (Chapter 6). In Part III, we develop a concrete extension approach, provide a general model for process extensions, and apply our approach to the creation of simulation models. First, we develop a Petri-net based decision mining approach that aims at the discovery of decision rules at process choice points based on data attributes in the event log. While we leverage classification techniques from the data mining domain to actually infer the rules, we identify the challenges that relate to the initial formulation of the learning problem from a process perspective. We develop a simple approach to partially overcome these challenges, and we apply it in a case study (Chapter 7). Then, we develop a general model for process extensions to create integrated models including the process, data, time, and resource perspectives. We develop a concrete representation based on Coloured Petri-nets (CPNs) to implement and deploy this model for simulation purposes (Chapter 8). Finally, we evaluate the quality of automatically discovered simulation models in two case studies and extend our approach to allow for operational decision making by incorporating the current process state as a non-empty starting point in the simulation (Chapter 9). Chapter 10 concludes this thesis with a detailed summary of the contributions and a list of limitations and future challenges.
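As a concrete illustration of the conformance idea described above: compare the behavior recorded in an event log with a model of the assumed process and quantify the fit. The thesis develops Petri-net based metrics for this; the sketch below is a deliberately simplified stand-in in which the "model" is just a set of allowed direct successions, and all activity names are invented.

```python
def fitness(log, allowed):
    """Fraction of observed direct successions that the model permits."""
    ok = total = 0
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            total += 1
            ok += (a, b) in allowed
    return ok / total if total else 1.0

# Hypothetical claim-handling model and event log (not from the thesis).
model = {("register", "check"), ("check", "decide"), ("decide", "pay")}
log = [
    ["register", "check", "decide", "pay"],  # conforms fully
    ["register", "decide", "pay"],           # skips "check": a deviation
]
print(fitness(log, model))  # 0.8 -- four of the five successions are allowed
```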
The work presented in this dissertation is supported and accompanied by concrete implementations, which have been integrated into the ProM and ProMimport frameworks. Appendix A provides a comprehensive overview of the functionality of the developed software. The results of this dissertation have been published in more than twenty peer-reviewed scientific publications, including several high-quality journals.
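To illustrate the decision mining idea from Part III: each choice point in the process becomes a classification problem in which the data attributes of a case are the features and the branch taken is the class label. A minimal sketch using an off-the-shelf decision tree follows; the attribute names, branch names, and data are invented for illustration and this is not the thesis's Petri-net based approach.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# One row per case that reached the choice point (hypothetical attributes):
# [claim_amount, years_as_customer] -> which branch the case took.
X = [[120, 1], [90, 4], [2500, 2], [3100, 7], [80, 3], [2700, 1]]
y = ["quick_settle", "quick_settle", "full_review",
     "full_review", "quick_settle", "full_review"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["claim_amount", "years_as_customer"]))
# Recovers a branching rule of the form: claim_amount <= ~1300 -> quick_settle
```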

    Mining Workflow Recovery from Event Based Logs

    Handling workflow transactional behavior remains a key problem in ensuring correct and reliable execution. Discovering and explaining this behavior would make it possible to better understand and control workflow recovery. Unfortunately, previous workflow mining work has concentrated on control-flow aspects. Although powerful, these proposals lack the functionality and performance needed to discover workflow transactional behavior. In this paper, we describe mining techniques which are able to discover a workflow model, and to improve its transactional behavior, from event-based logs. First, we propose an algorithm to discover workflow patterns. Then, we propose techniques to discover transactional dependencies between activities, which allow us to mine workflow recovery techniques. Finally, based on this mining step, we use a set of rules to improve workflow design.
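As an illustration of the kind of information such techniques start from, the sketch below counts which activities directly follow which in an event-based log, including a compensation activity that hints at transactional (recovery) behavior. This is a generic first step shared by many log-based miners, not the algorithm proposed in the paper; all activity names are invented.

```python
from collections import Counter

def direct_successions(log):
    """Count how often activity b directly follows activity a in the log."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

# Hypothetical event-based log; the second trace shows a recovery path.
log = [
    ["book", "pay", "ship"],
    ["book", "pay", "abort", "compensate-pay"],
]
for (a, b), n in direct_successions(log).items():
    print(f"{a} -> {b}: {n}")
# Frequent pairs suggest control flow; pairs such as abort -> compensate-pay
# expose transactional dependencies between activities.
```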

    A genetic programming based business process mining approach

    As business processes become ever more complex, there is a need for companies to understand the processes they already have in place. To undertake this manually would be time-consuming. The practice of process mining attempts to automatically construct the correct representation of a process based on a set of process execution logs. The aim of this research is to develop a genetic programming (GP) based approach for business process mining. The focus of this research is on automated/semi-automated business processes within the service industry (by semi-automated it is meant that part of the process is manual and likely to be paper-based). This is the first time a GP approach has been used in the practice of process mining. The graph-based representation and fitness parsing used are also unique to the GP approach. A literature review and an industry survey have been undertaken as part of this research to establish the state of the art in the research and practice of business process modelling and mining. It is observed that process execution logs exist in most service sector companies but are not utilised for process mining. The development of a new GP approach is documented, along with a set of modifications required to enable accuracy in the mining of complex process constructs, semantics, and noisy process execution logs. In the context of process mining, accuracy refers to the ability of the mined model to reflect the contents of the event log on which it is based: neither over-describing it (including features that are not recorded in the log) nor under-describing it (including only the most common features and leaving out low-frequency task edges). The complexity of processes, in terms of this thesis, involves the mining of parallel constructs, processes containing complex semantic constructs (AND/XOR split and join points), and processes containing 20 or more tasks. The level of noise handled by the business process mining approach includes event logs which have a small number of randomly selected tasks missing from a third of their structure. A novel graph representation for use with GP in the mining of business processes is presented, along with a new way of parsing graph-based individuals against process execution logs. The GP process mining approach has been validated with a range of tests drawn from the literature and two case studies, provided by the industrial sponsor, utilising live process data. These tests and case studies provide a range of process constructs to fully test and stretch the GP process mining approach. An outlook is given on the future development of the GP process mining approach and of process mining as a practice.
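To make the fitness-parsing idea concrete: a GP miner must score every candidate model in the population against the execution logs, rewarding models that replay the log without over- or under-describing it. The sketch below is an illustrative stand-in; the adjacency-set encoding, the replay rule, and the penalty weight are assumptions, not the representation developed in the thesis.

```python
def replayable(trace, graph):
    """True if every step of the trace follows an edge of the candidate graph."""
    return all(b in graph.get(a, set()) for a, b in zip(trace, trace[1:]))

def gp_fitness(graph, log):
    """Reward traces the model replays; penalise edges the log never uses."""
    fit = sum(replayable(t, graph) for t in log) / len(log)
    edges = {(a, b) for a, succs in graph.items() for b in succs}
    used = {(a, b) for t in log for a, b in zip(t, t[1:])}
    return fit - 0.01 * len(edges - used)  # 0.01 is an arbitrary weight

# Hypothetical candidate individual and log.
candidate = {"start": {"check"}, "check": {"approve", "reject"}}
log = [["start", "check", "approve"], ["start", "check", "reject"]]
print(gp_fitness(candidate, log))  # 1.0 -- replays all traces, no unused edges
```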