1 research outputs found

    A contextual approach to detecting synonymous and polluted activity labels in process event logs

    No full text
    Process mining, as a well-established research area, uses algorithms for process-oriented data analysis. Similar to other types of data analysis, the existence of quality issues in input data will lead to unreliable analysis results (garbage in - garbage out). An important input for process mining is an event log which is a record of events related to a business process as it is performed through the use of an information system. While addressing quality issues in event logs is necessary, it is usually an ad-hoc and tiresome task. In this paper, we propose an automatic approach for detecting two types of data quality issues related to activities, both critical for the success of process mining studies: synonymous labels (same semantics with different syntax) and polluted labels (same semantics and same label structures). We propose the use of activity context, i.e. control flow, resource, time, and data attributes to detect semantically identical activity labels. We have implemented our approach and validated it using real-life logs from two hospitals and an insurance company, and have achieved promising results in detecting frequent imperfect activity labels
    corecore