7 research outputs found

    Digital Health Data Imperfection Patterns and Their Manifestations in an Australian Digital Hospital

    Get PDF
    Whilst digital health data provides great benefits for improved and effective patient care and organisational outcomes, the quality of digital health data can sometimes be a significant issue. Healthcare providers are known to spend a significant amount of time assessing and cleaning data. To address this situation, this paper presents six Digital Health Data Imperfection Patterns that provide insight into the data quality issues of digital health data, their root causes, their impact, and how they can be detected. Using the CRISP-DM methodology, we demonstrate the utility and pervasiveness of the patterns at the emergency department of Australia's major tertiary digital hospital. The pattern collection can be used by health providers to identify and prevent key digital health data quality issues, contributing to reliable insights for clinical decision making and patient care delivery. The patterns also provide a solid foundation for future research in digital health through their identification of key data quality issues, root causes, detection techniques, and terminology.
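    As a purely illustrative sketch (the paper's six patterns are not reproduced here), one such imperfection, an implausible timestamp ordering in emergency department records, could be detected along the following lines; the column names arrival_time and discharge_time are assumptions for the sake of example:

```python
# Illustrative sketch: flag emergency department records whose recorded
# discharge precedes the recorded arrival, or whose timestamps cannot be parsed.
# Column names (arrival_time, discharge_time) are assumed for this example.
import pandas as pd


def implausible_timestamps(ed_records: pd.DataFrame) -> pd.DataFrame:
    """Return records with unparsable timestamps or discharge before arrival."""
    df = ed_records.copy()
    df["arrival_time"] = pd.to_datetime(df["arrival_time"], errors="coerce")
    df["discharge_time"] = pd.to_datetime(df["discharge_time"], errors="coerce")
    unparsable = df["arrival_time"].isna() | df["discharge_time"].isna()
    reversed_order = df["discharge_time"] < df["arrival_time"]
    return df[unparsable | reversed_order]
```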

    The quality guardian: Improving activity label quality in event logs through gamification

    No full text
    Data cleaning, the most tedious task of data analysis, can turn into a fun experience when performed through a game. This thesis shows that the use of gamification and crowdsourcing techniques can mitigate the problem of poor-quality process data. The Quality Guardian, a family of gamified systems, is proposed; it exploits the motivational drives of domain experts to engage them in the detection and repair of imperfect activity labels in process data. Evaluation of the developed games using real-life data sets and domain experts shows quality improvement as well as a positive user experience.

    Collaborative and interactive detection and repair of activity labels in process event logs

    No full text
    Process mining uses computational techniques for process-oriented data analysis. As with other types of data analysis, the use of poor-quality input data will lead to unreliable analysis outcomes (garbage in, garbage out). Among the key inputs to process mining analyses are the activity labels in event logs, which represent the tasks that have been performed. Activity labels are not immune to data quality issues. Fixing them is an important but challenging endeavour, which may require domain knowledge and can be computationally expensive. In this paper, we propose to tackle this challenge from a novel angle by using a gamified crowdsourcing approach to the detection and repair of problematic activity labels, namely those with identical semantics but different syntax. Evaluation of the prototype with users and a real-life log showed promising results in terms of the quality improvements achieved.

    A contextual approach to detecting synonymous and polluted activity labels in process event logs

    No full text
    Process mining, a well-established research area, uses algorithms for process-oriented data analysis. As with other types of data analysis, quality issues in the input data will lead to unreliable analysis results (garbage in, garbage out). An important input for process mining is the event log, a record of events related to a business process as it is performed through an information system. While addressing quality issues in event logs is necessary, it is usually an ad hoc and tiresome task. In this paper, we propose an automatic approach for detecting two types of data quality issues related to activities, both critical to the success of process mining studies: synonymous labels (same semantics with different syntax) and polluted labels (same semantics and same label structures). We propose the use of activity context, i.e., control flow, resource, time, and data attributes, to detect semantically identical activity labels. We have implemented our approach and validated it using real-life logs from two hospitals and an insurance company, achieving promising results in detecting frequent imperfect activity labels.
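    A minimal sketch of the contextual idea described above, assuming a simple event log with case_id, activity, resource, and timestamp columns (the column names, the cosine similarity measure, and the 0.8 threshold are illustrative assumptions rather than the authors' implementation):

```python
# Sketch: flag potentially synonymous activity labels by comparing their
# contexts (preceding/following activities and resources) across an event log.
# Assumes columns case_id, activity, resource, timestamp (illustrative names).
from collections import Counter, defaultdict

import pandas as pd


def context_profiles(log: pd.DataFrame) -> dict:
    """Build a bag-of-context profile (predecessors, successors, resources) per label."""
    profiles = defaultdict(Counter)
    for _, trace in log.sort_values("timestamp").groupby("case_id"):
        acts, ress = list(trace["activity"]), list(trace["resource"])
        for i, act in enumerate(acts):
            if i > 0:
                profiles[act][("pred", acts[i - 1])] += 1
            if i < len(acts) - 1:
                profiles[act][("succ", acts[i + 1])] += 1
            profiles[act][("res", ress[i])] += 1
    return profiles


def cosine(c1: Counter, c2: Counter) -> float:
    """Cosine similarity between two context profiles."""
    dot = sum(c1[k] * c2[k] for k in set(c1) | set(c2))
    norm = (sum(v * v for v in c1.values()) ** 0.5) * (sum(v * v for v in c2.values()) ** 0.5)
    return dot / norm if norm else 0.0


def candidate_synonyms(log: pd.DataFrame, threshold: float = 0.8):
    """Return pairs of distinct labels whose contexts are highly similar."""
    profiles = context_profiles(log)
    labels = sorted(profiles)
    return [
        (a, b, round(cosine(profiles[a], profiles[b]), 3))
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
        if cosine(profiles[a], profiles[b]) >= threshold
    ]
```

    In the paper's actual approach, time and data attributes are also part of the activity context; the sketch above covers only control flow and resources.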

    Process Activity Ontology Learning From Event Logs Through Gamification

    No full text
    Process mining is concerned with deriving knowledge from process data as recorded in so-called event logs. The quality of event logs is a constraining factor in achieving reliable insights. Particular quality problems are posed by activity labels, which are meant to be representative of organisational activities but may take different manifestations (e.g., synonyms may be introduced as a result of manual entry). Ideally, such problems are remedied by domain experts, but they are time-poor, and data cleaning is a time-consuming and tedious task. Ontologies provide a means to formalise domain knowledge, and their use can provide a scalable solution to fixing activity label similarity problems, as they can be extended and reused over time. Existing approaches to activity label quality improvement use manually generated ontologies or ontologies that are too general (e.g., WordNet). Limited attention has been paid to facilitating the development of purposeful ontologies in the field of process mining. This paper is concerned with the creation of activity ontologies by domain experts. For the first time in the field of process mining, their participation is facilitated and motivated through the application of techniques from crowdsourcing and gamification. Evaluation of our approach to the construction of activity ontologies by 35 participants shows that they found the method engaging and that its application results in high-quality ontologies.
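    To illustrate how an activity ontology, once constructed, could be reused to repair label variants in an event log (the mapping structure and the example labels below are hypothetical, not taken from the paper):

```python
# Hypothetical sketch: map recorded activity label variants onto the canonical
# concepts of an activity ontology. Concepts and variants shown are examples only.
ACTIVITY_ONTOLOGY = {
    "Register patient": {"register patient", "patient registration", "reg. patient"},
    "Order blood test": {"order blood test", "blood test ordered", "order bloods"},
}

VARIANT_TO_CONCEPT = {
    variant: concept
    for concept, variants in ACTIVITY_ONTOLOGY.items()
    for variant in variants
}


def canonicalise(label: str) -> str:
    """Replace a known label variant with its canonical ontology concept."""
    return VARIANT_TO_CONCEPT.get(label.strip().lower(), label)
```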

    Digital Health Data Quality Issues: Systematic Review

    No full text
    Background: The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making. However, the ability to effectively harness data has proven elusive, largely because of the quality of the data captured. Despite the importance of data quality (DQ), an agreed-upon DQ taxonomy evades the literature. When consolidated frameworks are developed, the dimensions are often fragmented, without consideration of the interrelationships among the dimensions or their resultant impact.
    Objective: The aim of this study was to develop a consolidated digital health DQ dimension and outcome (DQ-DO) framework to provide insights into 3 research questions: What are the dimensions of digital health DQ? How are the dimensions of digital health DQ related? And what are the impacts of digital health DQ?
    Methods: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a developmental systematic literature review was conducted of peer-reviewed literature focusing on digital health DQ in predominantly hospital settings. A total of 227 relevant articles were retrieved and inductively analyzed to identify digital health DQ dimensions and outcomes. The inductive analysis was performed through open coding, constant comparison, and card sorting with subject matter experts. Subsequently, a computer-assisted analysis was performed and verified by DQ experts to identify the interrelationships among the DQ dimensions and the relationships between DQ dimensions and outcomes. The analysis resulted in the development of the DQ-DO framework.
    Results: The digital health DQ-DO framework consists of 6 DQ dimensions, namely accessibility, accuracy, completeness, consistency, contextual validity, and currency; interrelationships among the digital health DQ dimensions, with consistency being the most influential dimension, impacting all other digital health DQ dimensions; 5 digital health DQ outcomes, namely clinical, clinician, research-related, business process, and organizational outcomes; and relationships between the digital health DQ dimensions and DQ outcomes, with the consistency and accessibility dimensions impacting all DQ outcomes.
    Conclusions: The DQ-DO framework developed in this study demonstrates the complexity of digital health DQ and the necessity of reducing digital health DQ issues. The framework further provides health care executives with holistic insights into DQ issues and resultant outcomes, which can help them prioritize which DQ-related problems to tackle first.