3 research outputs found

    DataGauge: A Practical Process for Systematically Designing and Implementing Quality Assessments of Repurposed Clinical Data

    The well-known hazards of repurposing data make Data Quality (DQ) assessment a vital step towards ensuring valid results, regardless of the analytical methods used. However, there is no systematic process for implementing DQ assessments for secondary uses of clinical data. This paper presents DataGauge, a systematic process for designing and implementing DQ assessments to evaluate repurposed data for a specific secondary use. DataGauge is composed of five steps: (1) Define information needs, (2) Develop a formal Data Needs Model (DNM), (3) Use the DNM and DQ theory to develop goal-specific DQ assessment requirements, (4) Extract DNM-specified data, and (5) Evaluate according to DQ requirements. DataGauge's main contribution is integrating general DQ theory and DQ assessment methods into a systematic process. This process supports the integration and practical implementation of existing Electronic Health Record-specific DQ assessment guidelines. DataGauge also provides an initial theory-based guidance framework that ties the DNM to DQ testing methods for each DQ dimension to aid the design of DQ assessments. This framework can be augmented with existing DQ guidelines to enable systematic assessment. DataGauge sets the stage for future systematic DQ assessment research by defining an assessment process capable of adapting to a broad range of clinical datasets and secondary uses. Defining DataGauge also opens new research directions such as DQ theory integration, DQ requirements portability, DQ assessment tool development, and DQ assessment tool usability.

    Simplification of UML/OCL schemas for efficient reasoning

    Ensuring the correctness of a conceptual schema is an essential task for avoiding the propagation of errors during software development. The kind of reasoning required to perform such a task is known to be exponential for UML class diagrams alone, and even harder when OCL constraints are considered. Motivated by this issue, we propose an innovative method that removes constraints and other UML elements from the schema to obtain a simplified one that preserves the same reasoning outcomes. In this way, we can reason about the correctness of the initial artifact by reasoning on a simplified version of it, which significantly improves the efficiency of the reasoning process. In addition, since our method is independent of the reasoning engine used, any reasoning method may benefit from it.
    Peer Reviewed. Postprint (author's final draft).

    DataGauge: A Model-Driven Framework for Systematically Assessing the Quality of Clinical Data for Secondary Use

    There is growing interest in the reuse of clinical data for research and clinical healthcare quality improvement. However, direct analysis of clinical data sets can yield misleading results. Data cleaning is often employed as a means to detect and fix data issues during analysis, but this approach lacks systematicity. Data Quality (DQ) assessments are a more thorough way of spotting threats to the validity of analytical results stemming from data repurposing, because DQ assessments aim to evaluate ‘fitness for purpose’. However, there is currently no systematic method to assess DQ for the secondary analysis of clinical data. In this dissertation I present DataGauge, a framework to address this gap in the state of the art. I begin by introducing the problem and its general significance to the field of biomedical and clinical informatics (Chapter 1). I then present a literature review that surveys current methods for the DQ assessment of repurposed clinical data and derives the features required to advance the state of the art (Chapter 2). In Chapter 3 I present DataGauge, a model-driven framework for systematically assessing the quality of repurposed clinical data, which addresses current limitations in the state of the art. Chapter 4 describes the development of a guidance framework to ensure the systematicity of DQ assessment design. I then evaluate DataGauge's ability to flag potential DQ issues in comparison to a systematic state-of-the-art method. DataGauge increased tenfold the number of potential DQ issues found, relative to the systematic state-of-the-art method. It not only identified more specific issues that directly threatened fitness for purpose but also provided broader coverage of the clinical data types and knowledge domains involved in secondary analyses. DataGauge lays the groundwork for systematic and purpose-specific DQ assessments that fully integrate with secondary analysis workflows. It also promotes a team-based approach and the explicit definition of DQ requirements to support communication and transparent reporting of DQ results. Overall, this work provides tools that pave the way to a deeper understanding of the limitations of repurposed clinical datasets before analysis. It is also a first step towards the automation of purpose-specific DQ assessments for the secondary use of clinical data. Future work will consist of further developing these methods and validating them with research teams making secondary use of clinical data.