
    DataGauge: A Model-Driven Framework for Systematically Assessing the Quality of Clinical Data for Secondary Use

    There is growing interest in the reuse of clinical data for research and for clinical healthcare quality improvement. However, direct analysis of clinical data sets can yield misleading results. Data cleaning is often employed to detect and fix data issues during analysis, but this approach lacks systematicity. Data Quality (DQ) assessments are a more thorough way of spotting threats to the validity of analytical results stemming from data repurposing, because DQ assessments aim to evaluate ‘fitness for purpose’. However, there is currently no systematic method to assess DQ for the secondary analysis of clinical data. In this dissertation I present DataGauge, a framework to address this gap in the state of the art. I begin by introducing the problem and its general significance to the field of biomedical and clinical informatics (Chapter 1). I then present a literature review that surveys current methods for the DQ assessment of repurposed clinical data and derive the features required to advance the state of the art (Chapter 2). In Chapter 3 I present DataGauge, a model-driven framework for systematically assessing the quality of repurposed clinical data, which addresses the current limitations of the state of the art. Chapter 4 describes the development of a guidance framework to ensure the systematicity of DQ assessment design. I then evaluate DataGauge’s ability to flag potential DQ issues in comparison to a systematic state-of-the-art method. DataGauge increased the number of potential DQ issues found tenfold over the systematic state-of-the-art method. It identified more specific issues that were a direct threat to fitness for purpose, while also providing broader coverage of the clinical data types and knowledge domains involved in secondary analyses. DataGauge sets the groundwork for systematic and purpose-specific DQ assessments that fully integrate with secondary analysis workflows. It also promotes a team-based approach and the explicit definition of DQ requirements to support communication and transparent reporting of DQ results. Overall, this work provides tools that pave the way to a deeper understanding of the limitations of repurposed clinical data sets before analysis. It is also a first step towards the automation of purpose-specific DQ assessments for the secondary use of clinical data. Future work will consist of further developing these methods and validating them with research teams making secondary use of clinical data.
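
    The abstract does not specify the notation DataGauge uses for its explicit DQ requirements, so the sketch below is purely illustrative: two purpose-specific DQ rules written as OCL invariants over a hypothetical clinical data model. The Patient and Encounter classes and all attribute names are invented for this example and do not come from the dissertation.

        -- Hypothetical clinical data model; class and attribute names
        -- are invented for illustration only.

        context Patient
        -- Plausibility rule: a patient's date of birth must not postdate
        -- any of that patient's recorded encounters.
        inv PlausibleBirthDate:
            self.encounter->forAll(e | e.admissionDate >= self.birthDate)

        context Encounter
        -- Purpose-specific completeness rule: encounters included in the
        -- analysis must carry a discharge disposition.
        inv DischargeDispositionRecorded:
            not self.dischargeDisposition.oclIsUndefined()

    Declaring such rules explicitly against the data model, rather than fixing issues ad hoc during cleaning, is what supports the team communication and transparent reporting of DQ results the abstract describes.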

    Using OCL to Model Constraints in Data Warehouses

    Recent research proposes the use of Object-Oriented (OO) approaches such as UML to model data warehouses. First, this paper overviews these recent OO techniques, which aim to describe the facts and the different analysis dimensions of the data. Second, we provide a tutorial on the Object Constraint Language (OCL) and show how this language can be used to specify constraints in OO-based models of data warehouses. Up to now, OCL has been applied only to describe constraints in software applications and transactional databases, so we demonstrate in this paper how to use OCL to represent the different types of data warehouse constraints. Our paper is addressed to researchers working in the fields of business intelligence and decision support systems who wish to learn about the major possibilities that OCL offers in the context of data warehouses, and to find citations that will allow them to study this formalism in greater detail. We also provide general information about the possible types of implementation of multidimensional models and their constraints.
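
    As a concrete illustration of the kind of constraint discussed above, here is a minimal OCL sketch over a hypothetical UML multidimensional model with a Sales fact class and a Product dimension class. The schema and constraint names are invented for this example, not taken from the paper.

        -- Hypothetical multidimensional model: a Sales fact class linked
        -- to Product and Time dimension classes (names are illustrative).

        context Sales
        -- Measures stored in the fact must be non-negative.
        inv NonNegativeMeasures:
            self.quantity >= 0 and self.amount >= 0

        context Product
        -- Strict roll-up hierarchy: each product belongs to exactly
        -- one category.
        inv StrictHierarchy:
            self.category->size() = 1

    Both invariants attach directly to classes of the conceptual model, which is what lets OCL express warehouse constraints without reference to any particular implementation.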