
    Supplementary data for a study on a data driven learning approach for the assessment of data quality

    These data are entirely artificial (no real patients were involved). The dataset was generated to explore whether simple machine learning can learn how simple statistical measures of a dataset (e.g. the mean value of a variable, value counts) indicate data quality (DQ) issues. The generated dummy data, outcome data and measurement method (MM) results are available in the folder "dummy data, outcome data, MM-results". The numbers of triggered DQ-issue rules are available in the folder "triggered issue rules". The MM results used for machine learning can be found in the file "MM-results_export_for_machine_learningresult_exports.csv".
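    The dataset description does not state how the dummy data were generated. As a minimal sketch under stated assumptions, the following Python code uses one hypothetical numeric variable (`age`), simulates a completeness issue by blanking out values at random, and applies three simple measurement methods (value count, missing count, mean). The function names `make_dummy_data` and `measure` are illustrative only and are not part of the published dataset.

```python
import random
import statistics
from typing import Dict, List, Optional


def make_dummy_data(n: int, missing_rate: float, seed: int = 0) -> List[Optional[int]]:
    """Generate artificial 'age' values (hypothetical variable) and blank out
    roughly `missing_rate` of them to simulate a completeness issue."""
    rng = random.Random(seed)
    values: List[Optional[int]] = [rng.randint(18, 90) for _ in range(n)]
    for i in range(n):
        if rng.random() < missing_rate:
            values[i] = None  # injected DQ issue: missing value
    return values


def measure(values: List[Optional[int]]) -> Dict[str, Optional[float]]:
    """Apply three generic measurement methods to one variable."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),                  # number of present values
        "missing": len(values) - len(present),  # number of missing values
        "mean": statistics.mean(present) if present else None,
    }


clean = measure(make_dummy_data(1000, missing_rate=0.0))
faulty = measure(make_dummy_data(1000, missing_rate=0.3))
print(clean)
print(faulty)
```

    Because values are removed at random, the mean of the faulty dataset is expected to stay close to that of the clean one, while the missing count differs clearly; which measure actually matters depends on the task, which is what the outcome data are meant to capture.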

    A data driven learning approach for the assessment of data quality

    Background: Data quality assessment is important but complex and task-dependent. Identifying suitable measurement methods and reference ranges for assessing their results is challenging. Manually inspecting the measurement results and current data-driven approaches for learning which results indicate data quality issues have considerable limitations, e.g. when identifying task-dependent thresholds for measurement results that indicate data quality issues.
    Objectives: To explore the applicability and potential benefits of a data-driven approach for learning task-dependent knowledge about suitable measurement methods and the assessment of their results. Such knowledge could help others determine whether a local data stock is suitable for a given task.
    Methods: We started by creating artificial data with previously defined data quality issues and applied a set of generic measurement methods to these data (e.g. a method that counts the number of values in a certain variable or computes their mean). We trained decision trees on the exported measurement method results and the corresponding outcome data (data indicating the dataset's suitability for a use case). For evaluation, we derived rules for potential measurement methods and reference values from the decision trees and compared them regarding their coverage of the true data quality issues artificially created in the dataset. Three researchers derived these rules independently: one with knowledge of the present data quality issues and two without.
    Results: Our self-trained decision trees indicated rules for 12 of the 19 previously defined data quality issues. The learned knowledge about measurement methods and their assessment was complementary to manual interpretation of the measurement method results.
    Conclusions: Our data-driven approach derives sensible knowledge for task-dependent data quality assessment and complements other current approaches. Based on labeled measurement method results as training data, our approach successfully suggested applicable rules for checking data quality characteristics that determine whether a dataset is suitable for a given task.
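    The abstract does not detail how the decision trees were trained or how rules were read off them. As a minimal sketch, assuming a single hypothetical MM result (a missing-value count) and binary suitability labels, the following Python code fits a decision stump (a depth-1 decision tree) and returns the split with the fewest misclassified datasets. The name `learn_rule` and the training values are illustrative assumptions; full decision-tree learners such as CART consider many features and usually place thresholds between adjacent observed values, whereas this sketch only tries the observed values themselves.

```python
from typing import List, Optional, Tuple


def learn_rule(results: List[float], suitable: List[bool]) -> Optional[Tuple[str, float, int]]:
    """Fit a depth-1 decision tree (decision stump) on one MM result.

    Tries every observed value as threshold t and both directions
    ("<=" means: result <= t -> suitable). Returns (direction, t, errors)
    for the first split found with the fewest misclassified datasets.
    """
    best: Optional[Tuple[str, float, int]] = None
    for t in sorted(set(results)):
        for direction in ("<=", ">"):
            predictions = [(r <= t) if direction == "<=" else (r > t) for r in results]
            errors = sum(p != s for p, s in zip(predictions, suitable))
            if best is None or errors < best[2]:
                best = (direction, t, errors)
    return best


# Hypothetical training data: missing-value counts of six artificial datasets
# and whether each dataset was labeled suitable for the use case.
missing_counts = [0, 2, 5, 40, 55, 80]
labels = [True, True, True, False, False, False]
print(learn_rule(missing_counts, labels))  # ('<=', 5, 0)
```

    A split such as "missing <= 5 -> suitable" has the shape of the rules described above: a measurement method (here the missing-value count) paired with a reference value (here 5).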

    A method for interoperable knowledge-based data quality assessment

    Background: Assessing the quality of healthcare data is a complex task that includes selecting suitable measurement methods (MMs) and adequately assessing their results.
    Objectives: To present an interoperable data quality (DQ) assessment method that formalizes MMs based on standardized data definitions and is intended to support collaborative governance of DQ-assessment knowledge, e.g. which MMs to apply and how to assess their results in different situations.
    Methods: We describe and explain the central concepts of our method using the example of its first real-world application, a study on predictive biomarkers for rejection and other injuries of kidney transplants. We applied our open-source tool openCQA, which implements our method using the openEHR specifications. The version-control system git and openEHR clinical information models serve as the means to support collaborative governance of DQ-assessment knowledge.
    Results: Applying the method to the study's dataset showed satisfactory practicability of the described concepts and produced useful results for DQ assessment.
    Conclusions: The main contribution of our work is to provide applicable concepts and a tested, exemplary open-source implementation for interoperable, knowledge-based DQ assessment in healthcare that accounts for the need for flexible, task- and domain-specific requirements.
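    The abstract describes keeping DQ-assessment knowledge (which MMs to apply and how to assess their results) separate from the tool that applies it. The following plain-Python sketch illustrates only that general idea; it does not use openCQA or openEHR, and the `AssessmentRule` structure, the `assess` function, and the example reference ranges are hypothetical. Rules are stored as data, so they could be versioned and shared independently of the code that evaluates them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AssessmentRule:
    """Hypothetical representation of DQ-assessment knowledge:
    which MM result to check and which reference range is acceptable."""
    measurement: str        # name of the MM result, e.g. "missing"
    lower: Optional[float]  # inclusive lower bound, None = unbounded
    upper: Optional[float]  # inclusive upper bound, None = unbounded
    issue: str              # message reported when the range is violated


def assess(results: Dict[str, float], rules: List[AssessmentRule]) -> List[str]:
    """Check each MM result against its rule and collect reported issues."""
    issues = []
    for rule in rules:
        value = results.get(rule.measurement)
        if value is None:
            issues.append(f"{rule.measurement}: no result available")
            continue
        if rule.lower is not None and value < rule.lower:
            issues.append(rule.issue)
        elif rule.upper is not None and value > rule.upper:
            issues.append(rule.issue)
    return issues


# Hypothetical, task-specific knowledge for one use case.
rules = [
    AssessmentRule("missing", None, 5, "too many missing values"),
    AssessmentRule("mean", 18, 90, "implausible mean age"),
]
print(assess({"missing": 40, "mean": 54.2}, rules))  # ['too many missing values']
```

    Swapping in a different list of rules changes the assessment for another task without touching `assess`, which mirrors the task- and domain-specific flexibility the abstract emphasizes.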