Supplementary data for a study on a data-driven learning approach for the assessment of data quality

Abstract

This data is 100% artificial (no real patients involved). The dataset was generated to explore whether simple machine learning can learn how simple statistical measures of a dataset (e.g. the mean value of a variable, value counts) indicate data quality (DQ) issues. The generated dummy data, outcome data and MM-results are available in the folder "dummy data, outcome data, MM-results". The numbers of triggered DQ-issue rules are available in the folder "triggered issue rules". The MM-results used for machine learning can be found in the file "MM-results_export_for_machine_learningresult_exports.csv".
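As a rough illustration of the kind of simple statistical measures mentioned above (mean value, value counts), the following Python sketch summarizes a single column. It is not the study's actual procedure: the helper name `summarize_column` and the example values are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from statistics import mean


def summarize_column(values):
    """Compute simple summary measures for one column (hypothetical helper)."""
    # Treat None as a missing value
    present = [v for v in values if v is not None]
    # Only numeric entries contribute to the mean (bool is excluded on purpose)
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": mean(numeric) if numeric else None,
        "value_counts": dict(Counter(present)),
    }


# Made-up example column (assumption for illustration only)
ages = [34, 51, None, 34, 290]
summary = summarize_column(ages)
print(summary)
```

In this made-up column, the single implausible value 290 pulls the mean well above the typical values. That is the sort of signal such measures could give for a possible data quality issue.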
