
    Data quality in health research: the development of methods to improve the assessment of temporal data quality in electronic health records

Background: Electronic health records (EHR) are increasingly used in medical research, but the prevalence of temporal artefacts that may bias study findings is not widely understood or reported. Furthermore, methods for the efficient and transparent assessment of temporal data quality in EHR datasets are lacking.

Methods: 7959 time series representing different measures of data quality were generated from eight EHR data extracts covering activity between 1986 and 2019 at a large UK hospital group. These time series were visually inspected and annotated via a citizen-science crowd-sourcing platform, and consensus labels for the locations of all change points (i.e. places where the distribution of data values changed suddenly and unpredictably) were constructed using density-based clustering with noise. The crowd-sourced consensus labels were validated against labels produced by an experienced data scientist, and a diverse range of automated change point detection methods was assessed for accuracy against these consensus labels using a novel approximation to a binary classifier. Lastly, an R package was developed to facilitate assessment of temporal data quality in EHR datasets.

Results: Over 2000 volunteers participated in the citizen-science project, performing 341,800 visual inspections of the time series. A total of 4477 distinct change points were identified across the eight data extracts, covering almost every year of data and virtually all data fields. Compared to expert labels, the crowd-sourced consensus labels identified the locations of individual change points with high sensitivity of 80.4% (95% CI 77.1, 83.3), specificity of 99.8% (99.7, 99.8), positive predictive value (PPV) of 84.5% (81.4, 87.2), and negative predictive value (NPV) of 99.7% (99.6, 99.7). Automated change point detection methods failed to detect the crowd-sourced change points accurately, achieving at best sensitivity of 36.9% (35.2, 38.8), specificity of 100% (100, 100), PPV of 51.6% (49.4, 53.8), and NPV of 99.9% (99.9, 99.9).

Conclusions: This large study of real-world EHR data found that temporal artefacts occur with very high frequency and could affect findings from analyses using these data. Crowd-sourced labels of change points compared favourably to expert labels, but currently available automated methods performed poorly at identifying such artefacts when compared to human visual inspection. To improve the reproducibility and transparency of studies using EHRs, thorough visual assessment of temporal data quality should be conducted and reported; this can be assisted by tools such as the daiquiri R package developed as part of this thesis.
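The consensus-labelling step described in the Methods can be illustrated with a minimal one-dimensional analogue of density-based clustering with noise (the DBSCAN idea applied to annotated positions along a time axis). The function name and the `eps`/`min_pts` values below are assumptions for illustration only, not the parameters used in the thesis:

```python
def consensus_changepoints(annotations, eps=3, min_pts=5):
    """Derive consensus change point locations from many volunteers'
    annotated positions (e.g. week indices along a time series).

    A position is a 'core' point if at least min_pts annotations fall
    within eps of it; core points separated by gaps <= eps are chained
    into one cluster, and sparse annotations are treated as noise and
    discarded. One consensus location (the cluster median) is returned
    per cluster.
    """
    pts = sorted(annotations)
    # DBSCAN core-point condition in one dimension: enough neighbours
    # (including the point itself) within distance eps.
    dense = [p for p in pts if sum(abs(q - p) <= eps for q in pts) >= min_pts]
    # Chain dense points into clusters; a gap larger than eps starts a
    # new cluster.
    clusters = []
    for p in dense:
        if clusters and p - clusters[-1][-1] <= eps:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    # Consensus location: median annotation of each cluster.
    return [c[len(c) // 2] for c in clusters]
```

For example, if five annotators mark a change near week 10, five more mark one near week 50, and one stray annotation lands at week 80, the stray is rejected as noise and two consensus change points are returned.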
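For context on the automated detectors the Results compare against, a minimal example of one classic approach is a single binary-segmentation step that scans for the split maximising the mean shift between the two sides. This sketch is purely illustrative: the function and its threshold are assumptions, and the thesis evaluated a diverse range of far more sophisticated methods.

```python
import statistics

def detect_changepoint(series, threshold=2.0):
    """Return the index of the most likely mean-shift change point in a
    numeric series, or None if the best split's mean difference (scaled
    by the overall standard deviation) falls below threshold.

    This is one binary-segmentation step: try every split position and
    keep the one with the largest standardised difference in means.
    """
    n = len(series)
    sd = statistics.pstdev(series) or 1.0  # avoid division by zero
    best_idx, best_stat = None, 0.0
    for i in range(2, n - 2):
        left, right = series[:i], series[i:]
        stat = abs(statistics.mean(left) - statistics.mean(right)) / sd
        if stat > best_stat:
            best_idx, best_stat = i, stat
    return best_idx if best_stat >= threshold else None
```

A clean step change is found easily; the difficulty reported in the Results arises because real EHR artefacts are often subtle, overlapping, or confined to individual fields, where simple detectors like this underperform human visual inspection.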