2,010 research outputs found

    Impact of Terminology Mapping on Population Health Cohorts IMPaCt

    Get PDF
    Background and Objectives: The population health care delivery model uses phenotype algorithms in the electronic health record (EHR) system to identify patient cohorts targeted for clinical interventions such as laboratory tests, and procedures. The standard terminology used to identify disease cohorts may contribute to significant variation in error rates for patient inclusion or exclusion. The United States requires EHR systems to support two diagnosis terminologies, the International Classification of Disease (ICD) and the Systematized Nomenclature of Medicine (SNOMED). Terminology mapping enables the retrieval of diagnosis data using either terminology. There are no standards of practice by which to evaluate and report the operational characteristics of ICD and SNOMED value sets used to select patient groups for population health interventions. Establishing a best practice for terminology selection is a step forward in ensuring that the right patients receive the right intervention at the right time. The research question is, “How does the diagnosis retrieval terminology (ICD vs SNOMED) and terminology map maintenance impact population health cohorts?” Aim 1 and 2 explore this question, and Aim 3 informs practice and policy for population health programs. Methods Aim 1: Quantify impact of terminology choice (ICD vs SNOMED) ICD and SNOMED phenotype algorithms for diabetes, chronic kidney disease (CKD), and heart failure were developed using matched sets of codes from the Value Set Authority Center. The performance of the diagnosis-only phenotypes was compared to published reference standard that included diagnosis codes, laboratory results, procedures, and medications. Aim 2: Measure terminology maintenance impact on SNOMED cohorts For each disease state, the performance of a single SNOMED algorithm before and after terminology updates was evaluated in comparison to a reference standard to identify and quantify cohort changes introduced by terminology maintenance. Aim 3: Recommend methods for improving population health interventions The socio-technical model for studying health information technology was used to inform best practice for the use of population health interventions. Results Aim 1: ICD-10 value sets had better sensitivity than SNOMED for diabetes (.829, .662) and CKD (.242, .225) (N=201,713, p Aim 2: Following terminology maintenance the SNOMED algorithm for diabetes increased in sensitivity from (.662 to .683 (p Aim 3: Based on observed social and technical challenges to population health programs, including and in addition to the development and measurement of phenotypes, a practical method was proposed for population health intervention development and reporting

    Optimized identification of advanced chronic kidney disease and absence of kidney disease by combining different electronic health data resources and by applying machine learning strategies

    Get PDF
    Automated identification of advanced chronic kidney disease (CKD ≥ III) and of no known kidney disease (NKD) can support both clinicians and researchers. We hypothesized that identification of CKD and NKD can be improved, by combining information from different electronic health record (EHR) resources, comprising laboratory values, discharge summaries and ICD-10 billing codes, compared to using each component alone. We included EHRs from 785 elderly multimorbid patients, hospitalized between 2010 and 2015, that were divided into a training and a test (n = 156) dataset. We used both the area under the receiver operating characteristic (AUROC) and under the precision-recall curve (AUCPR) with a 95% confidence interval for evaluation of different classification models. In the test dataset, the combination of EHR components as a simple classifier identified CKD ≥ III (AUROC 0.96[0.93–0.98]) and NKD (AUROC 0.94[0.91–0.97]) better than laboratory values (AUROC CKD 0.85[0.79–0.90], NKD 0.91[0.87–0.94]), discharge summaries (AUROC CKD 0.87[0.82–0.92], NKD 0.84[0.79–0.89]) or ICD-10 billing codes (AUROC CKD 0.85[0.80–0.91], NKD 0.77[0.72–0.83]) alone. Logistic regression and machine learning models improved recognition of CKD ≥ III compared to the simple classifier if only laboratory values were used (AUROC 0.96[0.92–0.99] vs. 0.86[0.81–0.91], p < 0.05) and improved recognition of NKD if information from previous hospital stays was used (AUROC 0.99[0.98–1.00] vs. 0.95[0.92–0.97]], p < 0.05). Depending on the availability of data, correct automated identification of CKD ≥ III and NKD from EHRs can be improved by generating classification models based on the combination of different EHR components

    Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks

    Full text link
    Predicting the future health information of patients from the historical Electronic Health Records (EHR) is a core research task in the development of personalized healthcare. Patient EHR data consist of sequences of visits over time, where each visit contains multiple medical codes, including diagnosis, medication, and procedure codes. The most important challenges for this task are to model the temporality and high dimensionality of sequential EHR data and to interpret the prediction results. Existing work solves this problem by employing recurrent neural networks (RNNs) to model EHR data and utilizing simple attention mechanism to interpret the results. However, RNN-based approaches suffer from the problem that the performance of RNNs drops when the length of sequences is large, and the relationships between subsequent visits are ignored by current RNN-based approaches. To address these issues, we propose {\sf Dipole}, an end-to-end, simple and robust model for predicting patients' future health information. Dipole employs bidirectional recurrent neural networks to remember all the information of both the past visits and the future visits, and it introduces three attention mechanisms to measure the relationships of different visits for the prediction. With the attention mechanisms, Dipole can interpret the prediction results effectively. Dipole also allows us to interpret the learned medical code representations which are confirmed positively by medical experts. Experimental results on two real world EHR datasets show that the proposed Dipole can significantly improve the prediction accuracy compared with the state-of-the-art diagnosis prediction approaches and provide clinically meaningful interpretation

    Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals

    Get PDF
    BACKGROUND: Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions. METHODS: We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422 respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264 with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter). FINDINGS: After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (Sensitivity: 0.81-0.98, F1 score:0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes. The 5-year risk of mortality and admissions in the overall incident CKD population were highest in cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). MEDICATIONS: Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with highest medication burden in cardiometabolic and metabolic subtypes, and higher burden in prevalent than incident CKD. INTERPRETATION: In the largest CKD study using ML, to-date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to study of aetiology, therapeutics and risk prediction. FUNDING: AstraZeneca UK Ltd, Health Data Research UK

    Development of algorithms for determining heart failure with reduced and preserved ejection fraction using nationwide electronic healthcare records in the UK

    Get PDF
    Background: Determining heart failure (HF) phenotypes in routine electronic health records (EHR) is challenging. We aimed to develop and validate EHR algorithms for identification of specific HF phenotypes, using Read codes in combination with selected patient characteristics. Methods: We used The Healthcare Improvement Network (THIN). The study population included a random sample of individuals with HF diagnostic codes (HF with reduced ejection fraction (HFrEF), HF with preserved ejection fraction (HFpEF) and non-specific HF) selected from all participants registered in the THIN database between 1 January 2015 and 30 September 2017. Confirmed diagnoses were determined in a randomly selected subgroup of 500 patients via GP questionnaires including a review of all available cardiovascular investigations. Confirmed diagnoses of HFrEF and HFpEF were based on four criteria. Based on these data, we calculated a positive predictive value (PPV) of predefined algorithms which consisted of a combination of Read codes and additional information such as echocardiogram results and HF medication records. Results: The final cohort from which we drew the 500 patient random sample consisted of 10 275 patients. Response rate to the questionnaire was 77.2%. A small proportion (18%) of the overall HF patient population were coded with specific HF phenotype Read codes. For HFrEF, algorithms achieving over 80% PPV included definite, possible or non-specific HF HFrEF codes when combined with at least two of the drugs used to treat HFrEF. Only in non-specific HF coding did the use of three drugs (rather than two) contribute to an improvement of the PPV for HFrEF. HFpEF was only accurately defined with specific codes. In the absence of specific coding for HFpEF, the PPV was consistently below 50%. Conclusions: Prescription for HF medication can reliably be used to find HFrEF patients in the UK, even in the absence of a specific Read code for HFrEF. Algorithms using non-specific coding could not reliably find HFpEF patients
    • …
    corecore