2,010 research outputs found
Impact of Terminology Mapping on Population Health Cohorts IMPaCt
Background and Objectives: The population health care delivery model uses phenotype algorithms in the electronic health record (EHR) system to identify patient cohorts targeted for clinical interventions such as laboratory tests and procedures. The standard terminology used to identify disease cohorts may contribute to significant variation in error rates for patient inclusion or exclusion. The United States requires EHR systems to support two diagnosis terminologies, the International Classification of Diseases (ICD) and the Systematized Nomenclature of Medicine (SNOMED). Terminology mapping enables the retrieval of diagnosis data using either terminology. There are no standards of practice by which to evaluate and report the operational characteristics of ICD and SNOMED value sets used to select patient groups for population health interventions. Establishing a best practice for terminology selection is a step forward in ensuring that the right patients receive the right intervention at the right time. The research question is, “How do the diagnosis retrieval terminology (ICD vs SNOMED) and terminology map maintenance impact population health cohorts?” Aims 1 and 2 explore this question, and Aim 3 informs practice and policy for population health programs.
Methods
Aim 1: Quantify impact of terminology choice (ICD vs SNOMED)
ICD and SNOMED phenotype algorithms for diabetes, chronic kidney disease (CKD), and heart failure were developed using matched sets of codes from the Value Set Authority Center. The performance of the diagnosis-only phenotypes was compared to a published reference standard that included diagnosis codes, laboratory results, procedures, and medications.
Aim 2: Measure terminology maintenance impact on SNOMED cohorts
For each disease state, the performance of a single SNOMED algorithm before and after terminology updates was evaluated in comparison to a reference standard to identify and quantify cohort changes introduced by terminology maintenance.
Aim 3: Recommend methods for improving population health interventions
The socio-technical model for studying health information technology was used to inform best practice for the use of population health interventions.
Results
Aim 1: ICD-10 value sets had better sensitivity than SNOMED for diabetes (.829, .662) and CKD (.242, .225) (N=201,713, p
Aim 2: Following terminology maintenance, the SNOMED algorithm for diabetes increased in sensitivity from .662 to .683 (p
Aim 3: Based on observed social and technical challenges to population health programs, including and in addition to the development and measurement of phenotypes, a practical method was proposed for population health intervention development and reporting
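The sensitivity figures reported under Aim 1 compare a diagnosis-only cohort with a reference-standard cohort. As a minimal sketch, assuming each cohort is available as a set of patient identifiers (the identifiers and the function name `sensitivity` are hypothetical and not taken from the study), the calculation could look like this:

```python
def sensitivity(phenotype_cohort: set, reference_cohort: set) -> float:
    """Fraction of reference-standard patients that the phenotype also selects."""
    if not reference_cohort:
        raise ValueError("reference cohort is empty")
    true_positives = len(phenotype_cohort & reference_cohort)
    return true_positives / len(reference_cohort)


# Hypothetical patient identifiers, for illustration only.
reference = {"p1", "p2", "p3", "p4", "p5"}
icd_cohort = {"p1", "p2", "p3", "p4", "p9"}   # finds 4 of the 5 reference patients
snomed_cohort = {"p1", "p2", "p3"}            # finds 3 of the 5 reference patients

icd_sens = sensitivity(icd_cohort, reference)
snomed_sens = sensitivity(snomed_cohort, reference)
```

Running the same function on cohorts retrieved before and after a terminology update would give the kind of before/after comparison described under Aim 2.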
Optimized identification of advanced chronic kidney disease and absence of kidney disease by combining different electronic health data resources and by applying machine learning strategies
Automated identification of advanced chronic kidney disease (CKD ≥ III) and of no known kidney disease (NKD) can support both clinicians and researchers. We hypothesized that identification of CKD and NKD can be improved by combining information from different electronic health record (EHR) resources, comprising laboratory values, discharge summaries and ICD-10 billing codes, compared to using each component alone. We included EHRs from 785 elderly multimorbid patients, hospitalized between 2010 and 2015, that were divided into a training and a test (n = 156) dataset. We used both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUCPR) with a 95% confidence interval for evaluation of different classification models. In the test dataset, the combination of EHR components as a simple classifier identified CKD ≥ III (AUROC 0.96[0.93–0.98]) and NKD (AUROC 0.94[0.91–0.97]) better than laboratory values (AUROC CKD 0.85[0.79–0.90], NKD 0.91[0.87–0.94]), discharge summaries (AUROC CKD 0.87[0.82–0.92], NKD 0.84[0.79–0.89]) or ICD-10 billing codes (AUROC CKD 0.85[0.80–0.91], NKD 0.77[0.72–0.83]) alone. Logistic regression and machine learning models improved recognition of CKD ≥ III compared to the simple classifier if only laboratory values were used (AUROC 0.96[0.92–0.99] vs. 0.86[0.81–0.91], p < 0.05) and improved recognition of NKD if information from previous hospital stays was used (AUROC 0.99[0.98–1.00] vs. 0.95[0.92–0.97], p < 0.05). Depending on the availability of data, correct automated identification of CKD ≥ III and NKD from EHRs can be improved by generating classification models based on the combination of different EHR components
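To illustrate how AUROC can compare a combined classifier with a single EHR component, here is a minimal sketch in plain Python. The abstract does not say how the study's "simple classifier" combines the components, so counting how many sources flag CKD is an assumption, and the patient records and function names are made up for illustration:

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def simple_combined_score(lab_flag, summary_flag, icd_flag):
    """Assumed combination rule: number of EHR sources that flag CKD."""
    return int(lab_flag) + int(summary_flag) + int(icd_flag)


# Made-up records: (lab, discharge summary, ICD-10 code) flags and true CKD status.
patients = [
    ((1, 1, 1), 1),
    ((1, 0, 1), 1),
    ((0, 1, 0), 1),
    ((0, 0, 1), 0),
    ((0, 0, 0), 0),
    ((1, 0, 0), 0),
]
labels = [status for _, status in patients]
combined = [simple_combined_score(*flags) for flags, _ in patients]
lab_only = [flags[0] for flags, _ in patients]

auc_combined = auroc(labels, combined)   # all three sources together
auc_lab_only = auroc(labels, lab_only)   # laboratory flag alone
```

The pairwise definition is quadratic in the number of patients, which is acceptable for a test set of a few hundred records; a rank-based formulation would scale better for larger cohorts.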
Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks
Predicting the future health information of patients from the historical
Electronic Health Records (EHR) is a core research task in the development of
personalized healthcare. Patient EHR data consist of sequences of visits over
time, where each visit contains multiple medical codes, including diagnosis,
medication, and procedure codes. The most important challenges for this task
are to model the temporality and high dimensionality of sequential EHR data and
to interpret the prediction results. Existing work solves this problem by
employing recurrent neural networks (RNNs) to model EHR data and utilizing
a simple attention mechanism to interpret the results. However, RNN-based
approaches suffer from the problem that the performance of RNNs drops when the
length of sequences is large, and the relationships between subsequent visits
are ignored by current RNN-based approaches. To address these issues, we
propose Dipole, an end-to-end, simple and robust model for predicting
patients' future health information. Dipole employs bidirectional recurrent
neural networks to remember all the information of both the past visits and the
future visits, and it introduces three attention mechanisms to measure the
relationships of different visits for the prediction. With the attention
mechanisms, Dipole can interpret the prediction results effectively. Dipole
also allows us to interpret the learned medical code representations which are
confirmed positively by medical experts. Experimental results on two real-world
EHR datasets show that the proposed Dipole can significantly improve the
prediction accuracy compared with the state-of-the-art diagnosis prediction
approaches and provide clinically meaningful interpretation
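The idea of weighting earlier visits by their relevance to the current one can be sketched with generic dot-product attention. The abstract does not specify Dipole's three attention mechanisms, so the scoring function below is an assumption, the hidden states are hand-written stand-ins for RNN outputs, and this is not the authors' implementation:

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(current_state, past_states):
    """Dot-product attention over past visit states.

    Returns the attention weights and the weighted sum (context vector).
    """
    scores = [
        sum(c * p for c, p in zip(current_state, state))
        for state in past_states
    ]
    weights = softmax(scores)
    dim = len(current_state)
    context = [
        sum(w * state[d] for w, state in zip(weights, past_states))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical 2-dimensional hidden states for three earlier visits.
past_visits = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
current_visit = [2.0, 0.0]

weights, context = attend(current_visit, past_visits)
```

Because the weights sum to one, they can be read as the relative contribution of each earlier visit to the prediction, which is the sense in which attention supports interpretation.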
Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals
BACKGROUND: Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions. METHODS: We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422 respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264 with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter). FINDINGS: After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (Sensitivity: 0.81-0.98, F1 score:0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes. 
The 5-year risks of mortality and admissions in the overall incident CKD population were highest in the cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). MEDICATIONS: Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with highest medication burden in cardiometabolic and metabolic subtypes, and higher burden in prevalent than incident CKD. INTERPRETATION: In the largest CKD study using ML to date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to study of aetiology, therapeutics and risk prediction. FUNDING: AstraZeneca UK Ltd, Health Data Research UK
Development of algorithms for determining heart failure with reduced and preserved ejection fraction using nationwide electronic healthcare records in the UK
Background: Determining heart failure (HF) phenotypes in routine electronic health records (EHR) is challenging. We aimed to develop and validate EHR algorithms for identification of specific HF phenotypes, using Read codes in combination with selected patient characteristics. Methods: We used The Health Improvement Network (THIN). The study population included a random sample of individuals with HF diagnostic codes (HF with reduced ejection fraction (HFrEF), HF with preserved ejection fraction (HFpEF) and non-specific HF) selected from all participants registered in the THIN database between 1 January 2015 and 30 September 2017. Confirmed diagnoses were determined in a randomly selected subgroup of 500 patients via GP questionnaires including a review of all available cardiovascular investigations. Confirmed diagnoses of HFrEF and HFpEF were based on four criteria. Based on these data, we calculated the positive predictive value (PPV) of predefined algorithms which consisted of a combination of Read codes and additional information such as echocardiogram results and HF medication records. Results: The final cohort from which we drew the 500 patient random sample consisted of 10 275 patients. Response rate to the questionnaire was 77.2%. A small proportion (18%) of the overall HF patient population were coded with specific HF phenotype Read codes. For HFrEF, algorithms achieving over 80% PPV included definite, possible or non-specific HF HFrEF codes when combined with at least two of the drugs used to treat HFrEF. Only in non-specific HF coding did the use of three drugs (rather than two) contribute to an improvement of the PPV for HFrEF. HFpEF was only accurately defined with specific codes. In the absence of specific coding for HFpEF, the PPV was consistently below 50%. Conclusions: Prescription for HF medication can reliably be used to find HFrEF patients in the UK, even in the absence of a specific Read code for HFrEF. 
Algorithms using non-specific coding could not reliably find HFpEF patients
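A code-plus-medication algorithm of the kind described, and its PPV against confirmed diagnoses, can be sketched as follows. The record fields, code-type labels, and sample patients are hypothetical and chosen only to show the calculation; they are not the study's data or its algorithm definitions:

```python
from dataclasses import dataclass

# Assumed labels for the kind of HF code on record.
HF_CODE_TYPES = {"specific", "possible", "non-specific"}


@dataclass
class Patient:
    code_type: str          # which kind of HF code the record carries
    hfref_drug_count: int   # number of distinct HFrEF drugs prescribed
    confirmed_hfref: bool   # reference diagnosis, e.g. from a questionnaire


def flags_hfref(patient: Patient, min_drugs: int = 2) -> bool:
    """Rule: any HF code plus at least `min_drugs` HFrEF drugs."""
    has_code = patient.code_type in HF_CODE_TYPES
    return has_code and patient.hfref_drug_count >= min_drugs


def ppv(patients, rule) -> float:
    """Share of rule-flagged patients whose HFrEF diagnosis was confirmed."""
    flagged = [p for p in patients if rule(p)]
    if not flagged:
        raise ValueError("rule flagged no patients")
    return sum(p.confirmed_hfref for p in flagged) / len(flagged)


# Made-up sample of five patients.
sample = [
    Patient("specific", 3, True),
    Patient("non-specific", 2, True),
    Patient("non-specific", 2, False),
    Patient("possible", 3, True),
    Patient("non-specific", 1, False),
]

ppv_two_drugs = ppv(sample, flags_hfref)
ppv_three_drugs = ppv(sample, lambda p: flags_hfref(p, min_drugs=3))
```

Raising the drug threshold trades coverage for precision: fewer patients are flagged, and in this toy sample a larger share of them are confirmed.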