195 research outputs found
Automated Surveillance of Surgical Site Infections in a VA Hospital
Background: Surgical site infections (SSIs) account for approximately 17% of hospital-acquired infections. These infections result in an increase in emergency room visits, outpatient visits, radiology services, home health aide services, and readmissions, adding an estimated $1 billion-$10 billion in indirect and direct medical costs each year. The CDC and the Surgical Infection Society recommend routine surveillance as a method for decreasing the rates of these infections. By monitoring SSI rates, areas of improvement can be identified and interventions can be made to reduce the incidence of SSIs in the hospital. Reductions of up to 35% have been documented with the implementation of SSI surveillance programs. Current methods of surveillance in the VA are only partially automated and are labor intensive. Automated methods of surveillance using electronic medical records have been proposed to decrease the resources involved in SSI monitoring. The VA is well-suited for this with its extensive medical records database and relatively closed system of patients. Purpose: To construct an automated SSI surveillance system using electronic patient medical record data and validate this system by comparing its performance to the current surveillance method used at the Durham VA hospital. Methods: In this project, we modified the methods previously described by Richard Platt to create an automated SSI surveillance system at the VA hospital in Durham, North Carolina. We used ICD-9 codes, vital signs, microbiology data, consult orders, and pharmacy records that are sensitive and specific for SSIs to identify patients with potential infections. Logistic regression was used to create predictive models for SSIs of different severity. This system was validated by comparing its performance to that of the current manual record review performed by the infection control department in the hospital on patients who underwent surgery at the Durham VA hospital from May 1st, 2002 to April 30th, 2004.
All surgical site infections met the criteria set forth by the National Nosocomial Infections Surveillance (NNIS) report. The system was evaluated using the framework set forth by the CDC Working Group for public health surveillance systems. Results: SSIs occurred in 195 of 7340 surgeries conducted in the study period (2.7% attack rate). Of these, 91 were superficial SSIs, 45 were deep SSIs, and 59 were organ/space SSIs. Logistic regression models using data found to be strongly correlated with SSI diagnoses had a sensitivity and specificity of 90.9% and 61.2% for all types of SSIs, 89.2% and 74.2% for severe SSIs (deep and organ/space), and 89.5% and 74.0% for organ/space SSIs, respectively. Conclusions: This study demonstrates that an automated SSI surveillance system with reasonable sensitivity and specificity can be created by using data from electronic medical records. Such a system can drastically reduce the amount of labor necessary for SSI monitoring and increase the speed with which these complications are detected. The information technology used at the Durham VA hospital is similar to that used in other VA hospitals, so this system can be exported to other hospitals throughout the country.
Master of Public Health
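The validation above rests on computing sensitivity and specificity of the automated flags against manual chart review as the gold standard. A minimal sketch of that computation, using invented toy data rather than study data:

```python
# Hedged sketch: evaluating an automated SSI flagging rule against manual
# chart review, as in the validation described above. Data are illustrative.

def sensitivity_specificity(flagged, confirmed):
    """Compare automated flags with manually confirmed SSIs.

    flagged, confirmed: parallel lists of booleans, one entry per surgery.
    Returns (sensitivity, specificity).
    """
    tp = sum(f and c for f, c in zip(flagged, confirmed))
    fn = sum((not f) and c for f, c in zip(flagged, confirmed))
    tn = sum((not f) and (not c) for f, c in zip(flagged, confirmed))
    fp = sum(f and (not c) for f, c in zip(flagged, confirmed))
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative toy data: 10 surgeries, 4 manually confirmed SSIs.
flagged   = [True, True, True, False, True, False, False, True, False, False]
confirmed = [True, True, True, True, False, False, False, False, False, False]
sens, spec = sensitivity_specificity(flagged, confirmed)
```

In the study, the logistic regression threshold trades these two quantities off; the reported 90.9%/61.2% operating point reflects one such choice.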
Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes
The BioNLP Workshop 2023 initiated the launch of a shared task on Problem
List Summarization (ProbSum) in January 2023. The aim of this shared task is to
attract future research efforts in building NLP models for real-world
diagnostic decision support applications, where a system generating relevant
and accurate diagnoses will augment healthcare providers' decision-making
process and improve the quality of care for patients. The goal for participants
is to develop models that generate a list of diagnoses and problems using
input from the daily care notes collected from the hospitalization of
critically ill patients. Eight teams submitted their final systems to the
shared task leaderboard. In this paper, we describe the tasks, datasets,
evaluation metrics, and baseline systems. Additionally, the techniques and
results of the evaluation of the different approaches tried by the
participating teams are summarized.
Comment: To appear in the Proceedings of the 5th BioNLP Workshop at ACL
Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning
Generative artificial intelligence (AI) is a promising direction for
augmenting clinical diagnostic decision support and reducing diagnostic errors,
a leading contributor to medical errors. To further the development of clinical
AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a
comprehensive generative AI framework, comprised of six tasks representing key
components in clinical reasoning. We present a comparative analysis of
in-domain versus out-of-domain language models as well as multi-task versus
single task training with a focus on the problem summarization task in DR.BENCH
(Gao et al., 2023). We demonstrate that a multi-task, clinically trained
language model outperforms its general domain counterpart by a large margin,
establishing a new state-of-the-art performance, with a ROUGE-L score of 28.55.
This research underscores the value of domain-specific training for optimizing
clinical diagnostic reasoning tasks.
Comment: Accepted to the Proceedings of the 5th Clinical NLP Workshop at ACL
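The state-of-the-art result above is reported as ROUGE-L, an F-score built on the longest common subsequence (LCS) between generated and reference text. A minimal token-level sketch (the official metric adds stemming and other preprocessing):

```python
# Hedged sketch of ROUGE-L, the metric reported above: an LCS-based
# F-score between a generated problem list and a reference list.

def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """Token-level ROUGE-L F1 between two strings (no stemming)."""
    c, r = candidate.split(), reference.split()
    l = lcs_len(c, r)
    if l == 0:
        return 0.0
    prec, rec = l / len(c), l / len(r)
    return 2 * prec * rec / (prec + rec)

# Illustrative problem lists: word order matters to the LCS.
score = rouge_l("acute kidney injury and sepsis", "sepsis and acute kidney injury")
```

Because LCS respects token order, a generated list with the right problems in a different order scores below 1.0, which is worth keeping in mind when interpreting the 28.55 figure.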
Progress Note Understanding -- Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 Shared Task
Daily progress notes are a common note type in the electronic health record (EHR)
in which healthcare providers document the patient's daily progress and treatment
plans. The EHR is designed to document all the care provided to patients, but
it also enables note bloat with extraneous information that distracts from the
diagnoses and treatment plans. Applications of natural language processing
(NLP) to the EHR are a growing field, with the majority of methods focused on
information extraction. Few tasks use NLP methods for downstream diagnostic decision
support. We introduced the 2022 National NLP Clinical Challenge (N2C2) Track 3:
Progress Note Understanding - Assessment and Plan Reasoning as one step towards
a new suite of tasks. The Assessment and Plan Reasoning task focuses on the
most critical components of progress notes, the Assessment and Plan subsections,
where health problems and diagnoses are documented. The goal of the task was to
develop and evaluate NLP systems that automatically predict causal relations
between the overall status of the patient contained in the Assessment section
and its relation to each component of the Plan section which contains the
diagnoses and treatment plans. A further aim was to identify and prioritize
diagnoses as a first step in diagnostic decision support, finding the most
relevant information in long documents such as daily progress notes. We
present the results of the 2022 N2C2 Track 3 and provide a description of the data,
evaluation, participation, and system performance.
Comment: To appear in Journal of Biomedical Informatics
The Laboratory-Based Intermountain Validated Exacerbation (LIVE) Score Identifies Chronic Obstructive Pulmonary Disease Patients at High Mortality Risk.
Background: Identifying COPD patients at high risk for mortality or healthcare utilization remains a challenge. A robust system for identifying high-risk COPD patients using Electronic Health Record (EHR) data would enable targeted interventions aimed at ensuring guideline compliance and multimorbidity management. The purpose of this study was to empirically derive, validate, and characterize subgroups of COPD patients based on routinely collected clinical data widely available within the EHR. Methods: Cluster analysis was used in 5,006 patients with COPD at Intermountain to identify clusters based on a large collection of clinical variables. Recursive Partitioning (RP) was then used to determine a preferred tree that assigned patients to clusters based on a parsimonious variable subset. The mortality, COPD exacerbations, and comorbidity profile of the identified groups were examined. The findings were validated in an independent Intermountain cohort and in external cohorts from the United States Veterans Affairs (VA) and University of Chicago Medicine systems. Measurements and Main Results: The RP algorithm identified five LIVE Scores based on laboratory values: albumin, creatinine, chloride, potassium, and hemoglobin. The groups were characterized by increasing risk of mortality. The lowest-risk group, LIVE Score 5, had 8% 4-year mortality vs. 56% in the highest-risk group, LIVE Score 1 (p < 0.001). These findings were validated in the VA cohort (n = 83,134), an expanded Intermountain cohort (n = 48,871), and in the University of Chicago system (n = 3,236). Higher-mortality groups also had higher COPD exacerbation rates and comorbidity rates. Conclusions: In large clinical datasets across different organizations, the LIVE Score utilizes existing laboratory data for COPD patients and may be used to stratify risk for mortality and COPD exacerbations.
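The recursive-partitioning tree described above boils assignment down to a handful of lab-value comparisons. A sketch of what such an assignment function looks like; the split variables follow the abstract (albumin, creatinine, hemoglobin), but the thresholds and tree shape here are invented for illustration and are not the published LIVE Score cutoffs:

```python
# Hedged sketch: how a recursive-partitioning tree assigns a patient to a
# risk group from routine labs, in the spirit of the LIVE Score. Thresholds
# below are HYPOTHETICAL, chosen only to illustrate the tree structure.

def toy_risk_group(albumin, creatinine, hemoglobin):
    """Return an illustrative risk group (1 = highest risk, 3 = lowest).

    Units assumed: albumin g/dL, creatinine mg/dL, hemoglobin g/dL.
    """
    if albumin < 3.0:                     # low albumin: highest-risk branch
        return 1
    if creatinine > 1.5 or hemoglobin < 10.0:
        return 2                          # intermediate risk
    return 3                              # lowest risk

group = toy_risk_group(albumin=4.1, creatinine=0.9, hemoglobin=14.0)
```

The appeal of such a tree in practice is that it needs only labs already drawn in routine care, so it can run directly against EHR data with no extra data collection.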
DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing
The meaningful use of electronic health records (EHR) continues to progress
in the digital era with clinical decision support systems augmented by
artificial intelligence. A priority in improving provider experience is to
overcome information overload and reduce the cognitive burden so fewer medical
errors and cognitive biases are introduced during patient care. One major type
of medical error is diagnostic error due to systematic or predictable errors in
judgment that rely on heuristics. The potential for clinical natural language
processing (cNLP) to model diagnostic reasoning in humans with forward
reasoning from data to diagnosis and potentially reduce the cognitive burden
and medical error has not been investigated. Existing tasks to advance the
science in cNLP have largely focused on information extraction and named entity
recognition through classification tasks. We introduce a novel suite of tasks,
the Diagnostic Reasoning Benchmark (DR.BENCH), as a new benchmark for
developing and evaluating cNLP models with clinical diagnostic reasoning
ability. The suite includes six tasks from ten publicly available datasets
addressing clinical text understanding, medical knowledge reasoning, and
diagnosis generation. DR.BENCH is the first clinical suite of tasks designed to
be a natural language generation framework to evaluate pre-trained language
models. Experiments with state-of-the-art pre-trained generative language
models using large general domain models and models that were continually
trained on a medical corpus demonstrate opportunities for improvement when
evaluated in DR.BENCH. We share DR.BENCH as a publicly available GitLab
repository with a systematic approach to load and evaluate models for the cNLP
community.
Comment: Under review
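Casting every task as natural language generation, as DR.BENCH does, implies a single generate-then-score evaluation loop shared across tasks. A minimal sketch of that pattern; the model and scorer here are toy stand-ins, not the actual DR.BENCH API:

```python
# Hedged sketch: the generation-based evaluation loop a benchmark like
# DR.BENCH implies, where every task is text-to-text. `model` is any
# callable mapping an input prompt to generated text (a stand-in here).

def evaluate(model, examples, score_fn):
    """Average a text-similarity score over (input, reference) pairs."""
    scores = [score_fn(model(inp), ref) for inp, ref in examples]
    return sum(scores) / len(scores)

# Toy stand-ins: an "echo" model that strips the task prefix, scored by
# exact match. A real run would plug in a pre-trained generative LM and a
# metric such as ROUGE-L or accuracy, depending on the task.
examples = [("summarize: fever and cough", "fever and cough"),
            ("summarize: chest pain", "chest pain")]
model = lambda inp: inp.split(": ", 1)[1]
accuracy = evaluate(model, examples, lambda hyp, ref: float(hyp == ref))
```

The value of the uniform interface is that in-domain and general-domain models can be swapped in without changing any per-task evaluation code.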
Comparison of machine learning clustering algorithms for detecting heterogeneity of treatment effect in acute respiratory distress syndrome: A secondary analysis of three randomised controlled trials
BACKGROUND: Heterogeneity in Acute Respiratory Distress Syndrome (ARDS), as a consequence of its non-specific definition, has led to a multitude of negative randomised controlled trials (RCTs). Investigators have sought to identify heterogeneity of treatment effect (HTE) in RCTs using clustering algorithms. We evaluated the proficiency of several commonly-used machine-learning algorithms to identify clusters where HTE may be detected.
METHODS: Five unsupervised algorithms (latent class analysis (LCA), K-means, partitioning around medoids, hierarchical clustering, and spectral clustering) and four supervised algorithms (model-based recursive partitioning, Causal Forest (CF), and X-learner with Random Forest (XL-RF) and with Bayesian Additive Regression Trees) were individually applied to three prior ARDS RCTs. Clinical data and research protein biomarkers were used as partitioning variables, with the latter excluded for secondary analyses. For each clustering schema, HTE was evaluated based on the interaction term of treatment group and cluster, with day-90 mortality as the dependent variable.
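The HTE check described above amounts to asking whether the treatment effect on day-90 mortality differs across clusters. A minimal sketch of that comparison using per-cluster risk differences; the paper formalises this as a treatment-by-cluster interaction term in a regression, and the counts below are invented:

```python
# Hedged sketch of heterogeneity of treatment effect (HTE): within each
# cluster, compare day-90 mortality between arms; heterogeneity shows up
# as treatment effects that differ in size or sign across clusters.

def risk(deaths, n):
    """Crude mortality risk: deaths / patients."""
    return deaths / n

def cluster_treatment_effects(data):
    """data: {cluster: {"treated": (deaths, n), "control": (deaths, n)}}.

    Returns {cluster: risk difference (treated minus control)}; a negative
    value means treatment benefit in that cluster.
    """
    return {c: risk(*arms["treated"]) - risk(*arms["control"])
            for c, arms in data.items()}

# Illustrative counts: treatment helps cluster A but harms cluster B.
toy = {"A": {"treated": (10, 100), "control": (20, 100)},
       "B": {"treated": (30, 100), "control": (15, 100)}}
effects = cluster_treatment_effects(toy)
```

Opposite-signed effects like these are exactly the pattern the abstract flags as fragile: patients who switch clusters under a different algorithm or random seed can flip from an apparent-benefit to an apparent-harm group.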
FINDINGS: No single algorithm identified clusters with significant HTE in all three trials. LCA, XL-RF, and CF identified HTE most frequently (2/3 RCTs). Important partitioning variables in the unsupervised approaches were consistent across algorithms and RCTs. In supervised models, important partitioning variables varied between algorithms and across RCTs. In algorithms where clusters demonstrated HTE in the same trial, patients frequently interchanged clusters from treatment-benefit to treatment-harm clusters across algorithms. LCA aside, results from all other algorithms were subject to significant alteration in cluster composition and HTE with random seed change. Removing research biomarkers as partitioning variables greatly reduced the chances of detecting HTE across all algorithms.
INTERPRETATION: Machine-learning algorithms were inconsistent in their abilities to identify clusters with significant HTE. Protein biomarkers were essential in identifying clusters with HTE. Investigations using machine-learning approaches to identify clusters to seek HTE require cautious interpretation.
FUNDING: NIGMS R35 GM142992 (PS), NHLBI R35 HL140026 (CSC); NIGMS R01 GM123193, Department of Defense W81XWH-21-1-0009, NIA R21 AG068720, NIDA R01 DA051464 (MMC)
Early Warning Scores With and Without Artificial Intelligence
Importance: Early warning decision support tools to identify clinical deterioration in the hospital are widely used, but there is little information on their comparative performance. Objective: To compare 3 proprietary artificial intelligence (AI) early warning scores and 3 publicly available simple aggregated weighted scores. Design, Setting, and Participants: This retrospective cohort study was performed at 7 hospitals in the Yale New Haven Health System. All consecutive adult medical-surgical ward hospital encounters between March 9, 2019, and November 9, 2023, were included. Exposures: Simultaneous Epic Deterioration Index (EDI), Rothman Index (RI), eCARTv5 (eCART), Modified Early Warning Score (MEWS), National Early Warning Score (NEWS), and NEWS2 scores. Main Outcomes and Measures: Clinical deterioration, defined as a transfer from ward to intensive care unit or death within 24 hours of an observation. Results: Of the 362,926 patient encounters (median patient age, 64 [IQR, 47-77] years; 200,642 [55.3%] female), 16,693 (4.6%) experienced a clinical deterioration event. eCART had the highest area under the receiver operating characteristic curve at 0.895 (95% CI, 0.891-0.900), followed by NEWS2 at 0.831 (95% CI, 0.826-0.836), NEWS at 0.829 (95% CI, 0.824-0.835), RI at 0.828 (95% CI, 0.823-0.834), EDI at 0.808 (95% CI, 0.802-0.812), and MEWS at 0.757 (95% CI, 0.750-0.764). After matching scores at the moderate-risk sensitivity level for a NEWS score of 5, overall positive predictive values (PPVs) ranged from a low of 6.3% (95% CI, 6.1%-6.4%) for an EDI score of 41 to a high of 17.3% (95% CI, 16.9%-17.8%) for an eCART score of 94. Matching scores at the high-risk specificity of a NEWS score of 7 yielded overall PPVs ranging from a low of 14.5% (95% CI, 14.0%-15.2%) for an EDI score of 54 to a high of 23.3% (95% CI, 22.7%-24.2%) for an eCART score of 97. The moderate-risk thresholds provided a median of at least 20 hours of lead time for all the scores.
Median lead time at the high-risk threshold was 11 (IQR, 0-69) hours for eCART, 8 (IQR, 0-63) hours for NEWS, 6 (IQR, 0-62) hours for NEWS2, 5 (IQR, 0-56) hours for MEWS, 1 (IQR, 0-39) hour for EDI, and 0 (IQR, 0-42) hours for RI. Conclusions and Relevance: In this cohort study of inpatient encounters, eCART outperformed the other AI and non-AI scores, identifying more deteriorating patients with fewer false alarms and sufficient time to intervene. NEWS, a non-AI, publicly available early warning score, significantly outperformed EDI. Given the wide variation in accuracy, additional transparency and oversight of early warning tools may be warranted.
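The headline comparison above is the area under the receiver operating characteristic curve (AUROC), which has a simple probabilistic reading: the chance that a randomly chosen deteriorating encounter scores higher than a randomly chosen non-deteriorating one. A minimal sketch of that rank-based computation on toy data:

```python
# Hedged sketch: AUROC as a rank statistic, the metric used to compare the
# early warning scores above. Ties between a positive and a negative score
# count as half a "win".

def auroc(scores, labels):
    """AUROC for continuous scores and binary labels (1 = deterioration)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: higher scores mostly accompany deterioration (label 1).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0]
value = auroc(scores, labels)
```

AUROC is threshold-free, which is why the study separately matches scores at fixed sensitivity or specificity thresholds before comparing PPVs: two scores with the same AUROC can behave very differently at a given alerting cutoff.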
Hospital trajectories and early predictors of clinical outcomes differ between SARS-CoV-2 and influenza pneumonia
BACKGROUND: A comparison of pneumonias due to SARS-CoV-2 and influenza, in terms of clinical course and predictors of outcomes, might inform prognosis and resource management. We aimed to compare clinical course and outcome predictors in SARS-CoV-2 and influenza pneumonia using multi-state modelling and supervised machine learning on clinical data among hospitalised patients.
METHODS: This multicenter retrospective cohort study of patients hospitalised with SARS-CoV-2 (March-December 2020) or influenza (January 2015-March 2020) pneumonia had the composite of hospital mortality and hospice discharge as the primary outcome. Multi-state models compared differences in oxygenation/ventilatory utilisation between pneumonias longitudinally throughout hospitalisation. Differences in predictors of outcome were modelled using supervised machine learning classifiers.
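The multi-state modelling described above tracks how patients move between oxygenation/ventilation states day by day. A minimal sketch of the underlying idea, estimating empirical transition probabilities from observed trajectories; the states and sequences below are illustrative, not study data:

```python
# Hedged sketch of the multi-state idea: estimate daily transition
# probabilities between oxygenation states from observed hospital
# trajectories. States and trajectories here are invented for illustration.
from collections import Counter, defaultdict

def transition_probs(trajectories):
    """trajectories: list of state sequences (one per hospitalisation).

    Returns {state: {next_state: probability}}, estimated as the fraction
    of observed one-day transitions out of each state.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return {s: {t: n / sum(c.values()) for t, n in c.items()}
            for s, c in counts.items()}

toy = [["room air", "low-flow O2", "room air", "discharge"],
       ["room air", "low-flow O2", "ventilator", "death"]]
probs = transition_probs(toy)
```

Comparing such transition matrices between cohorts is what lets a multi-state analysis show, for example, SARS-CoV-2's more rapidly escalating early hypoxemia relative to influenza.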
FINDINGS: Among 2,529 hospitalisations with SARS-CoV-2 and 2,256 with influenza pneumonia, the primary outcome occurred in 21% and 9%, respectively. Multi-state models differentiated oxygen requirement progression between viruses, with SARS-CoV-2 manifesting rapidly-escalating early hypoxemia. Highly contributory classifier variables for the primary outcome differed substantially between viruses.
INTERPRETATION: SARS-CoV-2 and influenza pneumonia differ in presentation, hospital course, and outcome predictors. These pathogen-specific differential responses in viral pneumonias suggest distinct management approaches should be investigated.
FUNDING: This project was supported by NIH/NCATS UL1 TR002345, NIH/NCATS KL2 TR002346 (PGL), the Doris Duke Charitable Foundation grant 2015215 (PGL), NIH/NHLBI R35 HL140026 (CSC), and a Big Ideas Award from the BJC HealthCare and Washington University School of Medicine Healthcare Innovation Lab and NIH/NIGMS R35 GM142992 (PS)
- …