Leveraging text data for causal inference using electronic health records
Text is a ubiquitous component of medical data, containing valuable
information about patient characteristics and care that are often missing from
structured chart data. Despite this richness, it is rarely used in clinical
research, owing partly to its complexity. Using a large database of patient
records and treatment histories accompanied by extensive notes by attendant
physicians and nurses, we show how text data can be used to support causal
inference with electronic health data in all stages, from conception and design
to analysis and interpretation, with minimal additional effort. We focus on
studies using matching for causal inference. We augment a classic matching
analysis by incorporating text in three ways: by using text to supplement a
multiple imputation procedure, we improve the fidelity of imputed values to
handle missing data; by incorporating text in the matching stage, we strengthen
the plausibility of the matching procedure; and by conditioning on text, we can
estimate easily interpretable text-based heterogeneous treatment effects that
may be stronger than those found across categories of structured covariates.
Using these techniques, we hope to expand the scope of secondary analysis of
clinical data to domains where quantitative data is of poor quality or
nonexistent, but where text is available, such as in developing countries.
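The idea of incorporating text in the matching stage can be illustrated with a minimal sketch. It is not the authors' procedure: the bag-of-words representation, the cosine text distance, the greedy 1:1 matching rule, the `text_weight` parameter, and all patient records are assumptions made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(note):
    """Lowercased token counts for a free-text note."""
    return Counter(note.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distance(p, q, text_weight=1.0):
    """Euclidean distance on structured covariates plus weighted text distance."""
    structured = math.dist(p["x"], q["x"])
    text = cosine_distance(bag_of_words(p["note"]), bag_of_words(q["note"]))
    return structured + text_weight * text


def greedy_match(treated, controls, text_weight=1.0):
    """Greedy 1:1 nearest-neighbour matching without replacement."""
    available = list(controls)
    pairs = []
    for t in treated:
        if not available:
            break
        best = min(available, key=lambda c: distance(t, c, text_weight))
        available.remove(best)
        pairs.append((t, best))
    return pairs


def att(pairs):
    """Mean treated-minus-control outcome difference over matched pairs."""
    return sum(t["y"] - c["y"] for t, c in pairs) / len(pairs)


# Hypothetical records: x = scaled structured covariates, y = outcome.
treated = [
    {"id": "t1", "x": (0.5, 0.5), "note": "patient frail and confused", "y": 1.0},
    {"id": "t2", "x": (0.2, 0.1), "note": "ambulatory alert and oriented", "y": 0.0},
]
controls = [
    {"id": "c1", "x": (0.5, 0.5), "note": "ambulatory alert and oriented", "y": 0.0},
    {"id": "c2", "x": (0.5, 0.5), "note": "frail and confused on arrival", "y": 1.0},
    {"id": "c3", "x": (0.2, 0.1), "note": "alert and oriented", "y": 0.0},
]

pairs = greedy_match(treated, controls)
matched_ids = [(t["id"], c["id"]) for t, c in pairs]
print(matched_ids, att(pairs))
```

In this toy data, controls c1 and c2 are indistinguishable from t1 on the structured covariates, so only the note separates them. Setting `text_weight=0.0` reproduces a match on structured covariates alone and changes both the matched pair and the estimate.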
Customized Prediction of Short Length of Stay Following Elective Cardiac Surgery in Elderly Patients Using a Genetic Algorithm
Objective: To develop a customized short LOS (<6 days) prediction model for geriatric patients receiving cardiac surgery, using local data and a computational feature selection algorithm.
Design: Utilization of a machine learning algorithm in a prospectively collected STS database consisting of patients who received cardiac surgery between January 2002 and June 2011.
Setting: Urban tertiary-care center.
Participants: Geriatric patients aged 70 years or older at the time of cardiac surgery.
Interventions: None.
Measurements and Main Results: Predefined morbidity and mortality events were collected from the STS database. 23 clinically relevant predictors were investigated for short LOS prediction with a genetic algorithm (GenAlg) in 1426 patients. Due to the absence of an STS model for their particular surgery type, STS risk scores were unavailable for 771 patients. STS prediction achieved an AUC of 0.629 while the GenAlg achieved AUCs of 0.573 (in those with STS scores) and 0.691 (in those without STS scores). Among the patients with STS scores, the GenAlg features significantly associated with shorter LOS were absence of congestive heart failure (CHF) (OR = 0.59, p = 0.04), aortic valve procedure (OR = 1.54, p = 0.04), and shorter cross clamp time (OR = 0.99, p = 0.004). In those without STS prediction, short LOS was significantly correlated with younger age (OR = 0.93, p < 0.001), absence of CHF (OR = 0.53, p = 0.007), no preoperative use of beta blockers (OR = 0.66, p = 0.03), and shorter cross clamp time (OR = 0.99, p < 0.001).
Conclusion: While the GenAlg-based models did not outperform STS prediction for patients with STS risk scores, our local-data-driven approach reliably predicted short LOS for cardiac surgery types that do not allow STS risk calculation. We advocate that each institution with sufficient observational data should build their own cardiac surgery risk models.
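The AUC figures quoted above can be read as the probability that a randomly chosen short-LOS patient receives a higher predicted score than a randomly chosen long-LOS patient. A minimal sketch of that pairwise definition follows; the function name `auc` and the scores and labels are made up for illustration and are not the study's data or code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of short LOS (label 1 = short LOS).
scores = [0.9, 0.8, 0.4, 0.35, 0.2]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to scores that carry no ranking information, which is the baseline against which values such as 0.573, 0.629, and 0.691 are usually judged.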
The PLOS ONE collection on machine learning in health and biomedicine: Towards open code and open data
Recent years have seen a surge of studies in machine learning in health and biomedicine, driven by digitalization of healthcare environments and increasingly accessible computer systems for conducting analyses. Many of us believe that these developments will lead to significant improvements in patient care. As in many academic disciplines, however, progress is hampered by a lack of code and data sharing. In bringing together this PLOS ONE collection on machine learning in health and biomedicine, we sought to focus on the importance of reproducibility, making it a requirement, as far as possible, for authors to share data and code alongside their papers.
Datathons and Software to Promote Reproducible Research
Background: Datathons facilitate collaboration between clinicians, statisticians, and data scientists in order to answer important clinical questions. Previous datathons have resulted in numerous publications of interest to the critical care community and serve as a viable model for interdisciplinary collaboration.
Objective: We report on an open-source software called Chatto that was created by members of our group, in the context of the second international Critical Care Datathon, held in September 2015.
Methods: Datathon participants formed teams to discuss potential research questions and the methods required to address them. They were provided with the Chatto suite of tools to facilitate their teamwork. Each multidisciplinary team spent the next 2 days with clinicians working alongside data scientists to write code, extract and analyze data, and reformulate their queries in real time as needed. All projects were then presented on the last day of the datathon to a panel of judges that consisted of clinicians and scientists.
Results: Use of Chatto was particularly effective in the datathon setting, enabling teams to reduce the time spent configuring their research environments to just a few minutes, a process that would normally take hours to days. Chatto continued to serve as a useful research tool after the conclusion of the datathon.
Conclusions: This suite of tools fulfills two purposes: (1) facilitation of interdisciplinary teamwork through archiving and version control of datasets, analytical code, and team discussions, and (2) advancement of research reproducibility by functioning postpublication as an online environment in which independent investigators can rerun or modify analyses with relative ease. With the introduction of Chatto, we hope to solve a variety of challenges presented by collaborative data mining projects while improving research reproducibility.
The association between the neutrophil-to-lymphocyte ratio and mortality in critical illness: an observational cohort study
Introduction
The neutrophil-to-lymphocyte ratio (NLR) is a biological marker that has been shown to be associated with outcomes in patients with a number of different malignancies. The objective of this study was to assess the relationship between NLR and mortality in a population of adult critically ill patients.
Methods
We performed an observational cohort study of unselected intensive care unit (ICU) patients based on records in a large clinical database. We computed individual patient NLR and categorized patients by quartile of this ratio. The association of NLR quartiles and 28-day mortality was assessed using multivariable logistic regression. Secondary outcomes included mortality in the ICU, in-hospital mortality and 1-year mortality. An a priori subgroup analysis of patients with versus without sepsis was performed to assess any differences in the relationship between the NLR and outcomes in these cohorts.
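The exposure definition in this step, a per-patient NLR followed by quartile categories, can be sketched as below. This is an illustration under assumptions, not the study's code: the function names, the handling of values that fall exactly on a cut point, and the cell counts are all made up.

```python
import bisect
import statistics


def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio from two counts in the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def quartile_labels(values):
    """Assign each value to quartile 1-4 using the sample's own cut points.

    A value exactly equal to a cut point is placed in the lower quartile.
    """
    cuts = statistics.quantiles(values, n=4)  # three cut points
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


# Hypothetical (neutrophil, lymphocyte) counts for eight patients.
counts = [(2.0, 2.0), (4.0, 2.0), (6.0, 2.0), (8.0, 2.0),
          (10.0, 2.0), (12.0, 2.0), (14.0, 2.0), (16.0, 2.0)]

ratios = [nlr(n, l) for n, l in counts]
quartiles = quartile_labels(ratios)
print(list(zip(ratios, quartiles)))
```

The resulting quartile label would then enter a regression model as a categorical covariate, with the first quartile as the reference category.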
Results
A total of 5,056 patients were included. Their 28-day mortality rate was 19%. The median age of the cohort was 65 years, and 47% were female. The median NLR for the entire cohort was 8.9 (interquartile range, 4.99 to 16.21). Following multivariable adjustments, there was a stepwise increase in mortality with increasing quartiles of NLR (first quartile: reference category; second quartile odds ratio (OR) = 1.32; 95% confidence interval (CI), 1.03 to 1.71; third quartile OR = 1.43; 95% CI, 1.12 to 1.83; fourth quartile OR = 1.71; 95% CI, 1.35 to 2.16). A similar stepwise relationship was identified in the subgroup of patients who presented without sepsis. The NLR was not associated with 28-day mortality in patients with sepsis. Increasing quartile of NLR was statistically significantly associated with the secondary outcomes.
Conclusion
The NLR is associated with outcomes in unselected critically ill patients. In patients with sepsis, there was no statistically significant relationship between NLR and mortality. Further investigation is required to increase understanding of the pathophysiology of this relationship and to validate these findings with data collected prospectively.
National Institutes of Health (U.S.) (Grant R01 EB017205-01A1)