114 research outputs found

    Leveraging text data for causal inference using electronic health records

    Text is a ubiquitous component of medical data, containing valuable information about patient characteristics and care that is often missing from structured chart data. Despite this richness, it is rarely used in clinical research, owing partly to its complexity. Using a large database of patient records and treatment histories accompanied by extensive notes by attendant physicians and nurses, we show how text data can be used to support causal inference with electronic health data in all stages, from conception and design to analysis and interpretation, with minimal additional effort. We focus on studies using matching for causal inference. We augment a classic matching analysis by incorporating text in three ways: by using text to supplement a multiple imputation procedure, we improve the fidelity of imputed values to handle missing data; by incorporating text in the matching stage, we strengthen the plausibility of the matching procedure; and by conditioning on text, we can estimate easily interpretable text-based heterogeneous treatment effects that may be stronger than those found across categories of structured covariates. Using these techniques, we hope to expand the scope of secondary analysis of clinical data to domains where quantitative data is of poor quality or nonexistent, but where text is available, such as in developing countries.

    The PLOS ONE collection on machine learning in health and biomedicine: Towards open code and open data

    Recent years have seen a surge of studies in machine learning in health and biomedicine, driven by digitalization of healthcare environments and increasingly accessible computer systems for conducting analyses. Many of us believe that these developments will lead to significant improvements in patient care. As in many academic disciplines, however, progress is hampered by a lack of code and data sharing. In bringing together this PLOS ONE collection on machine learning in health and biomedicine, we sought to focus on the importance of reproducibility, making it a requirement, as far as possible, for authors to share data and code alongside their papers.

    Datathons and Software to Promote Reproducible Research

    Background: Datathons facilitate collaboration between clinicians, statisticians, and data scientists in order to answer important clinical questions. Previous datathons have resulted in numerous publications of interest to the critical care community and serve as a viable model for interdisciplinary collaboration. Objective: We report on open-source software called Chatto that was created by members of our group, in the context of the second international Critical Care Datathon, held in September 2015. Methods: Datathon participants formed teams to discuss potential research questions and the methods required to address them. They were provided with the Chatto suite of tools to facilitate their teamwork. Each multidisciplinary team spent the next 2 days with clinicians working alongside data scientists to write code, extract and analyze data, and reformulate their queries in real time as needed. All projects were then presented on the last day of the datathon to a panel of judges that consisted of clinicians and scientists. Results: Use of Chatto was particularly effective in the datathon setting, enabling teams to reduce the time spent configuring their research environments to just a few minutes, a process that would normally take hours to days. Chatto continued to serve as a useful research tool after the conclusion of the datathon. Conclusions: This suite of tools fulfills two purposes: (1) facilitation of interdisciplinary teamwork through archiving and version control of datasets, analytical code, and team discussions, and (2) advancement of research reproducibility by functioning postpublication as an online environment in which independent investigators can rerun or modify analyses with relative ease. With the introduction of Chatto, we hope to solve a variety of challenges presented by collaborative data mining projects while improving research reproducibility.

    The association between the neutrophil-to-lymphocyte ratio and mortality in critical illness: an observational cohort study

    Introduction: The neutrophil-to-lymphocyte ratio (NLR) is a biological marker that has been shown to be associated with outcomes in patients with a number of different malignancies. The objective of this study was to assess the relationship between NLR and mortality in a population of adult critically ill patients. Methods: We performed an observational cohort study of unselected intensive care unit (ICU) patients based on records in a large clinical database. We computed individual patient NLR and categorized patients by quartile of this ratio. The association of NLR quartiles and 28-day mortality was assessed using multivariable logistic regression. Secondary outcomes included mortality in the ICU, in-hospital mortality and 1-year mortality. An a priori subgroup analysis of patients with versus without sepsis was performed to assess any differences in the relationship between the NLR and outcomes in these cohorts. Results: A total of 5,056 patients were included. Their 28-day mortality rate was 19%. The median age of the cohort was 65 years, and 47% were female. The median NLR for the entire cohort was 8.9 (interquartile range, 4.99 to 16.21). Following multivariable adjustments, there was a stepwise increase in mortality with increasing quartiles of NLR (first quartile: reference category; second quartile odds ratio (OR) = 1.32; 95% confidence interval (CI), 1.03 to 1.71; third quartile OR = 1.43; 95% CI, 1.12 to 1.83; fourth quartile OR = 1.71; 95% CI, 1.35 to 2.16). A similar stepwise relationship was identified in the subgroup of patients who presented without sepsis. The NLR was not associated with 28-day mortality in patients with sepsis. Increasing quartile of NLR was statistically significantly associated with the secondary outcomes. Conclusion: The NLR is associated with outcomes in unselected critically ill patients. In patients with sepsis, there was no statistically significant relationship between NLR and mortality. Further investigation is required to increase understanding of the pathophysiology of this relationship and to validate these findings with data collected prospectively.
    National Institutes of Health (U.S.) (Grant R01 EB017205-01A1)
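    The first two steps of the Methods in the abstract above (computing each patient's NLR, then grouping patients by quartile of that ratio) can be sketched roughly as follows. This is an illustration only, not the authors' code: the function names, the made-up counts, and the choice of the "inclusive" quantile method are all assumptions, and the adjusted logistic regression is not reproduced here.

```python
from statistics import quantiles


def nlr(neutrophils: float, lymphocytes: float) -> float:
    """Neutrophil-to-lymphocyte ratio; both counts must share the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def quartile_labels(values: list[float]) -> list[int]:
    """Label each value 1..4 using cut points computed from the sample itself."""
    # Assumption: 'inclusive' interpolation; the study does not state its method.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return [1 + int(v > q1) + int(v > q2) + int(v > q3) for v in values]


# Hypothetical (neutrophil, lymphocyte) counts, invented for illustration.
counts = [(8.0, 2.0), (6.0, 1.0), (9.0, 1.0), (12.0, 1.0),
          (4.5, 1.5), (16.0, 1.0), (5.0, 2.0), (10.0, 0.5)]

ratios = [nlr(n, l) for n, l in counts]
groups = quartile_labels(ratios)

for r, g in zip(ratios, groups):
    print(f"NLR={r:5.1f}  quartile={g}")
```

    In an analysis like the one described, the quartile label would then enter the regression as a categorical covariate with the first quartile as the reference category.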