SDOH-NLI: a Dataset for Inferring Social Determinants of Health from Clinical Notes

Authors: Chen Ming-Jun; Lelkes Adam D.; Loreaux Eric; Rajkomar Alvin; Schuster Tal
Publication date: 27/10/2023

Social and behavioral determinants of health (SDOH) play a significant role in shaping health outcomes, and extracting these determinants from clinical notes is a first step to help healthcare providers systematically identify opportunities to provide appropriate care and address disparities. Progress on using NLP methods for this task has been hindered by the lack of high-quality publicly available labeled data, largely due to the privacy and regulatory constraints on the use of real patients' information. This paper introduces a new dataset, SDOH-NLI, that is based on publicly available notes and that we release publicly. We formulate SDOH extraction as a natural language inference (NLI) task, and provide binary textual entailment labels obtained from human raters for a cross product of a set of social history snippets as premises and SDOH factors as hypotheses. Our dataset differs from standard NLI benchmarks in that our premises and hypotheses are obtained independently. We evaluate both "off-the-shelf" entailment models and models fine-tuned on our data, and highlight the ways in which our dataset appears more challenging than commonly used NLI datasets.

Comment: Findings of EMNLP 202

arXiv.org e-Print Archive
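A minimal Python sketch of the cross-product formulation described in the SDOH-NLI abstract above, assuming each premise is a social history snippet string and each hypothesis is an SDOH factor phrased as a statement. The names `NLIExample`, `build_pairs` and `entailed_factors`, the toy snippets, and the rater labels are all hypothetical illustrations; they are not taken from the released dataset or its file format.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NLIExample:
    premise: str                 # social history snippet
    hypothesis: str              # SDOH factor phrased as a statement
    label: Optional[int] = None  # 1 = entailed, 0 = not entailed, None = unlabeled


def build_pairs(snippets: List[str], factors: List[str]) -> List[NLIExample]:
    """Pair every snippet (premise) with every SDOH factor (hypothesis)."""
    return [NLIExample(p, h) for p, h in product(snippets, factors)]


def entailed_factors(examples: List[NLIExample]) -> Dict[str, List[str]]:
    """Read extraction results back out: per premise, the hypotheses labeled 1."""
    found: Dict[str, List[str]] = {}
    for ex in examples:
        if ex.label == 1:
            found.setdefault(ex.premise, []).append(ex.hypothesis)
    return found


# Hypothetical toy inputs written for illustration (not from SDOH-NLI).
snippets = [
    "Lives alone. Smokes one pack per day.",
    "Retired teacher who lives with her husband.",
]
factors = [
    "The patient smokes tobacco.",
    "The patient lives alone.",
    "The patient is retired.",
]

pairs = build_pairs(snippets, factors)
print(len(pairs))  # 6 pairs: 2 premises x 3 hypotheses

# Hypothetical binary rater labels, in the same order as `pairs`.
labels = [1, 1, 0, 0, 0, 1]
labeled = [replace(ex, label=y) for ex, y in zip(pairs, labels)]
print(entailed_factors(labeled))
```

Pairing every snippet with every factor makes the number of examples grow as snippets × factors, and no hypothesis is written with a particular premise in mind, which is the sense in which the abstract says premises and hypotheses are obtained independently.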

Instability in clinical risk stratification models using deep learning

Authors: Chen Ming-Jun; Downing N. Lance; Kemp Jonas; Lelkes Adam D.; Li Ron C.; Lopez-Martinez Daniel; Morse Keith E.; Seneviratne Martin; Shah Nigam H.; Steinberg Ethan; Tyagi Akshit; Yakubovich Alex
Publication date: 19/11/2022

While it is well known in the ML community that deep learning models suffer from instability, the consequences for healthcare deployments are under-characterised. We study the stability of different model architectures trained on electronic health records, using a set of outpatient prediction tasks as a case study. We show that repeated training runs of the same deep learning model on the same training data can result in significantly different outcomes at a patient level even though global performance metrics remain stable. We propose two stability metrics for measuring the effect of randomness in model training, as well as mitigation strategies for improving model stability.

Comment: Accepted for publication in Machine Learning for Health (ML4H) 202

arXiv.org e-Print Archive
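The instability abstract above does not define its two stability metrics, so this Python sketch uses two generic, hypothetical measures to make patient-level instability concrete: the mean per-patient standard deviation of predicted risk across retraining runs, and the average Jaccard overlap of the top-k highest-risk patients between pairs of runs. Neither is claimed to be the paper's metric, and the risk scores are invented.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import List, Sequence, Set

Run = Sequence[float]  # predicted risk per patient, same patient order in every run


def mean_patient_std(runs: Sequence[Run]) -> float:
    """Average over patients of the std. dev. of that patient's risk across runs."""
    per_patient = zip(*runs)  # regroup scores: one tuple per patient
    return mean(pstdev(scores) for scores in per_patient)


def top_k(scores: Run, k: int) -> Set[int]:
    """Indices of the k patients with the highest predicted risk in one run."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def mean_top_k_jaccard(runs: Sequence[Run], k: int) -> float:
    """Average Jaccard overlap of the top-k patient sets over all pairs of runs."""
    sets = [top_k(run, k) for run in runs]
    return mean(len(a & b) / len(a | b) for a, b in combinations(sets, 2))


# Hypothetical scores for 4 patients from 3 retraining runs of the same model.
runs: List[List[float]] = [
    [0.9, 0.8, 0.2, 0.1],
    [0.8, 0.2, 0.9, 0.1],
    [0.9, 0.1, 0.8, 0.2],
]

print([round(mean(run), 2) for run in runs])  # [0.5, 0.5, 0.5]
print(round(mean_patient_std(runs), 3))
print(round(mean_top_k_jaccard(runs, k=2), 3))  # 0.556
```

Each toy run contains the same four scores, so run-level summaries such as mean risk are identical, yet the patients flagged in the top 2 change from run to run; identical runs would give a standard deviation of 0 and a Jaccard overlap of 1.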

All Birds with One Stone: Multi-task Text Classification for Efficient Inference with One Forward Pass

Author: Han Jiawei
Huang Jiaxin
Lelkes Adam D.
Liu Jialu
Liu Tianqi
Yu Cong
Publication date: 22/05/2022

Multi-Task Learning (MTL) models have shown their robustness, effectiveness, and efficiency for transferring learned knowledge across tasks. In real industrial applications such as web content classification, multiple classification tasks are predicted from the same input text, such as a web article. However, at serving time, existing multitask transformer models, such as prompt- or adaptor-based approaches, need to conduct N forward passes for N tasks, with O(N) computation cost. To tackle this problem, we propose a scalable method that can achieve stronger performance with close to O(1) computation cost via only one forward pass. To illustrate real application usage, we release a multitask dataset on news topic and style classification. Our experiments show that our proposed method outperforms strong baselines on both the GLUE benchmark and our news dataset. Our code and dataset are publicly available at https://bit.ly/mtop-code

arXiv.org e-Print Archive
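To make the O(N) versus O(1) serving cost in the multi-task abstract above concrete, here is a toy Python sketch of a generic shared-encoder, multi-head classifier that counts encoder passes. It is not the authors' proposed method, which the abstract does not describe; `CountingEncoder`, `linear_head`, the bag-of-words features, and the task names are hypothetical stand-ins for a transformer encoder and per-task classification heads.

```python
from typing import Callable, Dict, List

Features = List[float]
Head = Callable[[Features], int]


class CountingEncoder:
    """Toy stand-in for an expensive text encoder; records how many times it runs."""

    def __init__(self, vocab: List[str]) -> None:
        self.vocab = vocab
        self.calls = 0

    def __call__(self, text: str) -> Features:
        self.calls += 1
        tokens = text.lower().split()
        # Bag-of-words counts over a fixed vocabulary (hypothetical features).
        return [float(tokens.count(word)) for word in self.vocab]


def linear_head(weights: List[float], bias: float) -> Head:
    """Task-specific classifier: returns 1 if w . x + b > 0, else 0."""
    def head(x: Features) -> int:
        return int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)
    return head


def predict_per_task(encoder: CountingEncoder, heads: Dict[str, Head], text: str) -> Dict[str, int]:
    """Baseline cost model: one encoder pass per task (N passes for N tasks)."""
    return {task: head(encoder(text)) for task, head in heads.items()}


def predict_shared(encoder: CountingEncoder, heads: Dict[str, Head], text: str) -> Dict[str, int]:
    """Shared-encoder cost model: encode once, then run every head on the same features."""
    features = encoder(text)
    return {task: head(features) for task, head in heads.items()}


vocab = ["election", "vote", "match", "i"]
heads = {
    "topic_politics": linear_head([1.0, 1.0, 0.0, 0.0], -0.5),
    "topic_sports": linear_head([0.0, 0.0, 1.0, 0.0], -0.5),
    "style_first_person": linear_head([0.0, 0.0, 0.0, 1.0], -0.5),
}
text = "I think the election vote matters"

enc_a, enc_b = CountingEncoder(vocab), CountingEncoder(vocab)
out_a = predict_per_task(enc_a, heads, text)
out_b = predict_shared(enc_b, heads, text)
print(out_a == out_b, enc_a.calls, enc_b.calls)  # True 3 1
```

In prompt- or adaptor-based models each pass is conditioned on the task (through the prompt or the task-specific adaptor), so the N passes cannot simply be merged; the toy encoder here is task-agnostic and only illustrates how the number of encoder calls scales.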