ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram
Question answering (QA) in the field of healthcare has received much
attention due to significant advancements in natural language processing.
However, existing healthcare QA datasets primarily focus on medical images,
clinical notes, or structured electronic health record tables. This leaves the
vast potential of combining electrocardiogram (ECG) data with these systems
largely untapped. To address this gap, we present ECG-QA, the first QA dataset
specifically designed for ECG analysis. The dataset comprises a total of 70
question templates that cover a wide range of clinically relevant ECG topics,
each validated by an ECG expert to ensure its clinical utility. As a result,
our dataset includes diverse ECG interpretation questions, including those that
require a comparative analysis of two different ECGs. In addition, we have
conducted numerous experiments to provide valuable insights for future research
directions. We believe that ECG-QA will serve as a valuable resource for the
development of intelligent QA systems capable of assisting clinicians in ECG
interpretations.
Comment: 39 pages (9 pages for main text, 2 pages for references, 28 pages for
supplementary materials)
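The abstract describes questions generated from expert-validated templates, including comparative questions over two ECGs. The following is a minimal Python sketch of template instantiation under stated assumptions: the template strings, the `ECGRecord` type, the `instantiate` helper, and the attribute labels are all hypothetical and do not reproduce the actual ECG-QA templates or label schema.

```python
from dataclasses import dataclass

# Hypothetical templates: placeholders only, not the dataset's actual 70 templates.
TEMPLATES = {
    "verify": "Does this ECG show {attr}?",   # single-ECG question
    "compare": "Do both ECGs show {attr}?",   # comparative question over two ECGs
}

@dataclass(frozen=True)
class ECGRecord:
    """Hypothetical record: an ECG identifier plus its set of diagnostic labels."""
    ecg_id: str
    attributes: frozenset[str]

def instantiate(template_key: str, attr: str, ecgs: list[ECGRecord]) -> dict:
    """Fill a template and derive the yes/no answer from the ECGs' labels."""
    question = TEMPLATES[template_key].format(attr=attr)
    # "yes" only if every referenced ECG carries the attribute.
    answer = "yes" if all(attr in e.attributes for e in ecgs) else "no"
    return {"question": question, "ecg_ids": [e.ecg_id for e in ecgs], "answer": answer}

a = ECGRecord("ecg_001", frozenset({"atrial fibrillation"}))
b = ECGRecord("ecg_002", frozenset({"sinus rhythm"}))
print(instantiate("verify", "atrial fibrillation", [a]))
print(instantiate("compare", "atrial fibrillation", [a, b]))
```

In this sketch the answer is computed from the label sets rather than written by hand, which is one way a template-based dataset can scale to many ECG records once each template has been validated.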
EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records
We present a new text-to-SQL dataset for electronic health records (EHRs).
The utterances were collected from 222 hospital staff, including physicians,
nurses, insurance review and health records teams, and more. To construct the
QA dataset on structured EHR data, we conducted a poll at a university hospital
and templatized the responses to create seed questions. Then, we manually
linked them to two open-source EHR databases, MIMIC-III and eICU, and
augmented them with various time expressions and held-out unanswerable
questions, all of which were collected from the poll. Our dataset poses a unique set
of challenges: the model needs to 1) generate SQL queries that reflect a wide
range of needs in the hospital, including simple retrieval and complex
operations such as calculating survival rate, 2) understand various time
expressions to answer time-sensitive questions in healthcare, and 3)
distinguish whether a given question is answerable or unanswerable based on the
prediction confidence. We believe our dataset, EHRSQL, could serve as a
practical benchmark to develop and assess QA models on structured EHR data and
take one step further towards bridging the gap between text-to-SQL research and
its real-life deployment in healthcare. EHRSQL is available at
https://github.com/glee4810/EHRSQL.
Comment: Published as a conference paper at NeurIPS 2022 (Track on Datasets
and Benchmarks)
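The third challenge, deciding whether a question is answerable based on prediction confidence, can be illustrated with a simple thresholding rule. This is a minimal sketch assuming per-token log-probabilities are available from the text-to-SQL model; the geometric-mean confidence score, the 0.8 threshold, and the helper names are illustrative assumptions, not necessarily the scoring used by the EHRSQL baselines.

```python
import math
from typing import Optional, Sequence

def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a generated SQL query (0.0 if empty)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def answer_or_abstain(sql: str, token_logprobs: Sequence[float],
                      threshold: float = 0.8) -> Optional[str]:
    """Return the SQL if confidence meets the threshold, else None (treated as unanswerable)."""
    if sequence_confidence(token_logprobs) >= threshold:
        return sql
    return None

confident = [math.log(0.9), math.log(0.95)]   # hypothetical per-token log-probs
uncertain = [math.log(0.5), math.log(0.6)]
print(answer_or_abstain("SELECT COUNT(*) FROM admissions", confident))
print(answer_or_abstain("SELECT * FROM patients", uncertain))  # None
```

Averaging log-probabilities before exponentiating keeps the score from shrinking simply because a query is long; the threshold would in practice be tuned on held-out data that includes unanswerable questions.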
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization
In this paper, we introduce CheXOFA, a new pre-trained vision-language model
(VLM) for the chest X-ray domain. Our model is initially pre-trained on various
multimodal datasets within the general domain before being transferred to the
chest X-ray domain. Following a prominent VLM, we unify various domain-specific
tasks into a simple sequence-to-sequence schema. This enables the model to
effectively learn the required knowledge and skills from limited resources in
the domain. Our model benefits from its training across multiple tasks and
domains, demonstrating superior performance on the benchmark datasets provided
by the BioNLP shared task. With additional techniques, including ensembling and
factual calibration, our system achieves first place on the RadSum23
leaderboard for the hidden test set.
Comment: Published at BioNLP workshop @ ACL 202
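Unifying tasks into a sequence-to-sequence schema means every task, whether summarization, report generation, or visual question answering, is expressed as an instruction-plus-input source text (optionally paired with an image) and a target text. A minimal sketch under stated assumptions: the task names, instruction strings, file names, and the `to_seq2seq` helper are hypothetical and do not reproduce CheXOFA's actual prompts or data pipeline.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Seq2SeqExample:
    source: str                        # text fed to the encoder (instruction + inputs)
    target: str                        # text the decoder must generate
    image_path: Optional[str] = None   # chest X-ray for image-conditioned tasks

# Hypothetical instruction templates, one per task.
INSTRUCTIONS = {
    "summarization": "Summarize the findings: {findings}",
    "report_generation": "Describe the findings in this chest X-ray.",
    "vqa": "{question}",
}

def to_seq2seq(task: str, target: str, image_path: Optional[str] = None,
               **fields: str) -> Seq2SeqExample:
    """Cast any supported task into the shared (source, target) text format."""
    source = INSTRUCTIONS[task].format(**fields)
    return Seq2SeqExample(source=source, target=target, image_path=image_path)

summ = to_seq2seq("summarization", "No acute cardiopulmonary process.",
                  findings="Lungs are clear. Heart size is normal.")
vqa = to_seq2seq("vqa", "no", image_path="cxr_0001.png",
                 question="Is there a pleural effusion?")
print(summ)
print(vqa)
```

Because every task in this sketch shares one (source, target) format, a single encoder-decoder could be trained on a mixed batch of tasks without task-specific output heads.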