In this paper, we introduce EHR-SeqSQL, a novel sequential text-to-SQL
dataset for Electronic Health Record (EHR) databases. EHR-SeqSQL is designed to
address critical yet underexplored aspects of text-to-SQL parsing:
interactivity, compositionality, and efficiency. To the best of our knowledge,
EHR-SeqSQL is not only the largest medical text-to-SQL benchmark but also the
first to include sequential and contextual questions. We provide a data split
and a new test set designed to assess compositional generalization ability.
Our experiments demonstrate the superiority of a multi-turn approach
over a single-turn approach in learning compositionality. Additionally, our
dataset integrates specially crafted tokens into SQL queries to improve
execution efficiency. With EHR-SeqSQL, we aim to bridge the gap between
practical needs and academic research in the text-to-SQL domain. EHR-SeqSQL is
available at https://github.com/seonhee99/EHR-SeqSQL.
Comment: ACL 2024 (Findings)