STARC: Structured Annotations for Reading Comprehension
We present STARC (Structured Annotations for Reading Comprehension), a new
annotation framework for assessing reading comprehension with multiple choice
questions. Our framework introduces a principled structure for the answer
choices and ties them to textual span annotations. The framework is implemented
in OneStopQA, a new high-quality dataset for evaluation and analysis of reading
comprehension in English. We use this dataset to demonstrate that STARC can be
leveraged for a key new application for the development of SAT-like reading
comprehension materials: automatic annotation quality probing via span ablation
experiments. We further show that it enables in-depth analyses and comparisons
between machine and human reading comprehension behavior, including error
distributions and guessing ability. Our experiments also reveal that the
standard multiple choice dataset in NLP, RACE, is limited in its ability to
measure reading comprehension. 47% of its questions can be guessed by machines
without accessing the passage, and 18% are unanimously judged by humans as not
having a unique correct answer. OneStopQA provides an alternative test set for
reading comprehension which alleviates these shortcomings and has a
substantially higher human ceiling performance.
Comment: ACL 2020. OneStopQA dataset, STARC guidelines and human experiments
data are available at https://github.com/berzak/onestop-q
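The span ablation idea can be sketched briefly. The following is a minimal illustration, not the authors' code: in STARC, the correct answer is tied to an annotated critical span, so removing that span from the passage should prevent a model from recovering the answer. A toy lexical-overlap scorer stands in for a real QA model; all function names and data are hypothetical.

```python
def ablate_span(passage: str, span: tuple) -> str:
    """Remove the annotated critical span, given as (start, end) character offsets."""
    start, end = span
    return passage[:start] + passage[end:]

def overlap_model(passage: str, choices: list) -> int:
    """Toy 'model': pick the answer choice sharing the most words with the passage."""
    words = set(passage.lower().split())
    scores = [len(words & set(c.lower().split())) for c in choices]
    return scores.index(max(scores))

def probe_annotation(passage: str, span: tuple, choices: list, gold: int) -> bool:
    """An annotation passes the probe if the model answers correctly with the
    critical span present, but no longer picks the gold answer once the span
    is ablated -- evidence that the answer really depends on that span."""
    correct_with_span = overlap_model(passage, choices) == gold
    correct_without_span = overlap_model(ablate_span(passage, span), choices) == gold
    return correct_with_span and not correct_without_span
```

A real probe would run a trained reading comprehension model over many question/span pairs and inspect the accuracy drop, but the pass/fail logic per item is as above.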
AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions
The development of large, high-quality datasets and high-performing models
has led to significant advancements in the domain of Extractive Question
Answering (EQA). This progress has sparked considerable interest in exploring
unanswerable questions within the EQA domain. Training EQA models with
unanswerable questions helps them avoid extracting misleading or incorrect
answers for queries that lack valid responses. However, manually annotating
unanswerable questions is labor-intensive. To address this, we propose AGent, a
novel pipeline that automatically creates new unanswerable questions by
re-matching a question with a context that lacks the necessary information for
a correct answer. In this paper, we demonstrate the usefulness of this AGent
pipeline by creating two sets of unanswerable questions from answerable
questions in SQuAD and HotpotQA. These created question sets exhibit low error
rates. Additionally, models fine-tuned on these questions show comparable
performance with those fine-tuned on the SQuAD 2.0 dataset on multiple EQA
benchmarks.
Comment: 16 pages, 10 tables, 3 figures
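The core re-matching step can be sketched as follows. This is an illustrative simplification, not the AGent pipeline itself: pair a question with the most similar context that does not contain its answer, producing a plausible-looking but unanswerable item. Word overlap stands in for the pipeline's more elaborate retrieval and filtering; all names here are hypothetical.

```python
def similarity(a: str, b: str) -> int:
    """Crude similarity: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def make_unanswerable(question: str, answer: str, contexts: list):
    """Re-match the question with the most question-similar context that lacks
    the answer string, so the question becomes unanswerable there.
    Returns None if every candidate context contains the answer."""
    candidates = [c for c in contexts if answer.lower() not in c.lower()]
    if not candidates:
        return None
    return max(candidates, key=lambda c: similarity(question, c))
```

Keeping only contexts without the answer string is a necessary but not sufficient filter; a production pipeline would also need model-based verification to catch paraphrased answers, which is where the reported low error rates come from.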