Releasing the CRaQAn (Coreference Resolution in Question-Answering): An
open-source dataset and dataset creation methodology using
instruction-following models
Instruction-following language models demand robust methodologies for
information retrieval to augment instructions for question-answering
applications. A primary challenge is the resolution of coreferences in the
context of chunking strategies for long documents. The critical barrier to
experimentation with coreference handling is the lack of open-source datasets,
specifically for question-answering tasks that require coreference resolution.
In this work we present our Coreference Resolution in Question-Answering
(CRaQAn) dataset, an open-source dataset that caters to the nuanced information
retrieval requirements of coreference resolution in question-answering tasks by
providing over 250 question-answer pairs containing coreferences. To build
this dataset, we developed a novel approach for creating high-quality datasets
using an instruction-following model (GPT-4) and a Recursive Criticism and
Improvement Loop.
Comment: NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following