Detrimental Contexts in Open-Domain Question Answering
For knowledge-intensive NLP tasks, it is widely accepted that access to more
information contributes to improvements in a model's end-to-end performance.
Counter-intuitively, however, too much context can hurt the model when it is
evaluated on common question answering (QA) datasets. In this paper, we analyze
how passages can have a detrimental effect on the retrieve-then-read
architectures used in question answering. Our empirical evidence indicates that
the current reader architecture does not fully leverage the retrieved passages:
its performance degrades significantly when it is given all of the retrieved
passages rather than subsets of them. Our findings demonstrate that model
accuracy can be improved by 10% on two popular QA datasets by filtering out
detrimental passages. Moreover, these gains are achieved with existing
retrieval methods, without further training or data. We further highlight the
challenges of identifying detrimental passages. First, even with the correct
context, the model can make an incorrect prediction, which makes it hard to
determine which passages are most influential. Second, evaluation typically
relies on lexical matching, which is not robust to variations of correct
answers. Despite these limitations, our experimental results underscore the
pivotal role of identifying and removing detrimental passages for a
context-efficient retrieve-then-read pipeline. Code and data are available at
https://github.com/xfactlab/emnlp2023-damaging-retrieval
Comment: Findings of EMNLP 202
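Lexical-matching evaluation in open-domain QA is commonly implemented as exact match (EM) after SQuAD-style answer normalization. The sketch below assumes that normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); it is not necessarily the exact scorer used in this paper, and the names `normalize_answer` and `exact_match` are illustrative. It shows why such matching is brittle: a formatting variant of the gold answer is accepted, while a correct but differently phrased answer is rejected.

```python
import re
import string
from typing import Sequence

# Assumption: SQuAD-style normalization, used here only to illustrate lexical matching.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: Sequence[str]) -> bool:
    """Return True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


gold = ["Barack Obama"]
print(exact_match("barack obama.", gold))           # True: case and punctuation are ignored
print(exact_match("Obama", gold))                   # False: a correct short form is rejected
print(exact_match("President Barack Obama", gold))  # False: a correct longer form is rejected
```

Because a prediction only counts when its normalized string matches a gold string, a reader that answers correctly in a different surface form is scored as wrong, which complicates judging whether a given passage actually helped or hurt.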
Knowledge Corpus Error in Question Answering
Recent work in open-domain question answering (QA) has explored generating
context passages with large language models (LLMs), replacing the traditional
retrieval step in the QA pipeline. However, it is not well understood why
generated passages can be more effective than retrieved ones. This study
revisits the conventional formulation of QA and introduces the concept of
knowledge corpus error. This error arises when the knowledge corpus used for
retrieval is only a subset of the entire string space, potentially excluding
more helpful passages that exist outside the corpus. LLMs may mitigate this
shortcoming by generating passages in a larger space. We design an experiment
in which LLMs paraphrase human-annotated gold context, allowing us to observe
knowledge corpus error empirically. Our results across three QA benchmarks
show improved performance (10% to 13%) when using paraphrased passages, which
suggests that knowledge corpus error exists. Our code is available at
https://github.com/xfactlab/emnlp2023-knowledge-corpus-error
Comment: Findings of EMNLP 202
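One way to make the idea concrete, under assumed notation that is ours and not necessarily the paper's: let $p(a^{*} \mid q, c)$ be the reader's probability of the gold answer $a^{*}$ for question $q$ given context $c$, let $\mathcal{S}$ be the full string space, and let $\mathcal{C}$ be the knowledge corpus. Because $\mathcal{C} \subseteq \mathcal{S}$, the best context available in the corpus can never score higher than the best context in the full space, so the gap $\epsilon_{\mathcal{C}}(q)$ below is always nonnegative; it is one possible reading of knowledge corpus error.

```latex
\mathcal{C} \subseteq \mathcal{S}
\quad\Longrightarrow\quad
\epsilon_{\mathcal{C}}(q)
  \;=\; \sup_{c \in \mathcal{S}} p(a^{*} \mid q, c)
  \;-\; \sup_{c \in \mathcal{C}} p(a^{*} \mid q, c)
  \;\ge\; 0
```

Under this reading, generating or paraphrasing passages with an LLM amounts to searching a region of $\mathcal{S}$ outside $\mathcal{C}$, which is how the gap could shrink.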