Do Multi-hop Readers Dream of Reasoning Chains?
General Question Answering (QA) systems over texts require the multi-hop
reasoning capability, i.e., the ability to reason with information collected from multiple passages to derive the answer. In this paper we conduct a systematic analysis to assess this ability in various existing models proposed for multi-hop QA tasks. Specifically, our analysis investigates whether providing the full reasoning chain of multiple passages, instead of just the one final passage where the answer appears, could improve the performance
of the existing QA models. Surprisingly, when using the additional evidence
passages, the improvements of all the existing multi-hop reading approaches are
rather limited, with the highest error reduction of 5.8% on F1 (corresponding
to 1.3% absolute improvement) from the BERT model.
To better understand whether the reasoning chains could indeed help find
correct answers, we further develop a co-matching-based method that leads to
13.1% error reduction with passage chains when applied to two of our base
readers (including BERT). Our results demonstrate the potential for improvement from explicit multi-hop reasoning and the necessity of developing models with better reasoning abilities.
Comment: Accepted by MRQA Workshop 2019
Look at the First Sentence: Position Bias in Question Answering
Many extractive question answering models are trained to predict start and
end positions of answers. The choice of predicting answers as positions is
mainly due to its simplicity and effectiveness. In this study, we hypothesize
that when the distribution of the answer positions is highly skewed in the
training set (e.g., answers lie only in the k-th sentence of each passage), QA
models predicting answers as positions can learn spurious positional cues and
fail to give answers in different positions. We first illustrate this position
bias in popular extractive QA models such as BiDAF and BERT and thoroughly
examine how position bias propagates through each layer of BERT. To safely
deliver position information without position bias, we train models with
various de-biasing methods including entropy regularization and bias
ensembling. Among them, we find that using the prior distribution of answer positions as a bias model is very effective at reducing position bias, recovering the performance of BERT from 37.48% to 81.64% when trained on a biased SQuAD dataset.
Comment: 13 pages, EMNLP 2020
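The abstract mentions bias ensembling with the answer-position prior as the bias model; below is a minimal sketch of the bias-product form of such an ensemble (product-of-experts-style debiasing), shown for the answer-start prediction. The function and argument names are illustrative assumptions, not the paper's released code:

```python
import torch
import torch.nn.functional as F

def bias_product_loss(start_logits, prior_log_probs, start_positions):
    # start_logits:    (batch, seq_len) QA-model logits for the answer start
    # prior_log_probs: (batch, seq_len) log prior over answer positions,
    #                  estimated from the training set and kept fixed
    # start_positions: (batch,) gold answer-start indices
    # Train the product-of-experts ensemble so the positional bias is
    # absorbed by the fixed prior rather than by the QA model itself.
    ensemble = F.log_softmax(start_logits, dim=-1) + prior_log_probs
    return F.nll_loss(F.log_softmax(ensemble, dim=-1), start_positions)

# At inference time the prior is dropped and predictions come from
# start_logits alone, which is what removes the learned position bias.
```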