Controlling Risk of Web Question Answering
Web question answering (QA) has become an indispensable component in modern
search systems, which can significantly improve users' search experience by
providing a direct answer to users' information needs. This can be achieved by
applying machine reading comprehension (MRC) models over the retrieved passages
to extract answers with respect to the search query. With advances in deep
learning, recent deep methods have achieved state-of-the-art MRC performance.
However, existing studies on MRC seldom address predictive uncertainty, i.e.,
how likely it is that an MRC model's prediction is wrong, which leads to
uncontrollable risks in real-world Web QA applications. In this work, we first
conduct an in-depth investigation into the risk of Web QA.
We then introduce a novel risk control framework, which consists of a qualify
model that estimates uncertainty using the probe idea, and a decision model for
selective output. For the evaluation of risk-aware Web QA, we introduce
risk-related metrics in place of the traditional EM and F1 metrics used in MRC.
Empirical results on both a real-world Web QA dataset and an academic MRC
benchmark collection demonstrate the effectiveness of our approach.
Comment: 42nd International ACM SIGIR Conference on Research and Development
in Information Retrieval
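As a minimal sketch of selective output, assume each prediction carries a confidence score from some uncertainty estimator, and swap in a plain confidence threshold where the paper uses a decision model. The names `Prediction`, `selective_output`, `coverage_and_risk`, and `pick_threshold` are hypothetical. Coverage (fraction of questions answered) and risk (error rate among answered questions) are the standard selective-prediction quantities, not necessarily the exact metrics the paper defines.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    answer: str
    confidence: float  # assumed score from some uncertainty estimator (hypothetical)
    correct: bool      # whether the answer matches the gold answer


def selective_output(preds, threshold):
    """Threshold stand-in for a decision model: keep predictions at or above it."""
    return [p for p in preds if p.confidence >= threshold]


def coverage_and_risk(preds, threshold):
    """Coverage = fraction of questions answered; risk = error rate among answered ones."""
    kept = selective_output(preds, threshold)
    coverage = len(kept) / len(preds) if preds else 0.0
    risk = sum(not p.correct for p in kept) / len(kept) if kept else 0.0
    return coverage, risk


def pick_threshold(preds, max_risk):
    """Lowest observed threshold (hence highest coverage) whose risk stays <= max_risk."""
    best = None
    # Scanning thresholds from high to low makes coverage non-decreasing,
    # so the last feasible threshold found gives the highest coverage.
    for t in sorted({p.confidence for p in preds}, reverse=True):
        coverage, risk = coverage_and_risk(preds, t)
        if risk <= max_risk:
            best = (t, coverage)
    return best


preds = [
    Prediction("a", 0.9, True),
    Prediction("b", 0.8, True),
    Prediction("c", 0.6, False),
    Prediction("d", 0.3, False),
]
print(coverage_and_risk(preds, 0.5))        # (0.75, 0.333...)
print(pick_threshold(preds, max_risk=0.0))  # (0.8, 0.5)
```

Raising the threshold trades coverage for lower risk. In practice the operating point would be chosen on held-out predictions rather than on the data being evaluated.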
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
We present a new kind of question answering dataset, OpenBookQA, modeled
after open book exams for assessing human understanding of a subject. The open
book that comes with our questions is a set of 1329 elementary-level science
facts. Roughly 6000 questions probe an understanding of these facts and their
application to novel situations. This requires combining an open book fact
(e.g., metals conduct electricity) with broad common knowledge (e.g., a suit of
armor is made of metal) obtained from other sources. While existing QA datasets
over documents or knowledge bases, being generally self-contained, focus on
linguistic understanding, OpenBookQA probes a deeper understanding of both the
topic (in the context of common knowledge) and the language it is expressed
in. Human performance on OpenBookQA is close to 92%, but many state-of-the-art
pre-trained QA methods perform surprisingly poorly, worse than several simple
neural baselines we develop. Our oracle experiments designed to circumvent the
knowledge retrieval bottleneck demonstrate the value of both the open book and
additional facts. We leave it as a challenge to solve the retrieval problem in
this multi-hop setting and to close the large gap to human performance.
Comment: Published as a long conference paper at EMNLP 201
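The abstract's running example (metals conduct electricity, plus a suit of armor is made of metal) can be illustrated with a hand-written symbolic toy. It only shows the two-hop chaining the dataset calls for; it is not one of the paper's neural baselines. The triples, the `made_of` relation, the helpers `supports` and `answer`, and the answer options are all hypothetical and are not taken from OpenBookQA.

```python
# Hypothetical toy facts stored as (subject, relation, object) triples.
OPEN_BOOK = {("metal", "conducts", "electricity")}
COMMON_KNOWLEDGE = {
    ("suit of armor", "made_of", "metal"),
    ("wooden chair", "made_of", "wood"),
}


def supports(subject, relation, obj, book, common):
    """True if the book states the claim directly, or if the subject is made of
    some material that the book says has the property (a two-hop chain)."""
    if (subject, relation, obj) in book:
        return True
    materials = {o for (s, r, o) in common if s == subject and r == "made_of"}
    return any((m, relation, obj) in book for m in materials)


def answer(options, relation, obj, book, common):
    """Return the options whose claim the toy two-hop chain supports."""
    return [opt for opt in options if supports(opt, relation, obj, book, common)]


options = ["wooden chair", "suit of armor", "rubber band", "glass cup"]
print(answer(options, "conducts", "electricity", OPEN_BOOK, COMMON_KNOWLEDGE))
# ['suit of armor']
```

Neither fact alone answers the question: the open-book fact never mentions armor, and the common-knowledge fact never mentions electricity, which is why the dataset needs both.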
- …