Multi-Mention Learning for Reading Comprehension with Neural Cascades
Reading comprehension is a challenging task, especially when executed across
longer documents or across multiple evidence documents, where the answer is likely to
reoccur. Existing neural architectures typically do not scale to the entire
evidence, and hence, resort to selecting a single passage in the document
(either via truncation or other means), and carefully searching for the answer
within that passage. However, in some cases, this strategy can be suboptimal,
since by focusing on a specific passage, it becomes difficult to leverage
multiple mentions of the same answer throughout the document. In this work, we
take a different approach by constructing lightweight models that are combined
in a cascade to find the answer. Each submodel consists only of feed-forward
networks equipped with an attention mechanism, making it trivially
parallelizable. We show that our approach can scale to approximately an order
of magnitude larger evidence documents and can aggregate information at the
representation level from multiple mentions of each answer candidate across the
document. Empirically, our approach achieves state-of-the-art performance on
both the Wikipedia and web domains of the TriviaQA dataset, outperforming more
complex, recurrent architectures.
Comment: Proceedings of ICLR 2018
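A minimal sketch of the aggregation idea (not the authors' code; the scorer architecture, dimensions, and the logsumexp pooling are illustrative assumptions): each mention of a candidate is scored against the question by a small feed-forward network, and the mention scores are pooled into one candidate score.

```python
import torch
import torch.nn as nn

class MentionScorer(nn.Module):
    """Feed-forward scorer over (question, mention) representation pairs."""
    def __init__(self, dim: int):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, q_rep, mention_reps):
        # q_rep: (dim,); mention_reps: (num_mentions, dim)
        q = q_rep.expand(mention_reps.size(0), -1)
        return self.ffn(torch.cat([q, mention_reps], dim=-1)).squeeze(-1)

def candidate_score(mention_scores):
    # Pool evidence from all mentions of one candidate at the score level;
    # logsumexp acts as a smooth maximum over mentions.
    return torch.logsumexp(mention_scores, dim=0)

dim = 64
scorer = MentionScorer(dim)
q_rep = torch.randn(dim)
mentions = torch.randn(5, dim)   # five mentions of the same candidate
print(candidate_score(scorer(q_rep, mentions)))
```

Because each submodel is purely feed-forward, all mentions can be scored in parallel, which is what makes scaling to full evidence documents feasible.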
Multi-hop Reading Comprehension via Deep Reinforcement Learning based Document Traversal
Reading Comprehension has received significant attention in recent years as
high quality Question Answering (QA) datasets have become available. Despite
state-of-the-art methods achieving strong overall accuracy, Multi-Hop (MH)
reasoning remains particularly challenging. To address MH-QA specifically, we
propose a Deep Reinforcement Learning based method capable of learning
sequential reasoning across large collections of documents so as to pass a
query-aware, fixed-size context subset to existing models for answer
extraction. Our method comprises two stages: a linker, which decomposes
the provided support documents into a graph of sentences, and an extractor,
which learns where to look based on the current question and already-visited
sentences. The result of the linker is a novel graph structure at the sentence
level that preserves logical flow while still allowing rapid movement between
documents. Importantly, we demonstrate that the sparsity of the resultant graph
is invariant to context size. This translates to fewer decisions required from
the Deep-RL trained extractor, allowing the system to scale effectively to
large collections of documents.
The importance of sequential decision making in the document traversal step
is demonstrated by comparison to standard IE methods, and we additionally
introduce a BM25-based IR baseline that retrieves only documents relevant to
the query. We examine the integration of our method with existing models on
the recently proposed QAngaroo benchmark and achieve consistent increases in
accuracy across the board, as well as a 2-3x reduction in training time.
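To make the two-stage design concrete, here is a toy version of a linker (a sketch under assumptions, not the paper's implementation): consecutive sentences are connected to preserve logical flow, and sentences sharing a longer word are linked across documents to allow hops; the shared-word heuristic is purely illustrative.

```python
from collections import defaultdict

def build_sentence_graph(docs):
    """docs: list of documents, each a list of sentence strings."""
    nodes, edges = [], set()
    word_to_nodes = defaultdict(list)
    for d, sents in enumerate(docs):
        for s, sent in enumerate(sents):
            nid = len(nodes)
            nodes.append((d, s, sent))
            if s > 0:                       # intra-document flow edge
                edges.add((nid - 1, nid))
            for w in set(sent.lower().split()):
                word_to_nodes[w].append(nid)
    for w, ids in word_to_nodes.items():    # cross-document hop edges
        if len(w) > 6:                      # crude proxy for informative terms
            for i in ids:
                for j in ids:
                    if i < j:
                        edges.add((i, j))
    return nodes, edges

docs = [["Alice founded Acme.", "Acme builds rockets."],
        ["Rockets need engines.", "Engines burn fuel."]]
print(build_sentence_graph(docs)[1])
```

An RL extractor would then walk this graph, choosing the next sentence conditioned on the question and the sentences visited so far.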
Efficient and Robust Question Answering from Minimal Context over Documents
Neural models for question answering (QA) over documents have achieved
significant performance improvements. Although effective, these models do not
scale to large corpora due to their complex modeling of interactions between
the document and the question. Moreover, recent work has shown that such models
are sensitive to adversarial inputs. In this paper, we study the minimal
context required to answer the question, and find that most questions in
existing datasets can be answered with a small set of sentences. Inspired by
this observation, we propose a simple sentence selector to select the minimal
set of sentences to feed into the QA model. Our overall system achieves
significant reductions in training (up to 15 times) and inference times (up to
13 times), with accuracy comparable to or better than the state-of-the-art on
SQuAD, NewsQA, TriviaQA and SQuAD-Open. Furthermore, our experimental results
and analyses show that our approach is more robust to adversarial inputs.
Comment: Published as a conference paper at ACL 2018 (long paper)
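A minimal sketch of the selector idea, assuming a bilinear scorer over precomputed question and sentence encodings (the encoder and hyperparameters are stand-ins, not the authors' architecture): score every sentence against the question and keep the top-k, in document order, as the minimal context fed to the QA model.

```python
import torch
import torch.nn as nn

class SentenceSelector(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, 1)

    def forward(self, q_rep, sent_reps):
        # q_rep: (dim,); sent_reps: (num_sents, dim) -> one score per sentence
        q = q_rep.expand(sent_reps.size(0), -1)
        return self.bilinear(q, sent_reps).squeeze(-1)

def select_minimal_context(scores, k=3):
    # Keep the selected indices in document order so the reader sees
    # coherent text rather than a score-sorted shuffle.
    top = torch.topk(scores, k=min(k, scores.numel())).indices
    return sorted(top.tolist())

dim = 64
selector = SentenceSelector(dim)
scores = selector(torch.randn(dim), torch.randn(10, dim))
print(select_minimal_context(scores))
```

Shrinking the input to a few sentences is what yields the reported training and inference speedups, since the expensive reader never sees the full document.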
Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering
This paper introduces a new framework for open-domain question answering in
which the retriever and the reader iteratively interact with each other. The
framework is agnostic to the architecture of the machine reading model, only
requiring access to the token-level hidden representations of the reader. The
retriever uses fast nearest neighbor search to scale to corpora containing
millions of paragraphs. A gated recurrent unit updates the query at each step,
conditioned on the state of the reader, and the reformulated query is used by
the retriever to re-rank the paragraphs. We conduct analysis and show that
iterative interaction helps in retrieving informative paragraphs from the
corpus. Finally, we show that our multi-step-reasoning framework brings
consistent improvement when applied to two widely used reader architectures,
DrQA and BiDAF, on various large open-domain datasets: TriviaQA-unfiltered,
Quasar-T, SearchQA, and SQuAD-Open.
Comment: Published at ICLR 2019
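The loop below sketches the retriever-reader interaction under stated assumptions: the reader is mocked, the paragraph index is random, and the GRU-based query reformulation follows the abstract's description rather than any released code.

```python
import torch
import torch.nn as nn

dim, num_paragraphs, num_steps = 64, 1000, 3
paragraph_index = torch.randn(num_paragraphs, dim)   # precomputed embeddings
reformulator = nn.GRUCell(input_size=dim, hidden_size=dim)

def retrieve(query, k=5):
    # Inner-product search; a real system would use a fast ANN index here.
    scores = paragraph_index @ query
    return torch.topk(scores, k=k).indices

def mock_reader_state(query, paragraph_ids):
    # Stand-in for a summary of the reader's token-level hidden states.
    return torch.tanh(paragraph_index[paragraph_ids].mean(dim=0) + query)

query = torch.randn(dim)
for step in range(num_steps):
    ids = retrieve(query)
    reader_state = mock_reader_state(query, ids)
    # Condition the next query on the reader's state (query reformulation),
    # then re-rank paragraphs with the updated query on the next iteration.
    query = reformulator(reader_state.unsqueeze(0), query.unsqueeze(0)).squeeze(0)
    print(f"step {step}: top paragraphs {ids.tolist()}")
```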
Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering
End-to-end neural models have made significant progress in question
answering; however, recent studies show that these models implicitly assume that
the answer and evidence appear close together in a single document. In this
work, we propose the Coarse-grain Fine-grain Coattention Network (CFC), a new
question answering model that combines information from evidence across
multiple documents. The CFC consists of a coarse-grain module that interprets
documents with respect to the query and then finds a relevant answer, and a
fine-grain module which scores each candidate answer by comparing its
occurrences across all of the documents with the query. We design these modules
using hierarchies of coattention and self-attention, which learn to emphasize
different parts of the input. On the QAngaroo WikiHop multi-evidence question
answering task, the CFC obtains a new state-of-the-art result of 70.6% on the
blind test set, outperforming the previous best by 3% accuracy despite not
using pretrained contextual encoders.
Comment: ICLR 2019; 9 pages, 7 figures
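As one building block, a plain coattention layer of the kind such hierarchies stack might look like the following sketch (dimensions illustrative; not the authors' exact module): each sequence is summarized with attention conditioned on the other.

```python
import torch
import torch.nn.functional as F

def coattention(doc, query):
    # doc: (m, d); query: (n, d)
    affinity = doc @ query.T                      # (m, n) token-pair affinities
    doc_ctx = F.softmax(affinity, dim=1) @ query  # query summary per doc token
    query_ctx = F.softmax(affinity.T, dim=1) @ doc
    return doc_ctx, query_ctx

doc, query = torch.randn(20, 64), torch.randn(7, 64)
d_ctx, q_ctx = coattention(doc, query)
print(d_ctx.shape, q_ctx.shape)   # torch.Size([20, 64]) torch.Size([7, 64])
```

Stacking such layers with self-attention lets the coarse-grain and fine-grain modules emphasize different parts of the input at different granularities.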
Learning to Search in Long Documents Using Document Structure
Reading comprehension models are typically based on recurrent neural networks that
sequentially process the document tokens. As interest turns to answering more
complex questions over longer documents, sequential reading of large portions
of text becomes a substantial bottleneck. Inspired by how humans use document
structure, we propose a novel framework for reading comprehension. We represent
documents as trees, and model an agent that learns to interleave quick
navigation through the document tree with more expensive answer extraction. To
encourage exploration of the document tree, we propose a new algorithm, based
on Deep Q-Network (DQN), which strategically samples tree nodes at training
time. Empirically, we find that our algorithm improves question answering
performance compared to DQN and a strong information-retrieval (IR) baseline,
and that ensembling our model with the IR baseline results in further gains in
performance.
Comment: COLING 2018 (camera ready version); v2: added acknowledgment
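A toy rendering of the navigation policy (the Q-network, tree encoding, and epsilon-greedy rule here are illustrative assumptions, not the paper's DQN variant): the agent walks down the document tree cheaply and reserves expensive answer extraction for the leaf where it stops.

```python
import random
import torch
import torch.nn as nn

class Node:
    def __init__(self, name, rep, children=()):
        self.name, self.rep, self.children = name, rep, list(children)

dim = 32
q_net = nn.Linear(2 * dim, 1)   # Q(state) for (question, node) pairs

def navigate(root, q_rep, epsilon=0.1):
    node = root
    while node.children:
        if random.random() < epsilon:        # exploration
            node = random.choice(node.children)
            continue
        # Greedy step: descend into the child with the highest Q-value.
        q_values = [q_net(torch.cat([q_rep, c.rep])).item()
                    for c in node.children]
        node = node.children[max(range(len(q_values)),
                                 key=lambda i: q_values[i])]
    return node   # leaf: run the expensive answer extractor here

leaves = [Node(f"leaf{i}", torch.randn(dim)) for i in range(4)]
root = Node("root", torch.randn(dim), [
    Node("sec1", torch.randn(dim), leaves[:2]),
    Node("sec2", torch.randn(dim), leaves[2:]),
])
print(navigate(root, torch.randn(dim)).name)
```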
SRQA: Synthetic Reader for Factoid Question Answering
Question answering systems based on deep neural networks can answer questions
from various fields and in various forms, but they still lack effective ways of
handling multiple pieces of evidence. We introduce a new model called SRQA, short for Synthetic
Reader for Factoid Question Answering. This model enhances the question
answering system in the multi-document scenario from three aspects: model
structure, optimization goal, and training method, corresponding to Multilayer
Attention (MA), Cross Evidence (CE), and Adversarial Training (AT)
respectively. First, we propose a multilayer attention network to obtain a
better representation of the evidence. The multilayer attention mechanism
models the interaction between the question and the passage within each layer,
so that the token representations of the evidence in each layer take the
requirements of the question into account. Second, we design a cross-evidence
strategy to choose the answer span across multiple pieces of evidence. We
improve the optimization goal by treating all the answer locations across the
evidence as training targets, which leads the model to reason over multiple
pieces of evidence. Third, adversarial training is applied to high-level
variables in addition to the word embeddings in our model. A new normalization
method is also proposed for
adversarial perturbations so that we can jointly add perturbations to several
target variables. As an effective regularization method, adversarial training
enhances the model's ability to process noisy data. Combining these three
strategies, we enhance the contextual representation and locating ability of
our model, which can synthetically extract the answer span from several
pieces of evidence. We evaluate SRQA on the WebQA dataset, and experiments
show that our model outperforms the state-of-the-art models (the best fuzzy
score of our model reaches 78.56%, an improvement of about 2%).
Comment: arXiv admin note: text overlap with arXiv:1809.0067
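The sketch below illustrates joint adversarial perturbation of several variables with a shared normalization, in the spirit of the AT component; the global-norm scaling is an assumption, not necessarily SRQA's exact normalization method.

```python
import torch

def adversarial_perturbations(loss, variables, epsilon=1.0):
    grads = torch.autograd.grad(loss, variables, retain_graph=True)
    # Joint normalization: one shared budget across all target variables,
    # scaled by the global gradient norm.
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    return [epsilon * g / (total_norm + 1e-12) for g in grads]

# Two target variables: word embeddings and a high-level hidden variable.
emb = torch.randn(4, 8, requires_grad=True)
hidden = torch.randn(4, 8, requires_grad=True)
loss = ((emb * hidden).sum(-1) - 1.0).pow(2).mean()

perturbs = adversarial_perturbations(loss, [emb, hidden])
# The adversarial loss term would be recomputed on the perturbed variables.
adv_inputs = [v + p for v, p in zip([emb, hidden], perturbs)]
print([p.norm().item() for p in perturbs])
```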
A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering
Weakly supervised question answering usually has only the final answers as
supervision signals while the correct solutions to derive the answers are not
provided. This setting gives rise to the spurious solution problem: there may
exist many spurious solutions that coincidentally derive the correct answer,
but training on such solutions can hurt model performance (e.g., producing
wrong solutions or answers). For example, in discrete reasoning tasks such as
DROP, there may exist many equations to derive a numeric answer, and typically
only one of them is correct. Previous learning methods mostly filter out
spurious solutions with heuristics or model confidence, but do not
explicitly exploit the semantic correlations between a question and its
solution. In this paper, to alleviate the spurious solution problem, we propose
to explicitly exploit such semantic correlations by maximizing the mutual
information between question-answer pairs and predicted solutions. Extensive
experiments on four question answering datasets show that our method
significantly outperforms previous learning methods in terms of task
performance and is more effective in training models to produce correct
solutions.
Comment: ACL 2021 main conference
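One way to render the objective concretely, under the simplifying assumption that the mutual-information term reduces to a question-reconstruction score added to the solution score (the estimator below is illustrative, not the paper's exact formulation): a candidate solution is preferred both for deriving the answer and for making the question likely.

```python
import torch

def mi_weighted_loss(log_p_solution, log_p_question_given_solution, alpha=1.0):
    # log_p_solution: log p(z | q, a) for each candidate solution z
    # log_p_question_given_solution: log p(q | z), a reconstruction term
    # that rewards solutions semantically correlated with the question.
    scores = log_p_solution + alpha * log_p_question_given_solution
    # Marginalize over candidates (weakly supervised: any z deriving the
    # gold answer is a candidate), then minimize the negative log-likelihood.
    return -torch.logsumexp(scores, dim=-1)

log_p_z = torch.log_softmax(torch.randn(5), dim=-1)          # 5 candidates
log_p_q_given_z = torch.log_softmax(torch.randn(5), dim=-1)
print(mi_weighted_loss(log_p_z, log_p_q_given_z))
```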
Coreferential Reasoning Learning for Language Representation
Language representation models such as BERT can effectively capture
contextual semantic information from plain text, and have achieved promising
results on many downstream NLP tasks with appropriate fine-tuning. However,
most existing language representation models cannot
explicitly handle coreference, which is essential to the coherent understanding
of the whole discourse. To address this issue, we present CorefBERT, a novel
language representation model that can capture the coreferential relations in
context. The experimental results show that, compared with existing baseline
models, CorefBERT can achieve significant improvements consistently on various
downstream NLP tasks that require coreferential reasoning, while maintaining
comparable performance to previous models on other common NLP tasks. The source
code and experiment details of this paper can be obtained from
https://github.com/thunlp/CorefBERT.
Comment: Accepted by EMNLP 2020
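A toy version of coreference-aware masking in this spirit (the capitalization heuristic and mask token are illustrative stand-ins; see the linked repository for the actual training task): one occurrence of a repeated name is masked, so recovering it requires attending to its other mention.

```python
from collections import Counter

def mask_repeated_mention(tokens, mask_token="[MASK]"):
    counts = Counter(t for t in tokens if t[0].isupper())  # crude name proxy
    repeated = [t for t, c in counts.items() if c >= 2]
    if not repeated:
        return tokens, None
    target = repeated[0]
    masked = list(tokens)
    # Mask the second occurrence; the first stays visible as the antecedent.
    idx = [i for i, t in enumerate(tokens) if t == target][1]
    masked[idx] = mask_token
    return masked, (idx, target)

tokens = "Claire baked bread . Claire shared it".split()
print(mask_repeated_mention(tokens))
```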
A Survey on Machine Reading Comprehension Systems
Machine reading comprehension is a challenging task and a hot topic in
natural language processing. Its goal is to develop systems that answer
questions about a given context. In this paper, we present a comprehensive survey of
different aspects of machine reading comprehension systems, including their
approaches, structures, input/outputs, and research novelties. We illustrate
the recent trends in this field based on 124 reviewed papers from 2016 to 2018.
Our investigations demonstrate that the focus of research has changed in recent
years from answer extraction to answer generation, from single to
multi-document reading comprehension, and from learning from scratch to using
pre-trained embeddings. We also discuss the popular datasets and the evaluation
metrics in this field. The paper concludes by examining the most cited papers
and their contributions.