QuAC : Question Answering in Context
We present QuAC, a dataset for Question Answering in Context that contains
14K information-seeking QA dialogs (100K questions in total). The dialogs
involve two crowd workers: (1) a student who poses a sequence of freeform
questions to learn as much as possible about a hidden Wikipedia text, and (2) a
teacher who answers the questions by providing short excerpts from the text.
QuAC introduces challenges not found in existing machine comprehension
datasets: its questions are often more open-ended, unanswerable, or only
meaningful within the dialog context, as we show in a detailed qualitative
evaluation. We also report results for a number of reference models, including
a recently state-of-the-art reading comprehension architecture extended to
model dialog context. Our best model underperforms humans by 20 F1, suggesting
that there is significant room for future work on this data. Dataset, baseline,
and leaderboard available at http://quac.ai.
Comment: EMNLP Camera Read
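The 20 F1 gap above is a difference in answer-overlap scores. As a rough illustration, the sketch below computes the word-level F1 that is commonly used for extractive question answering. It is an assumption that this approximates the metric behind the reported number: the official QuAC evaluation has details (such as answer normalization and handling of multiple references) that are not reproduced here, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Word-overlap F1 between a predicted answer and one reference answer.

    Simplified sketch: lowercases and splits on whitespace only.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared word at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("in the shire", "the shire"))
```

Scores of this kind are usually averaged over all questions and reported on a 0 to 100 scale, so a 20 F1 gap corresponds to a 0.20 difference in the averaged value.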
Generating Question-Answer Hierarchies
The process of knowledge acquisition can be viewed as a question-answer game
between a student and a teacher in which the student typically starts by asking
broad, open-ended questions before drilling down into specifics (Hintikka,
1981; Hakkarainen and Sintonen, 2002). This pedagogical perspective motivates a
new way of representing documents. In this paper, we present SQUASH
(Specificity-controlled Question-Answer Hierarchies), a novel and challenging
text generation task that converts an input document into a hierarchy of
question-answer pairs. Users can click on high-level questions (e.g., "Why did
Frodo leave the Fellowship?") to reveal related but more specific questions
(e.g., "Who did Frodo leave with?"). Using a question taxonomy loosely based on
Lehnert (1978), we classify questions in existing reading comprehension
datasets as either "general" or "specific". We then use these labels as input
to a pipelined system centered around a conditional neural language model. We
extensively evaluate the quality of the generated QA hierarchies through
crowdsourced experiments and report strong empirical results.
Comment: ACL camera ready + technical note on pipeline modifications for demo
(15 pages
An Empirical Study of Content Understanding in Conversational Question Answering
Following extensive work on context-free question answering systems,
conversational question answering models are an emerging trend in the natural
language processing field. Thanks to the recently collected datasets, including
QuAC and CoQA, there has been more work on conversational question answering,
and recent work has achieved competitive performance on both datasets. However,
to the best of our knowledge, two important questions for conversational
comprehension research have not been well studied: 1) How well can the
benchmark dataset reflect models' content understanding? 2) Do the models make
good use of the conversation content when answering questions? To investigate these
questions, we design different training settings, testing settings, as well as
an attack to verify the models' capability of content understanding on QuAC and
CoQA. The experimental results indicate some potential hazards in the benchmark
datasets, QuAC and CoQA, for conversational comprehension research. Our
analysis also sheds light on both what models may learn and how datasets may
bias the models. We believe that, through its in-depth investigation of the task, this
work can benefit future progress in conversational comprehension. The source
code is available at https://github.com/MiuLab/CQA-Study.
Comment: Published at AAAI 202
Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension
In this paper, we study machine reading comprehension (MRC) on long texts,
where a model takes as inputs a lengthy document and a question and then
extracts a text span from the document as an answer. State-of-the-art models
tend to use a pretrained transformer model (e.g., BERT) to encode the joint
contextual information of document and question. However, these
transformer-based models can only take a fixed-length (e.g., 512) text as their
input. To deal with even longer text inputs, previous approaches usually chunk
them into equally-spaced segments and predict answers based on each segment
independently without considering the information from other segments. As a
result, they may form segments that fail to cover the correct answer span or
retain insufficient contexts around it, which significantly degrades the
performance. Moreover, they are less capable of answering questions that need
cross-segment information.
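The equally-spaced baseline described above can be sketched as follows. This is a minimal illustration of fixed-stride chunking and of how an answer span can fall across a segment boundary, not the paper's implementation; the function names `fixed_chunks` and `covering_chunks` and the example numbers are assumptions chosen for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def fixed_chunks(n_tokens: int, max_len: int, stride: int) -> List[Span]:
    """Split a document of n_tokens into windows of at most max_len tokens,
    with a new window starting every `stride` tokens."""
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks: List[Span] = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        chunks.append((start, end))
        if end >= n_tokens:  # the last window reached the end of the document
            break
        start += stride
    return chunks


def covering_chunks(chunks: List[Span], answer: Span) -> List[Span]:
    """Return the windows that contain the whole answer span."""
    a_start, a_end = answer
    return [(s, e) for (s, e) in chunks if s <= a_start and a_end <= e]


if __name__ == "__main__":
    answer = (500, 520)  # hypothetical answer span straddling token 512
    no_overlap = fixed_chunks(n_tokens=1000, max_len=512, stride=512)
    overlapping = fixed_chunks(n_tokens=1000, max_len=512, stride=256)
    print(covering_chunks(no_overlap, answer))   # no window holds the full answer
    print(covering_chunks(overlapping, answer))  # the middle window does
```

With non-overlapping windows the example answer is split between two segments, so neither contains it in full. A smaller stride fixes this particular case, but the windows are still chosen without looking at the content, which is the limitation the learned chunking policy targets.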
We propose to let a model learn to chunk in a more flexible way via
reinforcement learning: a model can decide the next segment that it wants to
process in either direction. We also employ recurrent mechanisms to enable
information to flow across segments. Experiments on three MRC datasets -- CoQA,
QuAC, and TriviaQA -- demonstrate the effectiveness of our proposed recurrent
chunking mechanisms: we can obtain segments that are more likely to contain
complete answers and at the same time provide sufficient contexts around the
ground truth answers for better predictions.
Open-Retrieval Conversational Question Answering
Conversational search is one of the ultimate goals of information retrieval.
Recent research approaches conversational search through simplified settings of
response ranking and conversational question answering, where an answer is
either selected from a given candidate set or extracted from a given passage.
These simplifications neglect the fundamental role of retrieval in
conversational search. To address this limitation, we introduce an
open-retrieval conversational question answering (ORConvQA) setting, where we
learn to retrieve evidence from a large collection before extracting answers,
as a further step towards building functional conversational search systems. We
create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an
end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader
that are all based on Transformers. Our extensive experiments on OR-QuAC
demonstrate that a learnable retriever is crucial for ORConvQA. We further show
that our system can make a substantial improvement when we enable history
modeling in all system components. Moreover, we show that the reranker
component contributes to the model performance by providing a regularization
effect. Finally, further in-depth analyses are performed to provide new
insights into ORConvQA.
Comment: Accepted to SIGIR'2
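To make the role of history modeling in retrieval concrete, here is a toy sketch. It is not the ORConvQA system: the paper's retriever, reranker, and reader are Transformer-based and learned, whereas this example substitutes a simple word-overlap score, and the names `build_query`, `overlap`, and `retrieve`, the history window, and the passages are all hypothetical.

```python
import re
from typing import List


def tokens(text: str) -> set:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def build_query(history: List[str], question: str, window: int = 2) -> str:
    """History modeling stand-in: prepend the last `window` questions."""
    previous = history[-window:] if window > 0 else []
    return " ".join(previous + [question])


def overlap(query: str, passage: str) -> int:
    """Toy relevance score: number of distinct shared words."""
    return len(tokens(query) & tokens(passage))


def retrieve(query: str, collection: List[str], k: int = 1) -> List[str]:
    """Return the k passages with the highest overlap score.
    Ties keep collection order because Python's sort is stable."""
    return sorted(collection, key=lambda p: overlap(query, p), reverse=True)[:k]


if __name__ == "__main__":
    collection = [
        "Bilbo left the Shire with the dwarves",
        "Frodo left the Fellowship with Sam",
    ]
    history = ["Why did Frodo leave the Fellowship?"]
    question = "Who did he leave with?"

    # Without history, "he" is unresolved and both passages score the same.
    print(retrieve(question, collection))
    # With history, the earlier question supplies "Frodo" and "Fellowship".
    print(retrieve(build_query(history, question), collection))
```

The follow-up question alone cannot distinguish the two passages, while the history-augmented query ranks the relevant one first. This is the kind of context dependence that makes retrieval harder in the conversational setting.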