Towards Automatic Generation of Questions from Long Answers
Automatic question generation (AQG) has broad applicability in domains such
as tutoring systems, conversational agents, healthcare literacy, and
information retrieval. Existing AQG efforts have been limited to short
answers of at most two or three sentences. However, several real-world
applications require question generation from answers that span several
sentences. Therefore, we propose a novel evaluation benchmark to assess the
performance of existing AQG systems for long-text answers. We leverage the
large-scale open-source Google Natural Questions dataset to create the
aforementioned long-answer AQG benchmark. We empirically demonstrate that the
performance of existing AQG methods significantly degrades as the length of the
answer increases. Transformer-based methods outperform other existing AQG
methods on long answers in terms of automatic as well as human evaluation.
However, we still observe performance degradation in our best-performing
models as answer length increases, suggesting that long-answer question
generation is a challenging benchmark task for future research.
Summary-Oriented Question Generation for Informational Queries
Users frequently ask simple factoid questions of question answering (QA)
systems, attenuating the impact of myriad recent works that support more
complex questions. Prompting users with automatically generated suggested
questions (SQs) can improve user understanding of QA system capabilities and
thus facilitate more effective use. We aim to produce self-explanatory
questions that focus on main document topics and are answerable with variable
length passages as appropriate. We satisfy these requirements by using a
BERT-based Pointer-Generator Network trained on the Natural Questions (NQ)
dataset. Our model achieves state-of-the-art performance on SQ generation
for the NQ dataset (20.1 BLEU-4). We further apply our model to
out-of-domain news articles; lacking gold questions, we evaluate with a QA
system and demonstrate that our model produces better SQs for news
articles -- with further confirmation via a human evaluation.