ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters
To bridge the gap between the capabilities of the state of the art in factoid
question answering (QA) and what users ask, we need large datasets of real user
questions that capture the various question phenomena users are interested in,
and the diverse ways in which these questions are formulated. We introduce
ComQA, a large dataset of real user questions that exhibit different
challenging aspects such as compositionality, temporal reasoning, and
comparisons. ComQA questions come from the WikiAnswers community QA platform,
which typically contains questions that are not satisfactorily answerable by
existing search engine technology. Through a large crowdsourcing effort, we
clean the question dataset, group questions into paraphrase clusters, and
annotate clusters with their answers. ComQA contains 11,214 questions grouped
into 4,834 paraphrase clusters. We detail the process of constructing ComQA,
including the measures taken to ensure its high quality while making effective
use of crowdsourcing. We also present an extensive analysis of the dataset and
the results achieved by state-of-the-art systems on ComQA, demonstrating that
our dataset can be a driver of future research on QA.
Comment: 11 pages, NAACL 201
Survey on Evaluation Methods for Dialogue Systems
In this paper, we survey the methods and concepts developed for the evaluation
of dialogue systems. Evaluation is a crucial part of the development
process. Often, dialogue systems are evaluated by means of human evaluations
and questionnaires. However, this tends to be very cost- and time-intensive.
Thus, much work has been put into finding methods that reduce the
involvement of human labour. In this survey, we present the main concepts and
methods. To this end, we differentiate between the various classes of dialogue
systems (task-oriented dialogue systems, conversational dialogue systems, and
question-answering dialogue systems). We cover each class by introducing the
main technologies developed for its dialogue systems and then presenting the
evaluation methods for that class.