Zero-Shot Relation Extraction via Reading Comprehension
We show that relation extraction can be reduced to answering simple reading
comprehension questions, by associating one or more natural-language questions
with each relation slot. This reduction has several advantages: we can (1)
learn relation-extraction models by extending recent neural
reading-comprehension techniques, (2) build very large training sets for those
models by combining relation-specific crowd-sourced questions with distant
supervision, and even (3) do zero-shot learning by extracting new relation
types that are only specified at test-time, for which we have no labeled
training examples. Experiments on a Wikipedia slot-filling task demonstrate
that the approach can generalize to new questions for known relation types with
high accuracy, and that zero-shot generalization to unseen relation types is
possible, at lower accuracy levels, setting the bar for future work on this
task.
Comment: CoNLL 201
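The reduction described in this abstract can be illustrated with a minimal sketch, assuming a template format and a reader interface that are not given in the abstract. The names `QUESTION_TEMPLATES`, `extract_slot`, and `toy_reader`, the `XXX` entity placeholder, and the convention that the reader returns `None` when it finds no answer are all hypothetical; `toy_reader` is a keyword-matching stand-in for a trained reading-comprehension model, not the authors' method.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical templates: one or more natural-language questions per
# relation slot. "XXX" is an assumed placeholder for the entity name.
QUESTION_TEMPLATES: Dict[str, List[str]] = {
    "educated_at": [
        "Where did XXX study?",
        "Which university did XXX graduate from?",
    ],
    "occupation": ["What did XXX do for a living?"],
}

# Assumed reader interface: (question, passage) -> answer span, or None.
Reader = Callable[[str, str], Optional[str]]


def extract_slot(
    entity: str,
    relation: str,
    passage: str,
    reader: Reader,
    templates: Dict[str, List[str]] = QUESTION_TEMPLATES,
) -> Optional[str]:
    """Fill one relation slot by asking each of its questions in turn."""
    for template in templates.get(relation, []):
        question = template.replace("XXX", entity)
        answer = reader(question, passage)
        if answer is not None:
            return answer
    return None


def toy_reader(question: str, passage: str) -> Optional[str]:
    """Stand-in for a trained reading-comprehension model (illustration only)."""
    cues = {"study": "studied at ", "graduate": "graduated from "}
    for keyword, cue in cues.items():
        if keyword in question and cue in passage:
            start = passage.index(cue) + len(cue)
            end = passage.find(".", start)
            return passage[start:end if end != -1 else len(passage)]
    return None
```

Under this framing, a relation type that was unseen during training would be specified at test time by adding an entry to the `templates` mapping; whether a given reader then answers it correctly is an empirical question, as the abstract notes.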
ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters
To bridge the gap between the capabilities of the state-of-the-art in factoid
question answering (QA) and what users ask, we need large datasets of real user
questions that capture the various question phenomena users are interested in,
and the diverse ways in which these questions are formulated. We introduce
ComQA, a large dataset of real user questions that exhibit different
challenging aspects such as compositionality, temporal reasoning, and
comparisons. ComQA questions come from the WikiAnswers community QA platform,
which typically contains questions that are not satisfactorily answerable by
existing search engine technology. Through a large crowdsourcing effort, we
clean the question dataset, group questions into paraphrase clusters, and
annotate clusters with their answers. ComQA contains 11,214 questions grouped
into 4,834 paraphrase clusters. We detail the process of constructing ComQA,
including the measures taken to ensure its high quality while making effective
use of crowdsourcing. We also present an extensive analysis of the dataset and
the results achieved by state-of-the-art systems on ComQA, demonstrating that
our dataset can be a driver of future research on QA.
Comment: 11 pages, NAACL 201