5,154 research outputs found
A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters
To bridge the gap between the capabilities of the state-of-the-art in factoid
question answering (QA) and what users ask, we need large datasets of real user
questions that capture the various question phenomena users are interested in,
and the diverse ways in which these questions are formulated. We introduce
ComQA, a large dataset of real user questions that exhibit different
challenging aspects such as compositionality, temporal reasoning, and
comparisons. ComQA questions come from the WikiAnswers community QA platform,
which typically contains questions that are not satisfactorily answerable by
existing search engine technology. Through a large crowdsourcing effort, we
clean the question dataset, group questions into paraphrase clusters, and
annotate clusters with their answers. ComQA contains 11,214 questions grouped
into 4,834 paraphrase clusters. We detail the process of constructing ComQA,
including the measures taken to ensure its high quality while making effective
use of crowdsourcing. We also present an extensive analysis of the dataset and
the results achieved by state-of-the-art systems on ComQA, demonstrating that
our dataset can be a driver of future research on QA.Comment: 11 pages, NAACL 201
Compositional Semantic Parsing on Semi-Structured Tables
Two important aspects of semantic parsing for question answering are the
breadth of the knowledge source and the depth of logical compositionality.
While existing work trades off one aspect for another, this paper
simultaneously makes progress on both fronts through a new task: answering
complex questions on semi-structured tables using question-answer pairs as
supervision. The central challenge arises from two compounding factors: the
broader domain results in an open-ended set of relations, and the deeper
compositionality results in a combinatorial explosion in the space of logical
forms. We propose a logical-form driven parsing algorithm guided by strong
typing constraints and show that it obtains significant improvements over
natural baselines. For evaluation, we created a new dataset of 22,033 complex
questions on Wikipedia tables, which is made publicly available
- …