2,102 research outputs found
A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
Automatic extraction of paraphrastic phrases from medium size corpora
This paper presents a versatile system intended to acquire paraphrastic
phrases from a representative corpus. In order to decrease the time spent on
the elaboration of resources for NLP system (for example Information
Extraction, IE hereafter), we suggest to use a machine learning system that
helps defining new templates and associated resources. This knowledge is
automatically derived from the text collection, in interaction with a large
semantic network
Bootstrapping Lexical Choice via Multiple-Sequence Alignment
An important component of any generation system is the mapping dictionary, a
lexicon of elementary semantic expressions and corresponding natural language
realizations. Typically, labor-intensive knowledge-based methods are used to
construct the dictionary. We instead propose to acquire it automatically via a
novel multiple-pass algorithm employing multiple-sequence alignment, a
technique commonly used in bioinformatics. Crucially, our method leverages
latent information contained in multi-parallel corpora -- datasets that supply
several verbalizations of the corresponding semantics rather than just one.
We used our techniques to generate natural language versions of
computer-generated mathematical proofs, with good results on both a
per-component and overall-output basis. For example, in evaluations involving a
dozen human judges, our system produced output whose readability and
faithfulness to the semantic input rivaled that of a traditional generation
system.Comment: 8 pages; to appear in the proceedings of EMNLP-200
Paraphrastic Reformulations in Spoken Corpora
International audienceOur work addresses the automatic detection of paraphrastic reformulation in French spoken corpora. The proposed approach is syn-tagmatic. It is based on specific markers and the specificities of the spoken language. Manual multi-dimensional annotation performed by two annotators provides fine-grained reference data. An automatic method is proposed in order to decide whether sentences contain or not paraphras-tic relations. The obtained results show up to 66.4% precision. Analysis of the manual annotations indicates that few paraphrastic segments show morphological modifications (inflection, derivation or compounding) and that the syntactic equivalence between the segments is seldom respected, as these usually belong to different syntactic categories
- …