Paraphrase Generation with Deep Reinforcement Learning
Automatic generation of paraphrases from a given sentence is an important yet
challenging task in natural language processing (NLP), and plays a key role in
a number of applications such as question answering, search, and dialogue. In
this paper, we present a deep reinforcement learning approach to paraphrase
generation. Specifically, we propose a new framework for the task, which
consists of a generator and an evaluator, both of which are
learned from data. The generator, built as a sequence-to-sequence learning
model, can produce paraphrases given a sentence. The evaluator, constructed as
a deep matching model, can judge whether two sentences are paraphrases of each
other. The generator is first trained by deep learning and then further
fine-tuned by reinforcement learning in which the reward is given by the
evaluator. For the learning of the evaluator, we propose two methods based on
supervised learning and inverse reinforcement learning respectively, depending
on the type of available training data. Empirical study shows that the learned
evaluator can guide the generator to produce more accurate paraphrases.
Experimental results demonstrate that the proposed models (the generators)
outperform the state-of-the-art methods in paraphrase generation in both
automatic evaluation and human evaluation.
Comment: EMNLP 201
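The generator/evaluator loop described in this abstract can be illustrated with a toy sketch. It is not the paper's implementation. The "generator" here is only a softmax policy over three fixed candidate sentences, the "evaluator" is a hypothetical token-overlap score standing in for a learned matching model, and the update uses the exact expected policy gradient rather than sampled sequences. All names (`evaluator`, `fine_tune`, `SOURCE`, `CANDIDATES`) are assumptions made for illustration.

```python
import math


def evaluator(source: str, candidate: str) -> float:
    """Hypothetical stand-in for a learned evaluator.

    Returns the Jaccard overlap of the two token sets, a value in [0, 1].
    """
    a, b = set(source.split()), set(candidate.split())
    return len(a & b) / len(a | b)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fine_tune(source, candidates, steps=200, lr=1.0):
    """Raise the policy's expected evaluator reward by gradient ascent.

    The policy is a softmax over one logit per candidate. For such a policy
    the gradient of the expected reward with respect to logit j is
    p_j * (r_j - E[r]), which is what each step applies.
    """
    logits = [0.0] * len(candidates)  # start from a uniform policy
    rewards = [evaluator(source, c) for c in candidates]
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        logits = [
            l + lr * p * (r - expected)
            for l, p, r in zip(logits, probs, rewards)
        ]
    return softmax(logits)


# Toy data, invented for this sketch.
SOURCE = "how do i learn python"
CANDIDATES = [
    "how can i learn python",
    "how do i cook pasta",
    "what is the weather today",
]

final_probs = fine_tune(SOURCE, CANDIDATES)
```

After the updates, `final_probs` puts the most mass on the candidate the stand-in evaluator scores highest. A real system would replace the three-way softmax with a sequence-to-sequence model and estimate the gradient from sampled outputs.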
From Paraphrase Database to Compositional Paraphrase Model and Back
The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive
semantic resource, consisting of a list of phrase pairs with (heuristic)
confidence estimates. However, it is still unclear how it can best be used, due
to the heuristic nature of the confidences and its necessarily incomplete
coverage. We propose models to leverage the phrase pairs from the PPDB to build
parametric paraphrase models that score paraphrase pairs more accurately than
the PPDB's internal scores while simultaneously improving its coverage. They
allow for learning phrase embeddings as well as improved word embeddings.
Moreover, we introduce two new, manually annotated datasets to evaluate
short-phrase paraphrasing models. With our paraphrase model trained on
PPDB, we achieve state-of-the-art results on standard word and bigram
similarity tasks and beat strong baselines on our new short-phrase paraphrase
tasks.
Comment: 2015 TACL paper updated with an appendix describing new
300-dimensional embeddings. Submitted 1/2015. Accepted 2/2015. Published 6/201
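As a rough illustration of scoring a phrase pair with a parametric model built on word embeddings, the sketch below averages word vectors into a phrase vector and compares two phrases by cosine similarity. Averaging is an assumption chosen for brevity and is not claimed to be the composition function used in the paper. The 3-dimensional vectors in `EMB` are invented toy values, and `phrase_vector` and `score` are hypothetical helper names.

```python
import math

# Toy word embeddings, invented for this sketch (real ones would be learned).
EMB = {
    "buy": [1.0, 0.0, 0.0],
    "purchase": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
    "a": [0.1, 0.1, 0.1],
    "an": [0.1, 0.1, 0.1],
    "eat": [0.0, 0.0, 1.0],
    "apple": [0.0, 0.2, 0.9],
}


def phrase_vector(phrase: str):
    """Compose a phrase embedding by averaging its word embeddings."""
    vectors = [EMB[word] for word in phrase.split()]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def score(phrase_a: str, phrase_b: str) -> float:
    """Paraphrase score for a phrase pair: cosine of the composed vectors."""
    return cosine(phrase_vector(phrase_a), phrase_vector(phrase_b))


similar = score("buy a car", "purchase a car")
dissimilar = score("buy a car", "eat an apple")
```

With these toy vectors, `similar` is higher than `dissimilar`. Unlike a lookup in a fixed phrase table, such a model can score any pair whose words have embeddings, which is one sense in which a parametric model can extend coverage.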