Using Semantic Similarity as Reward for Reinforcement Learning in Sentence Generation
Degree type: Master's. University of Tokyo (東京大学)
Deep Reinforcement Learning-based Image Captioning with Embedding Reward
Image captioning is a challenging problem owing to the complexity in
understanding the image content and diverse ways of describing it in natural
language. Recent advances in deep neural networks have substantially improved
the performance of this task. Most state-of-the-art approaches follow an
encoder-decoder framework, which generates captions using a sequential
recurrent prediction model. However, in this paper, we introduce a novel
decision-making framework for image captioning. We utilize a "policy network"
and a "value network" to collaboratively generate captions. The policy network
serves as a local guidance by providing the confidence of predicting the next
word according to the current state. Additionally, the value network serves as
a global and lookahead guidance by evaluating all possible extensions of the
current state. In essence, it adjusts the goal of predicting the correct words
towards the goal of generating captions similar to the ground truth captions.
We train both networks using an actor-critic reinforcement learning model, with
a novel reward defined by visual-semantic embedding. Extensive experiments and
analyses on the Microsoft COCO dataset show that the proposed framework
outperforms state-of-the-art approaches across different evaluation metrics.
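The interplay of local and lookahead guidance described in this abstract can be illustrated with a minimal sketch. The abstract does not state how the two networks' outputs are combined, so the convex mix below, the weight `lam`, and the toy numbers are all assumptions made for illustration, not the authors' method.

```python
import math

def lookahead_scores(policy_probs, values, lam=0.5):
    """Mix local and global guidance for each candidate next word.

    policy_probs: word -> probability from a hypothetical policy network
    values: word -> hypothetical value estimate of the state reached
            after appending that word
    lam: assumed mixing weight in [0, 1]; 1.0 ignores the value network
    """
    return {
        w: lam * math.log(policy_probs[w]) + (1.0 - lam) * values[w]
        for w in policy_probs
    }

def pick_next_word(policy_probs, values, lam=0.5):
    """Return the candidate with the highest combined score."""
    scores = lookahead_scores(policy_probs, values, lam)
    return max(scores, key=scores.get)

# Toy numbers (made up): the policy prefers "dog", the value estimate prefers "cat".
policy_probs = {"dog": 0.6, "cat": 0.4}
values = {"dog": 0.1, "cat": 0.9}

greedy_choice = pick_next_word(policy_probs, values, lam=1.0)
lookahead_choice = pick_next_word(policy_probs, values, lam=0.5)
```

With `lam=1.0` the choice reduces to the policy's most confident word; with a lower `lam` the value estimate can override it, which is the sense in which the value network shifts the goal from the next correct word towards the whole caption.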
Paraphrase Generation with Deep Reinforcement Learning
Automatic generation of paraphrases from a given sentence is an important yet
challenging task in natural language processing (NLP), and plays a key role in
a number of applications such as question answering, search, and dialogue. In
this paper, we present a deep reinforcement learning approach to paraphrase
generation. Specifically, we propose a new framework for the task, which
consists of a generator and an evaluator, both of which are
learned from data. The generator, built as a sequence-to-sequence learning
model, can produce paraphrases given a sentence. The evaluator, constructed as
a deep matching model, can judge whether two sentences are paraphrases of each
other. The generator is first trained by deep learning and then further
fine-tuned by reinforcement learning in which the reward is given by the
evaluator. For the learning of the evaluator, we propose two methods based on
supervised learning and inverse reinforcement learning respectively, depending
on the type of available training data. Empirical study shows that the learned
evaluator can guide the generator to produce more accurate paraphrases.
Experimental results demonstrate the proposed models (the generators)
outperform the state-of-the-art methods in paraphrase generation in both
automatic evaluation and human evaluation.
Comment: EMNLP 201
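The fine-tuning step, in which the evaluator's score serves as the reward for the generator, can be sketched on a toy problem. This is a simplified illustration under stated assumptions: the generator is reduced to a softmax over three fixed candidate paraphrases, the evaluator scores in `rewards` are invented numbers, and the update uses the exact expected policy gradient rather than sampled sequences. None of this is taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(logits, rewards):
    """Expected evaluator score under the current candidate distribution."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))

def policy_gradient_step(logits, rewards, lr=0.3):
    """One exact gradient-ascent step on the expected reward.

    For a softmax policy, d E[r] / d logit_j = p_j * (r_j - E[r]),
    so the expected reward acts as a baseline.
    """
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [x + lr * p * (r - baseline) for x, p, r in zip(logits, probs, rewards)]

# Hypothetical candidates and evaluator scores (illustrative only).
candidates = ["paraphrase A", "paraphrase B", "paraphrase C"]
rewards = [0.2, 0.9, 0.5]

logits = [0.0, 0.0, 0.0]  # stands in for the pre-trained generator
before = expected_reward(logits, rewards)
for _ in range(200):
    logits = policy_gradient_step(logits, rewards)
after = expected_reward(logits, rewards)
best = candidates[max(range(len(logits)), key=lambda i: logits[i])]
```

In a real sequence model the expectation is intractable and would be estimated from sampled outputs; the toy version only shows how a learned score can steer probability mass towards outputs the evaluator prefers.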
Putting the Horse Before the Cart: A Generator-Evaluator Framework for Question Generation from Text
Automatic question generation (QG) is a useful yet challenging task in NLP.
Recent neural network-based approaches represent the state-of-the-art in this
task. In this work, we attempt to strengthen them significantly by adopting a
holistic and novel generator-evaluator framework that directly optimizes
objectives that reward semantics and structure. The generator is a
sequence-to-sequence model that incorporates the structure and
semantics of the question being generated. The generator predicts an answer in
the passage that the question can pivot on. Employing the copy and coverage
mechanisms, it also acknowledges other contextually important (and possibly
rare) keywords in the passage that the question needs to conform to, while not
redundantly repeating words. The evaluator model evaluates and assigns a
reward to each predicted question based on its conformity to the
structure of ground-truth questions. We propose two novel QG-specific reward
functions for text conformity and answer conformity of the generated question.
The evaluator also employs structure-sensitive rewards based on evaluation
measures such as BLEU, GLEU, and ROUGE-L, which are suitable for QG. In
contrast, most of the previous works only optimize the cross-entropy loss,
which can induce inconsistencies between training (objective) and testing
(evaluation) measures. Our evaluation shows that our approach significantly
outperforms state-of-the-art systems on the widely-used SQuAD benchmark as per
both automatic and human evaluation.
Comment: 10 pages, The SIGNLL Conference on Computational Natural Language Learning (CoNLL 2019)
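One of the structure-sensitive rewards named in this abstract, ROUGE-L, is based on the longest common subsequence (LCS) between a generated question and a reference. The sketch below computes a ROUGE-L F-score that could serve as such a reward. Whitespace tokenization, lowercasing, and `beta=1.0` are assumptions chosen for simplicity; the abstract does not specify how the authors configure the measure.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_reward(candidate, reference, beta=1.0):
    """ROUGE-L F-score between two sentences, usable as a scalar reward.

    beta weights recall relative to precision; beta=1.0 is an assumption.
    """
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)

# Made-up example: LCS is 3 tokens, precision 1.0, recall 0.75.
reward = rouge_l_reward("the cat sat", "the cat sat down")
```

Optimizing a sequence-level score like this during training, instead of only the cross-entropy loss, is how the abstract proposes to reduce the mismatch between the training objective and the evaluation measures.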
Video Storytelling: Textual Summaries for Events
Bridging vision and natural language is a longstanding goal in computer
vision and multimedia research. While earlier works focus on generating a
single-sentence description for visual content, recent works have studied
paragraph generation. In this work, we introduce the problem of video
storytelling, which aims at generating coherent and succinct stories for long
videos. Video storytelling introduces new challenges, mainly due to the
diversity of the story and the length and complexity of the video. We propose
novel methods to address the challenges. First, we propose a context-aware
framework for multimodal embedding learning, where we design a Residual
Bidirectional Recurrent Neural Network to leverage contextual information from
past and future. Second, we propose a Narrator model to discover the underlying
storyline. The Narrator is formulated as a reinforcement learning agent which
is trained by directly optimizing the textual metric of the generated story. We
evaluate our method on the Video Story dataset, a new dataset that we have
collected to enable the study. We compare our method with multiple
state-of-the-art baselines, and show that our method achieves better
performance, in terms of quantitative measures and user study.
Comment: Published in IEEE Transactions on Multimedia