144 research outputs found
Reinforced Video Captioning with Entailment Rewards
Sequence-to-sequence models have shown promising improvements on the temporal
task of video captioning, but they optimize word-level cross-entropy loss
during training. First, using policy gradient and mixed-loss methods for
reinforcement learning, we directly optimize sentence-level task-based metrics
(as rewards), achieving significant improvements over the baseline, based on
both automatic metrics and human evaluation on multiple datasets. Next, we
propose a novel entailment-enhanced reward (CIDEnt) that corrects
phrase-matching based metrics (such as CIDEr) to only allow for
logically-implied partial matches and avoid contradictions, achieving further
significant improvements over the CIDEr-reward model. Overall, our
CIDEnt-reward model achieves the new state-of-the-art on the MSR-VTT dataset.Comment: EMNLP 2017 (9 pages
- …