

Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

Authors: Gao Lianli, Guo Zhao, Liu Wu, Shen Heng Tao, Song Jingkuan, Zhang Dongxiang
Publication date: 01/01/2017

Recent progress has been made in using attention-based encoder-decoder frameworks for video captioning. However, most existing decoders apply the attention mechanism to every generated word, including both visual words (e.g., "gun" and "shooting") and non-visual words (e.g., "the", "a"). These non-visual words can be easily predicted by a natural language model without considering visual signals or attention, and imposing the attention mechanism on them could mislead the decoder and decrease the overall performance of video captioning. To address this issue, we propose a hierarchical LSTM with adjusted temporal attention (hLSTMat) approach for video captioning. Specifically, the proposed framework uses temporal attention to select specific frames for predicting the related words, while the adjusted temporal attention decides whether to depend on the visual information or on the language context information. In addition, hierarchical LSTMs are designed to simultaneously consider both low-level visual information and high-level language context information to support caption generation. To demonstrate the effectiveness of the proposed framework, we test our method on two prevalent datasets, MSVD and MSR-VTT, and experimental results show that our approach outperforms the state-of-the-art methods on both datasets.

arXiv.org e-Print Archive

Crossref

OPUS - University of Technology Sydney
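The hLSTMat abstract describes two mechanisms: temporal attention that selects relevant frames, and an adjustment that decides whether a word should rely on visual or language context. The pure-Python sketch below illustrates that idea under stated assumptions: dot-product attention scores, a scalar sigmoid gate `beta` computed from the decoder hidden state, and frame features with the same dimension as that state. The function names and gate parameters (`gate_w`, `gate_b`) are hypothetical, and the paper's exact parameterization may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Weight each frame by its dot-product relevance to the decoder state
    (an assumption) and return (attention weights, attended visual context)."""
    weights = softmax([dot(f, query) for f in frames])
    dim = len(frames[0])
    context = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return weights, context


def adjusted_context(
    frames: List[Vector], hidden: Vector, gate_w: Vector, gate_b: float = 0.0
) -> Tuple[float, Vector, Vector]:
    """Mix visual context and language context (the hidden state) with a
    hypothetical scalar gate beta: beta near 1 leans on the frames (visual
    word), beta near 0 leans on the language state (e.g. "the")."""
    weights, visual = temporal_attention(frames, hidden)
    beta = 1.0 / (1.0 + math.exp(-(dot(gate_w, hidden) + gate_b)))
    mixed = [beta * v + (1.0 - beta) * h for v, h in zip(visual, hidden)]
    return beta, weights, mixed
```

With a strongly negative gate bias, `beta` approaches 0 and the output collapses to the hidden state, which is the behavior the abstract wants for non-visual words. A learned gate lets the decoder make this choice per word instead of always attending.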

Twin Networks: Matching the Future for Sequence Generation

Authors: Bengio Yoshua, Ke Nan Rosemary, Pal Chris, Serdyuk Dmitriy, Sordoni Alessandro, Trischler Adam
Publication date: 01/01/2018

We propose a simple technique for encouraging generative RNNs to plan ahead. We train a "backward" recurrent network to generate a given sequence in reverse order, and we encourage states of the forward model to predict cotemporal states of the backward model. The backward network is used only during training, and plays no role during sampling or inference. We hypothesize that our approach eases modeling of long-term dependencies by implicitly forcing the forward states to hold information about the longer-term future (as contained in the backward states). We show empirically that our approach achieves 9% relative improvement for a speech recognition task, and achieves significant improvement on a COCO caption generation task. Comment: 12 pages, 3 figures, published at ICLR 201

arXiv.org e-Print Archive

PolyPublie
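The Twin Networks abstract pairs each forward state with the cotemporal state of a backward network and penalizes the mismatch. The sketch below computes such a penalty using toy scalar Elman RNNs with hypothetical weights, assuming the forward state at step t summarizes the tokens before t and the backward state summarizes the tokens after t, and that a map `g` (identity by default, an assumption) is applied to the forward state. It shows only the auxiliary loss term, not training of either network.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rnn_states(inputs: Sequence[float], w_in: float, w_rec: float) -> List[float]:
    """Toy scalar Elman RNN (hypothetical weights). Returns the state held
    *before* each input is read, so states[t] summarizes inputs[:t]."""
    h = 0.0
    states = []
    for x in inputs:
        states.append(h)
        h = math.tanh(w_in * x + w_rec * h)
    return states


def twin_loss(
    seq: Sequence[float],
    fwd: Tuple[float, float] = (0.5, 0.9),
    bwd: Tuple[float, float] = (0.5, 0.9),
    g: Callable[[float], float] = lambda h: h,
) -> float:
    """Sum over t of (g(forward_t) - backward_t)^2 for cotemporal states."""
    forward = rnn_states(seq, *fwd)  # forward[t] summarizes seq[:t]
    # Run on the reversed sequence, then flip back so backward[t] summarizes seq[t+1:].
    backward = rnn_states(list(seq)[::-1], *bwd)[::-1]
    return sum((g(hf) - hb) ** 2 for hf, hb in zip(forward, backward))
```

In a full setup this term would be added to the forward model's training objective; because only the forward network is needed to generate, the backward network can be dropped at sampling time, as the abstract states.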

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

Authors: Celikyilmaz Asli, Gan Zhe, He Xiaodong, Huang Qiuyuan, Wang Jianfeng, Wu Dapeng
Publication date: 18/01/2019

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline. Comment: Accepted to AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
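The visual storytelling abstract describes a two-level decoder: a high-level policy plans one topic per image, and a low-level policy writes a sentence conditioned on that topic, with both trained jointly by reinforcement learning. The sketch below shows only that control flow under heavy simplifying assumptions: the `manager` and `worker` policies are hypothetical callables returning categorical distributions, the worker picks a whole sentence in one step (the paper's semantic compositional network generates word by word), and a generic REINFORCE-with-baseline loss over a single story-level reward shared by both levels stands in for the paper's actual RL objective.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Dist = Dict[str, float]


def sample(dist: Dist, rng: random.Random) -> Tuple[str, float]:
    """Draw one option from a categorical distribution; return it with its log-prob."""
    r = rng.random()
    acc = 0.0
    choice = None
    for choice, p in dist.items():
        acc += p
        if r < acc:
            break  # falls through to the last option on rounding error
    return choice, math.log(dist[choice])


def rollout(
    images: Sequence[str],
    manager: Callable[[str, Optional[str]], Dist],
    worker: Callable[[str, str], Dist],
    rng: random.Random,
) -> Tuple[List[Tuple[str, str]], float]:
    """Two-level decoding: the manager picks a topic per image (the plan),
    then the worker emits a sentence conditioned on that image and topic."""
    story: List[Tuple[str, str]] = []
    log_prob = 0.0
    prev_topic: Optional[str] = None
    for image in images:
        topic, lp_topic = sample(manager(image, prev_topic), rng)
        sentence, lp_sent = sample(worker(image, topic), rng)
        story.append((topic, sentence))
        log_prob += lp_topic + lp_sent
        prev_topic = topic
    return story, log_prob


def reinforce_loss(log_prob: float, reward: float, baseline: float) -> float:
    """REINFORCE surrogate: minimizing it raises the log-probability of
    rollouts whose reward beats the baseline, for both levels at once."""
    return -(reward - baseline) * log_prob
```

Summing the log-probabilities of both levels before scaling by the reward is what makes the training "joint" in this sketch: a good story reinforces the chosen topics and the sentences written under them together.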