1,532 research outputs found

    Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

    We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.
    Comment: Accepted to AAAI 2019
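
    A minimal sketch of the two-level decoder described above, assuming PyTorch: a high-level GRU (`TopicDecoder`) emits one topic per image, and a low-level GRU (`SentenceDecoder`) generates a sentence conditioned on that topic. A plain topic-conditioned GRU stands in for the paper's semantic compositional network, and all module names and dimensions are illustrative assumptions, not the authors' code.

```python
# Sketch of a two-level hierarchical decoder for visual storytelling.
# Illustrative assumption, not the authors' implementation: the high-level
# decoder predicts one topic per image, and the low-level decoder generates
# a sentence for that image conditioned on the predicted topic.
import torch
import torch.nn as nn

class TopicDecoder(nn.Module):
    """High-level decoder: one topic distribution per image feature."""
    def __init__(self, feat_dim=2048, hidden_dim=512, num_topics=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.topic_head = nn.Linear(hidden_dim, num_topics)

    def forward(self, image_feats):                 # (B, T_img, feat_dim)
        h, _ = self.rnn(image_feats)                # (B, T_img, hidden_dim)
        return self.topic_head(h)                   # topic logits, (B, T_img, num_topics)

class SentenceDecoder(nn.Module):
    """Low-level decoder: word logits conditioned on a topic embedding."""
    def __init__(self, vocab_size=10000, emb_dim=300, hidden_dim=512, num_topics=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.topic_emb = nn.Embedding(num_topics, hidden_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, topic_ids, word_ids):         # (B,), (B, T_word)
        h0 = self.topic_emb(topic_ids).unsqueeze(0) # topic initializes the RNN state
        h, _ = self.rnn(self.word_emb(word_ids), h0)
        return self.out(h)                          # word logits, (B, T_word, vocab_size)

# Usage sketch: pick a topic per image, then decode each sentence from it.
B, T_img = 2, 5
feats = torch.randn(B, T_img, 2048)
topics = TopicDecoder()(feats).argmax(-1)           # (B, T_img); RL training would sample instead
words = torch.randint(0, 10000, (B, 12))
logits = SentenceDecoder()(topics[:, 0], words)     # sentence logits for the first image
```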

    Fooling Vision and Language Models Despite Localization and Attention Mechanism

    Adversarial attacks are known to succeed on classifiers, but it has been an open question whether more complex vision systems are vulnerable. In this paper, we study adversarial examples for vision and language models, which incorporate natural language understanding and complex structures such as attention, localization, and modular architectures. In particular, we investigate attacks on a dense captioning model and on two visual question answering (VQA) models. Our evaluation shows that we can generate adversarial examples with a high success rate (i.e., > 90%) for these models. Our work sheds new light on understanding adversarial attacks on vision systems which have a language component and shows that attention, bounding box localization, and compositional internal structures are vulnerable to adversarial attacks. These observations will inform future work towards building effective defenses.
    Comment: CVPR 2018
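
    As a generic illustration of how such adversarial examples are typically constructed (the paper's attacks on dense captioning and VQA use more targeted objectives), the sketch below applies a basic FGSM-style perturbation that nudges the input image in the direction that increases the model's loss. The `fgsm_attack` helper, the `DummyVQA` stand-in model, and all parameters are assumptions made for the sake of a runnable example.

```python
# FGSM-style adversarial perturbation of an image input (illustration only).
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, question, true_answer, epsilon=0.03):
    """Return an adversarial image inside an L-infinity ball of radius epsilon."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image, question)                  # (B, num_answers)
    loss = F.cross_entropy(logits, true_answer)      # push the prediction away from the truth
    loss.backward()
    # Step in the direction that increases the loss, then clamp to a valid pixel range.
    adv_image = image + epsilon * image.grad.sign()
    return adv_image.clamp(0.0, 1.0).detach()

# Dummy stand-in for a VQA model so the sketch runs end to end (not a real architecture).
class DummyVQA(torch.nn.Module):
    def __init__(self, num_answers=10):
        super().__init__()
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.fc = torch.nn.Linear(3, num_answers)

    def forward(self, image, question):
        return self.fc(self.pool(image).flatten(1))  # toy model: ignores the question

image = torch.rand(1, 3, 224, 224)
adv = fgsm_attack(DummyVQA(), image, question=None, true_answer=torch.tensor([3]))
```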
