Creativity: Generating Diverse Questions using Variational Autoencoders

Authors: Jain, Unnat; Schwing, Alexander; Zhang, Ziyu
Publication date: 11/04/2017

Generating diverse questions for given images is an important task for computational education, entertainment, and AI assistants. Unlike many conventional prediction techniques, this task requires algorithms to generate a diverse set of plausible questions, which we refer to as "creativity". In this paper, we propose a creative algorithm for visual question generation that combines the advantages of variational autoencoders with long short-term memory networks. We demonstrate that our framework is able to generate a large set of varying questions given a single input image.

Comment: Accepted to CVPR 201
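As a rough illustration of why a variational autoencoder lends itself to diverse generation, the sketch below draws several latent vectors from one Gaussian using the standard reparameterization trick. It is not the authors' implementation: the function names, the two-dimensional latent space, and the fixed `mu` and `log_var` values are assumptions made for the example, and the LSTM decoder that would turn each latent vector (together with image features) into a question is omitted.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one image (assumed values).
mu = [0.0, 0.0]
log_var = [0.0, 0.0]

rng = random.Random(0)
# Each sample would condition a separate decoding pass, so one image
# can yield several different questions.
latents = [sample_latent(mu, log_var, rng) for _ in range(5)]
kl = kl_to_standard_normal(mu, log_var)
```

Because each decoding pass starts from a different sample, the same image can produce different outputs, which is the property the abstract calls "creativity".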

arXiv.org e-Print Archive

Crossref

Dual Attention Networks for Visual Reference Resolution in Visual Dialog

Authors: Kang, Gi-Cheon; Lim, Jaeseo; Zhang, Byoung-Tak
Publication date: 01/01/2019

Visual dialog (VisDial) is a task which requires an AI agent to answer a series of questions grounded in an image. Unlike in visual question answering (VQA), the series of questions should be able to capture a temporal context from a dialog history and exploit visually-grounded information. A problem called visual reference resolution involves these challenges, requiring the agent to resolve ambiguous references in a given question and find the references in a given image. In this paper, we propose Dual Attention Networks (DAN) for visual reference resolution. DAN consists of two kinds of attention networks, REFER and FIND. Specifically, the REFER module learns latent relationships between a given question and a dialog history by employing a self-attention mechanism. The FIND module takes image features and reference-aware representations (i.e., the output of the REFER module) as input, and performs visual grounding via a bottom-up attention mechanism. We qualitatively and quantitatively evaluate our model on the VisDial v1.0 and v0.9 datasets, showing that DAN outperforms the previous state-of-the-art model by a significant margin.

Comment: EMNLP 201
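The two-stage flow described in the abstract can be sketched, very loosely, as two attention passes: one over the dialog history and one over image regions. This is a simplification under stated assumptions, not the DAN architecture itself: it uses a single scaled dot-product attention step in place of the paper's self-attention and bottom-up attention modules, and the names `attend`, `question`, `history`, and `regions`, along with the toy 2-dimensional vectors, are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(dim)]
    return context, weights


# Toy embeddings (assumed values, for illustration only).
question = [1.0, 0.0]
history = [[1.0, 0.0], [0.0, 1.0]]               # two past dialog turns
regions = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]   # three image regions

# Stage 1 (REFER-like): relate the question to the dialog history.
ref_context, hist_weights = attend(question, history, history)
reference_aware = [q + c for q, c in zip(question, ref_context)]

# Stage 2 (FIND-like): ground the reference-aware vector in the image.
grounded, region_weights = attend(reference_aware, regions, regions)
```

In this toy setup the question is most similar to the first dialog turn, so the first stage weights that turn most heavily, and the second stage then favors the image region closest to the resulting reference-aware vector.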

arXiv.org e-Print Archive

Crossref

SNU Open Repository and Archive