Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
The Visual Dialogue task requires an agent to engage in a conversation about
an image with a human. It represents an extension of the Visual Question
Answering task in that the agent needs to answer a question about an image, but
it needs to do so in light of the previous dialogue that has taken place. The
key challenge in Visual Dialogue is thus maintaining a consistent and natural
dialogue while continuing to answer questions correctly. We present a novel
approach that combines Reinforcement Learning and Generative Adversarial
Networks (GANs) to generate more human-like responses to questions. The GAN
helps overcome the relative paucity of training data, and the tendency of the
typical MLE-based approach to generate overly terse answers. Critically, the
GAN is tightly integrated into the attention mechanism that generates
human-interpretable reasons for each answer. This means that the discriminative
model of the GAN has the task of assessing whether a candidate answer is
generated by a human or not, given the provided reason. This is significant
because it drives the generative model to produce high quality answers that are
well supported by the associated reasoning. The method also achieves
state-of-the-art results on the primary benchmark.
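
The abstract does not give the training objective, so the following is only a minimal sketch of how a discriminator's judgement could serve as a reinforcement-learning reward for an answer generator. It assumes a REINFORCE-style policy gradient and the standard GAN binary cross-entropy loss. The function names, the baseline term, and the example probabilities are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def reinforce_loss(token_log_probs: Sequence[float],
                   reward: float,
                   baseline: float = 0.0) -> float:
    """Surrogate loss for one sampled answer (REINFORCE-style).

    token_log_probs: log-probabilities the generator assigned to each
        token of the sampled answer.
    reward: assumed here to be the discriminator's probability that the
        answer, paired with its attention-based reason, is human-written.
    baseline: optional variance-reduction term (an assumption of this sketch).
    """
    return -(reward - baseline) * sum(token_log_probs)


def discriminator_loss(p_human_answer: float,
                       p_generated_answer: float) -> float:
    """Standard GAN binary cross-entropy for one real/generated pair.

    Both arguments are the discriminator's "written by a human"
    probabilities, for a human answer and a generated answer respectively.
    """
    return -(math.log(p_human_answer) + math.log(1.0 - p_generated_answer))


# Toy numbers, chosen only for illustration.
g_loss = reinforce_loss([math.log(0.5), math.log(0.5)], reward=0.8, baseline=0.3)
d_loss = discriminator_loss(p_human_answer=0.9, p_generated_answer=0.1)
```

In this sketch a generated answer that the discriminator rates above the baseline has its tokens made more likely, which is one way the pressure toward human-like, well-supported answers described above could be realised.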