IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
This paper provides a unified account of two schools of thinking in
information retrieval modelling: the generative retrieval focusing on
predicting relevant documents given a query, and the discriminative retrieval
focusing on predicting relevancy given a query-document pair. We propose a
game-theoretic minimax game to iteratively optimise both models. On one hand, the
discriminative model, aiming to mine signals from labelled and unlabelled data,
provides guidance to train the generative model towards fitting the underlying
relevance distribution over documents given the query. On the other hand, the
generative model, acting as an attacker to the current discriminative model,
generates difficult examples for the discriminative model in an adversarial way
by minimising its discrimination objective. With the competition between these
two models, we show that the unified framework takes advantage of both schools
of thinking: (i) the generative model learns to fit the relevance distribution
over documents via the signals from the discriminative model, and (ii) the
discriminative model is able to exploit the unlabelled data selected by the
generative model to achieve a better estimation for document ranking. Our
experimental results demonstrate significant performance gains of as much as
23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
applications including web search, item recommendation, and question answering.
Comment: 12 pages; appendix adde
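The minimax game described in this abstract can be illustrated with a small numerical sketch. The abstract does not state the objective itself, so the code below assumes the standard GAN-style value function: the expected log-probability the discriminator assigns to truly relevant documents, plus the expected log-probability it assigns to rejecting documents sampled by the generator. The function names (`value`, `softmax`, `sigmoid`) and the toy scores are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x):
    """Map a discriminator score to a relevance probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn generator scores into a distribution over candidate documents."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def value(p_true, gen_scores, disc_scores):
    """Assumed GAN-style value for one query over a fixed candidate set.

    V = E_{d ~ p_true}[log D(d)] + E_{d ~ p_gen}[log(1 - D(d))]
    The discriminator tries to maximise V; the generator tries to minimise it.
    """
    p_gen = softmax(gen_scores)
    d = [sigmoid(s) for s in disc_scores]
    return sum(
        pt * math.log(di) + pg * math.log(1.0 - di)
        for pt, pg, di in zip(p_true, p_gen, d)
    )


# Toy query with two candidate documents; only the first is relevant.
p_true = [1.0, 0.0]
gen_scores = [0.0, 5.0]      # generator currently favours the wrong document
neutral_disc = [0.0, 0.0]    # discriminator that cannot tell documents apart
informed_disc = [2.0, -2.0]  # discriminator that prefers the relevant document

v_neutral = value(p_true, gen_scores, neutral_disc)
v_informed = value(p_true, gen_scores, informed_disc)
```

Under this assumed objective, an uninformative discriminator (probability 0.5 everywhere) yields a value of -2 log 2 whatever the generator does, while a discriminator that separates relevant from generated documents obtains a higher value. That gap is the signal the generator would then try to close by moving its distribution towards the relevant documents.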
Semantic bottleneck for computer vision tasks
This paper introduces a novel method for the representation of images that is
semantic by nature, addressing the question of computation intelligibility in
computer vision tasks. More specifically, our proposition is to introduce what
we call a semantic bottleneck in the processing pipeline, which is a crossing
point in which the representation of the image is entirely expressed with
natural language, while retaining the efficiency of numerical representations.
We show that our approach is able to generate semantic representations that
give state-of-the-art results on semantic content-based image retrieval and
also perform very well on image classification tasks. Intelligibility is
evaluated through user-centered experiments for failure detection.
Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and I evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
The Visual Dialogue task requires an agent to engage in a conversation about
an image with a human. It represents an extension of the Visual Question
Answering task in that the agent needs to answer a question about an image, but
it needs to do so in light of the previous dialogue that has taken place. The
key challenge in Visual Dialogue is thus maintaining a consistent and natural
dialogue while continuing to answer questions correctly. We present a novel
approach that combines Reinforcement Learning and Generative Adversarial
Networks (GANs) to generate more human-like responses to questions. The GAN
helps overcome the relative paucity of training data, and the tendency of the
typical MLE-based approach to generate overly terse answers. Critically, the
GAN is tightly integrated into the attention mechanism that generates
human-interpretable reasons for each answer. This means that the discriminative
model of the GAN has the task of assessing whether a candidate answer is
generated by a human or not, given the provided reason. This is significant
because it drives the generative model to produce high quality answers that are
well supported by the associated reasoning. The method also achieves
state-of-the-art results on the primary benchmark.