Semantic bottleneck for computer vision tasks
This paper introduces a novel method for the representation of images that is
semantic by nature, addressing the question of computation intelligibility in
computer vision tasks. More specifically, our proposition is to introduce what
we call a semantic bottleneck in the processing pipeline, which is a crossing
point in which the representation of the image is entirely expressed with
natural language, while retaining the efficiency of numerical representations.
We show that our approach is able to generate semantic representations that
give state-of-the-art results on semantic content-based image retrieval and
also perform very well on image classification tasks. Intelligibility is
evaluated through user-centered experiments for failure detection.
Goal-Oriented Visual Question Generation via Intermediate Rewards
© 2018, Springer Nature Switzerland AG. Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be an inscrutable challenge. To this end, we propose a Deep Reinforcement Learning framework based on three new intermediate rewards, namely goal-achieved, progressive and informativeness, that encourage the generation of succinct questions, which in turn uncover valuable information towards the overall goal. By directly optimizing for questions that work quickly towards fulfilling the overall goal, we avoid the tendency of existing methods to generate long series of inane queries that add little value. We evaluate our model on the GuessWhat?! dataset and show that the resulting questions can help a standard ‘Guesser’ identify a specific object in an image at a much higher success rate.
Visual Question Answering as Reading Comprehension
Visual question answering (VQA) demands simultaneous comprehension of both
the image visual content and natural language questions. In some cases, the
reasoning needs the help of common sense or general knowledge, which usually
appears in the form of text. Current methods jointly embed both the visual
information and the textual feature into the same space. However, how to model
the complex interactions between the two different modalities is not an easy
task. Rather than struggling with multimodal feature fusion, in this paper we
propose to unify all the input information by natural language so as to convert
VQA into a machine reading comprehension problem. With this transformation, our
method can not only tackle VQA datasets that focus on observation-based
questions, but can also be naturally extended to handle knowledge-based VQA,
which requires exploring a large-scale external knowledge base. It is a step
towards being able to exploit large volumes of text and natural language
processing techniques to address the VQA problem. Two types of models are proposed
to deal with open-ended VQA and multiple-choice VQA respectively. We evaluate
our models on three VQA benchmarks. Performance comparable to the
state of the art demonstrates the effectiveness of the proposed method.
A Survey on Knowledge Graphs: Representation, Acquisition and Applications
Human knowledge provides a formal understanding of the world. Knowledge
graphs that represent structural relations between entities have become an
increasingly popular research direction towards cognition and human-level
intelligence. In this survey, we provide a comprehensive review of knowledge
graphs, covering research topics on 1) knowledge graph representation
learning, 2) knowledge acquisition and completion, 3) temporal knowledge graphs,
and 4) knowledge-aware applications, and summarize recent breakthroughs and
prospective directions to facilitate future research. We propose a full-view
categorization and new taxonomies on these topics. Knowledge graph embedding is
organized around four aspects: representation space, scoring function, encoding
models, and auxiliary information. For knowledge acquisition, especially
knowledge graph completion, embedding methods, path inference, and logical rule
reasoning are reviewed. We further explore several emerging topics, including
meta relational learning, commonsense reasoning, and temporal knowledge graphs.
To facilitate future research on knowledge graphs, we also provide a curated
collection of datasets and open-source libraries on different tasks. Finally,
we provide a thorough outlook on several promising research directions.
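
The survey abstract above lists the scoring function as one aspect of knowledge graph embedding but does not name a specific one. As an illustration only, the sketch below uses a translation-based score in the style of TransE (a widely known model, chosen here as an assumption, not taken from the abstract): a triple (head, relation, tail) scores higher the closer head + relation lies to tail. The entity names, relation name, and 2-dimensional embedding values are hypothetical and hand-picked rather than learned.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style plausibility score: the negative L2 distance
    between (head + relation) and tail. Higher means more plausible."""
    squared = sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    return -math.sqrt(squared)


# Hypothetical, hand-picked 2-d embeddings (a real model would learn these).
entity = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Tokyo": [0.0, 3.0],
}
relation = {"capital_of": [0.0, 1.0]}

# A triple whose embeddings satisfy head + relation = tail exactly.
good = transe_score(entity["Paris"], relation["capital_of"], entity["France"])

# A triple whose translated head lands far from the tail.
bad = transe_score(entity["Tokyo"], relation["capital_of"], entity["France"])

print(good > bad)
```

In a knowledge graph completion setting, candidate tails for a query such as (Paris, capital_of, ?) would be ranked by this score; other scoring functions (for example, bilinear ones) plug into the same ranking step.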