104,063 research outputs found
Memory Networks
We describe a new class of learning models called memory networks. Memory
networks reason with inference components combined with a long-term memory
component; they learn how to use these jointly. The long-term memory can be
read and written to, with the goal of using it for prediction. We investigate
these models in the context of question answering (QA) where the long-term
memory effectively acts as a (dynamic) knowledge base, and the output is a
textual response. We evaluate them on a large-scale QA task, and a smaller, but
more complex, toy task generated from a simulated world. In the latter, we show
the reasoning power of such models by chaining multiple supporting sentences to
answer questions that require understanding the intension of verbs
Visual Question Answering: A Survey of Methods and Datasets
Visual Question Answering (VQA) is a challenging task that has received
increasing attention from both the computer vision and the natural language
processing communities. Given an image and a question in natural language, it
requires reasoning over visual elements of the image and general knowledge to
infer the correct answer. In the first part of this survey, we examine the
state of the art by comparing modern approaches to the problem. We classify
methods by their mechanism to connect the visual and textual modalities. In
particular, we examine the common approach of combining convolutional and
recurrent neural networks to map images and questions to a common feature
space. We also discuss memory-augmented and modular architectures that
interface with structured knowledge bases. In the second part of this survey,
we review the datasets available for training and evaluating VQA systems. The
various datatsets contain questions at different levels of complexity, which
require different capabilities and types of reasoning. We examine in depth the
question/answer pairs from the Visual Genome project, and evaluate the
relevance of the structured annotations of images with scene graphs for VQA.
Finally, we discuss promising future directions for the field, in particular
the connection to structured knowledge bases and the use of natural language
processing models.Comment: 25 page
- …