Structure Learning for Neural Module Networks
Neural Module Networks, originally proposed for the task of visual question
answering, are a class of neural network architectures that involve
human-specified neural modules, each designed for a specific form of reasoning.
In current formulations of such networks, only the parameters of the neural
modules and/or the order of their execution are learned. In this work, we
further expand this approach and also learn the underlying internal structure
of modules in terms of the ordering and combination of simple and elementary
arithmetic operators. Our results show that one is indeed able to
simultaneously learn both internal module structure and module sequencing
without extra supervisory signals for module execution sequencing. With this
approach, we report performance comparable to models using hand-designed
modules.
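The idea of learning a module's internal structure over elementary arithmetic operators can be sketched as follows. This is a hypothetical illustration, not the authors' code: the operator set, the soft-mixture parameterization, and all names are invented for the example.

```python
import numpy as np

# Elementary operators a module may implement; training the structure
# logits selects which operator (or blend of operators) is used.
OPERATORS = [
    lambda a, b: a + b,             # elementwise addition
    lambda a, b: a * b,             # elementwise multiplication
    lambda a, b: np.maximum(a, b),  # elementwise maximum
]

def soft_module(a, b, logits):
    """Combine two inputs with a softmax-weighted mix of operators.

    `logits` are learnable structure parameters: because the mixture is
    differentiable, they can be trained by gradient descent alongside
    the rest of the network.
    """
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    outputs = np.stack([op(a, b) for op in OPERATORS])
    return np.tensordot(weights, outputs, axes=1)

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
# Logits strongly favoring the first operator approximate pure addition.
out = soft_module(a, b, np.array([10.0, 0.0, 0.0]))
```

In a full system, each learned module would be wired into a sequence whose ordering is also learned, as the abstract describes.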
Training neural networks to encode symbols enables combinatorial generalization
Combinatorial generalization - the ability to understand and produce novel
combinations of already familiar elements - is considered to be a core capacity
of the human mind and a major challenge to neural network models. A significant
body of research suggests that conventional neural networks cannot solve this
problem unless they are endowed with mechanisms specifically engineered for the
purpose of representing symbols. In this paper we introduce a novel way of
representing symbolic structures in connectionist terms - the vectors approach
to representing symbols (VARS), which allows training standard neural
architectures to encode symbolic knowledge explicitly at their output layers.
In two simulations, we show that neural networks not only can learn to produce
VARS representations, but in doing so they achieve combinatorial generalization
in their symbolic and non-symbolic output. This adds to other recent work that
has shown improved combinatorial generalization under specific training
conditions, and raises the question of whether specific mechanisms or training
routines are needed to support symbolic processing.
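One way to picture symbols encoded as vectors at a network's output layer is the following sketch. It is a hypothetical illustration of the general idea, not the paper's VARS scheme: the vocabulary, one-hot coding, and slot structure are all invented for the example.

```python
import numpy as np

# A tiny symbol vocabulary; each symbol gets a fixed vector code.
VOCAB = ["circle", "square", "red", "blue"]

def symbol_vector(symbol):
    # One-hot code for a single symbol.
    v = np.zeros(len(VOCAB))
    v[VOCAB.index(symbol)] = 1.0
    return v

def encode_structure(slots):
    # A structure (e.g., color + shape) is the concatenation of its
    # slot-wise symbol vectors, so novel combinations of familiar
    # symbols are encoded compositionally, with no new code needed
    # per combination.
    return np.concatenate([symbol_vector(s) for s in slots])

# A combination the network may never have seen as a whole, built
# entirely from familiar symbol vectors.
target = encode_structure(["red", "circle"])
```

Training a standard network to produce such targets gives it an explicit, decodable symbolic output, which is the property the abstract ties to combinatorial generalization.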
Revisiting Visual Question Answering Baselines
Visual question answering (VQA) is an interesting learning setting for
evaluating the abilities and shortcomings of current systems for image
understanding. Many of the recently proposed VQA systems include attention or
memory mechanisms designed to support "reasoning". For multiple-choice VQA,
nearly all of these systems train a multi-class classifier on image and
question features to predict an answer. This paper questions the value of these
common practices and develops a simple alternative model based on binary
classification. Instead of treating answers as competing choices, our model
receives the answer as input and predicts whether or not an
image-question-answer triplet is correct. We evaluate our model on the Visual7W
Telling and the VQA Real Multiple Choice tasks, and find that even simple
versions of our model perform competitively. Our best model achieves
state-of-the-art performance on the Visual7W Telling task and compares
surprisingly well with the most complex systems proposed for the VQA Real
Multiple Choice task. We explore variants of the model and study its
transferability between both datasets. We also present an error analysis of our
model that suggests a key problem of current VQA systems lies in the lack of
visual grounding of concepts that occur in the questions and answers. Overall,
our results suggest that the performance of current VQA systems is not
significantly better than that of systems designed to exploit dataset biases.
Comment: European Conference on Computer Vision
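The binary-classification reformulation described above can be sketched in a few lines. This is a stand-in illustration, not the paper's model: the feature extractors, logistic scorer, and all names are assumptions for the example.

```python
import numpy as np

def triplet_features(image_feat, question_feat, answer_feat):
    # Concatenate the three feature vectors into one triplet representation.
    return np.concatenate([image_feat, question_feat, answer_feat])

def correctness_prob(features, w, b):
    # Logistic regression: probability that the triplet is correct.
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

def answer_question(image_feat, question_feat, candidate_answer_feats, w, b):
    # Score every candidate answer independently and return the index of
    # the one judged most likely to form a correct triplet; answers are
    # inputs to the scorer, not competing output classes.
    scores = [
        correctness_prob(triplet_features(image_feat, question_feat, a), w, b)
        for a in candidate_answer_feats
    ]
    return int(np.argmax(scores))
```

At multiple-choice test time the model simply picks the highest-scoring candidate, which is how the abstract's triplet scorer replaces a multi-class answer classifier.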