237,877 research outputs found
Structure Learning for Neural Module Networks
Neural Module Networks, originally proposed for the task of visual question
answering, are a class of neural network architectures that involve
human-specified neural modules, each designed for a specific form of reasoning.
In current formulations of such networks only the parameters of the neural
modules and/or the order of their execution is learned. In this work, we
further expand this approach and also learn the underlying internal structure
of modules in terms of the ordering and combination of simple and elementary
arithmetic operators. Our results show that one is indeed able to
simultaneously learn both internal module structure and module sequencing
without extra supervisory signals for module execution sequencing. With this
approach, we report performance comparable to models using hand-designed
modules
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Natural language questions are inherently compositional, and many are most
easily answered by reasoning about their decomposition into modular
sub-problems. For example, to answer "is there an equal number of balls and
boxes?" we can look for balls, look for boxes, count them, and compare the
results. The recently proposed Neural Module Network (NMN) architecture
implements this approach to question answering by parsing questions into
linguistic substructures and assembling question-specific deep networks from
smaller modules that each solve one subtask. However, existing NMN
implementations rely on brittle off-the-shelf parsers, and are restricted to
the module configurations proposed by these parsers rather than learning them
from data. In this paper, we propose End-to-End Module Networks (N2NMNs), which
learn to reason by directly predicting instance-specific network layouts
without the aid of a parser. Our model learns to generate network structures
(by imitating expert demonstrations) while simultaneously learning network
parameters (using the downstream task loss). Experimental results on the new
CLEVR dataset targeted at compositional question answering show that N2NMNs
achieve an error reduction of nearly 50% relative to state-of-the-art
attentional approaches, while discovering interpretable network architectures
specialized for each question
- …