15,809 research outputs found
Compositional Attention Networks for Machine Reasoning
We present the MAC network, a novel fully differentiable neural network
architecture, designed to facilitate explicit and expressive reasoning. MAC
moves away from monolithic black-box neural architectures towards a design that
encourages both transparency and versatility. The model approaches problems by
decomposing them into a series of attention-based reasoning steps, each
performed by a novel recurrent Memory, Attention, and Composition (MAC) cell
that maintains a separation between control and memory. By stringing the cells
together and imposing structural constraints that regulate their interaction,
MAC effectively learns to perform iterative reasoning processes that are
directly inferred from the data in an end-to-end approach. We demonstrate the
model's strength, robustness and interpretability on the challenging CLEVR
dataset for visual reasoning, achieving a new state-of-the-art 98.9% accuracy,
halving the error rate of the previous best model. More importantly, we show
that the model is computationally-efficient and data-efficient, in particular
requiring 5x less data than existing models to achieve strong results. Comment: Published as a conference paper at ICLR 2018
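The control/memory separation described above can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' parameterization: all function names, dimensions, and the simple averaging write rule are assumptions made for clarity.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mac_step(control, memory, question_words, knowledge):
    """One illustrative MAC-style reasoning step: the control state attends
    over question words to decide what to do, and the memory state is updated
    from knowledge retrieved under that control. Hypothetical sketch only."""
    # Control unit: attention over question words, guided by previous control.
    attn = softmax(question_words @ control)          # (num_words,)
    new_control = attn @ question_words               # weighted sum -> (d,)
    # Read unit: attention over knowledge items, guided by control and memory.
    read_attn = softmax(knowledge @ (new_control * memory))  # (num_items,)
    retrieved = read_attn @ knowledge                 # (d,)
    # Write unit: integrate the retrieved information into memory.
    new_memory = 0.5 * memory + 0.5 * retrieved
    return new_control, new_memory

rng = np.random.default_rng(0)
d, num_words, num_items = 8, 5, 6
control = rng.normal(size=d)
memory = rng.normal(size=d)
question_words = rng.normal(size=(num_words, d))
knowledge = rng.normal(size=(num_items, d))
for _ in range(3):  # a fixed number of chained reasoning steps
    control, memory = mac_step(control, memory, question_words, knowledge)
print(memory.shape)  # (8,)
```

Chaining several such cells, with the control side only ever touching the question and the memory side only ever touching the knowledge base, is what the abstract means by a structural separation between control and memory.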
Compositional Attention Networks for Interpretability in Natural Language Question Answering
MAC Net is a compositional attention network designed for Visual Question
Answering. We propose a modified MAC net architecture for Natural Language
Question Answering. Question Answering typically requires Language
Understanding and multi-step Reasoning. MAC net's unique architecture - the
separation between memory and control, facilitates data-driven iterative
reasoning. This makes it an ideal candidate for solving tasks that involve
logical reasoning. Our experiments with 20 bAbI tasks demonstrate the value of
MAC net as a data-efficient and interpretable architecture for Natural Language
Question Answering. The transparent nature of MAC net provides a highly
granular view of the reasoning steps taken by the network in answering a query. Comment: 8 pages, 10 figures, 1 table
Compositional generalization in a deep seq2seq model by separating syntax and semantics
Standard methods in deep learning for natural language processing fail to
capture the compositional structure of human language that allows for
systematic generalization outside of the training distribution. However, human
learners readily generalize in this way, e.g. by applying known grammatical
rules to novel words. Inspired by work in neuroscience suggesting separate
brain systems for syntactic and semantic processing, we implement a
modification to standard approaches in neural machine translation, imposing an
analogous separation. The novel model, which we call Syntactic Attention,
substantially outperforms standard methods in deep learning on the SCAN
dataset, a compositional generalization task, without any hand-engineered
features or additional supervision. Our work suggests that separating syntactic
from semantic learning may be a useful heuristic for capturing compositional
structure. Comment: 18 pages, 15 figures, preprint version of submission to NeurIPS 2019, under review
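The syntax/semantics separation the abstract imposes can be illustrated with a single attention step. This is a hedged sketch under assumed names and shapes, not the paper's implementation: the point is only that alignment is computed from one stream and content from the other.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def separated_decode_step(syntax_states, semantic_embeds, decoder_state):
    """Illustrative attention step with the separation imposed: alignment
    weights come only from the syntactic stream, while the vector passed
    onward comes only from the semantic stream. Hypothetical sketch."""
    scores = syntax_states @ decoder_state   # where to attend: syntax only
    attn = softmax(scores)                   # (seq_len,)
    content = attn @ semantic_embeds         # what is retrieved: semantics only
    return attn, content

rng = np.random.default_rng(1)
seq_len, d = 4, 6
syntax_states = rng.normal(size=(seq_len, d))
semantic_embeds = rng.normal(size=(seq_len, d))
decoder_state = rng.normal(size=d)
attn, content = separated_decode_step(syntax_states, semantic_embeds, decoder_state)
print(content.shape)  # (6,)
```

Because word identity never reaches the alignment computation, a novel word slotted into a familiar syntactic position can still be attended to correctly, which is the intuition behind the compositional generalization result.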
Challenges and Prospects in Vision and Language Research
Language grounded image understanding tasks have often been proposed as a
method for evaluating progress in artificial intelligence. Ideally, these tasks
should test a plethora of capabilities that integrate computer vision,
reasoning, and natural language understanding. However, rather than behaving as
visual Turing tests, recent studies have demonstrated state-of-the-art systems
are achieving good performance through flaws in datasets and evaluation
procedures. We review the current state of affairs and outline a path forward.
A New Framework for Machine Intelligence: Concepts and Prototype
Machine learning (ML) and artificial intelligence (AI) have become hot topics
in many information processing areas, from chatbots to scientific data
analysis. At the same time, there is uncertainty about the possibility of
extending predominant ML technologies to become general solutions with
continuous learning capabilities. Here, a simple, yet comprehensive,
theoretical framework for intelligent systems is presented. A combination of
Mirror Compositional Representations (MCR) and a Solution-Critic Loop (SCL) is
proposed as a generic approach for different types of problems. A prototype
implementation is presented for document comparison using English Wikipedia
corpus.
Visual Reasoning by Progressive Module Networks
Humans learn to solve tasks of increasing complexity by building on top of
previously acquired knowledge. Typically, there exists a natural progression in
the tasks that we learn - most do not require completely independent solutions,
but can be broken down into simpler subtasks. We propose to represent a solver
for each task as a neural module that calls existing modules (solvers for
simpler tasks) in a functional program-like manner. Lower modules are a black
box to the calling module, and communicate only via a query and an output.
Thus, a module for a new task learns to query existing modules and composes
their outputs in order to produce its own output. Our model effectively
combines previous skill-sets, does not suffer from forgetting, and is fully
differentiable. We test our model in learning a set of visual reasoning tasks,
and demonstrate improved performances in all tasks by learning progressively.
By evaluating the reasoning process using human judges, we show that our model
is more interpretable than an attention-based baseline. Comment: 17 pages, 5 figures
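The black-box query/output interface between modules can be sketched directly. The class below is a toy illustration under assumed names, not the paper's API: each module may call lower modules, but sees only their outputs, never their internals.

```python
class Module:
    """A toy task module that may call lower-level modules through a narrow
    query/output interface. Illustrative sketch, not the paper's API."""
    def __init__(self, name, solve, submodules=()):
        self.name = name
        self._solve = solve          # solve(inputs, sub_outputs) -> output
        self.submodules = list(submodules)

    def query(self, x):
        # Lower modules are black boxes: only their outputs are visible here.
        sub_outputs = [m.query(x) for m in self.submodules]
        return self._solve(x, sub_outputs)

# A counting task built on top of a previously learned detection task.
detect = Module("detect_red", lambda x, subs: [o for o in x if o == "red"])
count = Module("count_red", lambda x, subs: len(subs[0]), submodules=[detect])
print(count.query(["red", "blue", "red"]))  # 2
```

Because `count_red` never modifies `detect_red`, the lower skill is reusable by any future module and is not overwritten by new training, which is the sense in which the approach avoids forgetting.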
The relational processing limits of classic and contemporary neural network models of language processing
The ability of neural networks to capture relational knowledge is a matter of
long-standing controversy. Recently, some researchers in the PDP side of the
debate have argued that (1) classic PDP models can handle relational structure
(Rogers & McClelland, 2008, 2014) and (2) the success of deep learning
approaches to text processing suggests that structured representations are
unnecessary to capture the gist of human language (Rabovsky et al., 2018). In
the present study we tested the Story Gestalt model (St. John, 1992), a classic
PDP model of text comprehension, and a Sequence-to-Sequence with Attention
model (Bahdanau et al., 2015), a contemporary deep learning architecture for
text processing. Both models were trained to answer questions about stories
based on the thematic roles that several concepts played in the stories. In
three critical tests we varied the statistical structure of new stories while
keeping their relational structure constant with respect to the training data.
Each model was susceptible to each statistical structure manipulation to a
different degree, with performance falling below chance under at least one
manipulation. We argue that the failures of both models are due to the fact
that they cannot perform dynamic binding of independent roles and fillers.
Ultimately, these results cast doubt on the suitability of traditional neural
network models for explaining phenomena based on relational reasoning,
including language processing.
Explainable Neural Computation via Stack Neural Module Networks
In complex inferential tasks like question answering, machine learning models
must confront two challenges: the need to implement a compositional reasoning
process, and, in many applications, the need for this reasoning process to be
interpretable to assist users in both development and prediction. Existing
models designed to produce interpretable traces of their decision-making
process typically require these traces to be supervised at training time. In
this paper, we present a novel neural modular approach that performs
compositional reasoning by automatically inducing a desired sub-task
decomposition without relying on strong supervision. Our model allows linking
different reasoning tasks through shared modules that handle common routines
across tasks. Experiments show that the model is more interpretable to human
evaluators compared to other state-of-the-art models: users can better
understand the model's underlying reasoning procedure and predict when it will
succeed or fail based on observing its intermediate outputs. Comment: ECCV 2018
A Dataset and Architecture for Visual Reasoning with a Working Memory
A vexing problem in artificial intelligence is reasoning about events that
occur in complex, changing visual stimuli such as in video analysis or game
play. Inspired by a rich tradition of visual reasoning and memory in cognitive
psychology and neuroscience, we developed an artificial, configurable visual
question and answer dataset (COG) to parallel experiments in humans and
animals. COG is much simpler than the general problem of video analysis, yet it
addresses many of the problems relating to visual and logical reasoning and
memory -- problems that remain challenging for modern deep learning
architectures. We additionally propose a deep learning architecture that
performs competitively on other diagnostic VQA datasets (i.e. CLEVR) as well as
easy settings of the COG dataset. However, several settings of COG result in
datasets that are progressively more challenging to learn. After training, the
network can zero-shot generalize to many new tasks. Preliminary analyses of the
network architectures trained on COG demonstrate that the network accomplishes
the task in a manner interpretable to humans.
Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering
We propose a new class of probabilistic neural-symbolic models, that have
symbolic functional programs as a latent, stochastic variable. Instantiated in
the context of visual question answering, our probabilistic formulation offers
two key conceptual advantages over prior neural-symbolic models for VQA.
Firstly, the programs generated by our model are more understandable while
requiring fewer teaching examples. Secondly, we show that one can
pose counterfactual scenarios to the model, to probe its beliefs on the
programs that could lead to a specified answer given an image. Our results on
the CLEVR and SHAPES datasets verify our hypotheses, showing that the model
gets better program (and answer) prediction accuracy even in the low data
regime, and allows one to probe the coherence and consistency of reasoning
performed. Comment: ICML 2019 Camera Ready + Appendix