Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures
The presence of Long Distance Dependencies (LDDs) in sequential data poses
significant challenges for computational models. Various recurrent neural
architectures have been designed to mitigate this issue. To test these
state-of-the-art architectures, there is a growing need for rich benchmarking
datasets. However, one drawback of existing datasets is the lack of
experimental control with regard to the presence and/or degree of LDDs. This
lack of control limits the analysis of model performance in relation to the
specific challenge posed by LDDs. One way to address this is to use synthetic
data with the properties of subregular languages. The degree of LDDs within
the generated data can be controlled through the k parameter, the length of the
generated strings, and the choice of forbidden strings. In this
paper, we explore the capacity of different RNN extensions to model LDDs by
evaluating these models on a sequence of synthesized strictly k-piecewise (SPk)
datasets, where each subsequent dataset exhibits a greater degree of LDD. Even
though SPk languages are simple, the presence of LDDs has a significant impact on
the performance of recurrent neural architectures, making these languages prime
candidates for benchmarking tasks.
Comment: International Conference on Artificial Neural Networks (ICANN) 201
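The abstract does not spell out how the SPk data are generated. As a minimal sketch, assuming the standard definition of a strictly piecewise language (a string is accepted if and only if none of a given set of forbidden length-k strings occurs in it as a subsequence, that is, in order but with arbitrary gaps), membership testing and labelled-data generation could look like the following. The function names, the alphabet, the forbidden set and the rejection-sampling scheme are illustrative assumptions, not the authors' generator.

```python
import random
from typing import Iterable, List, Tuple


def is_subsequence(pattern: str, s: str) -> bool:
    """True if the symbols of `pattern` appear in `s` in order, possibly with gaps."""
    it = iter(s)
    # `ch in it` advances the shared iterator until it finds `ch` (or is exhausted),
    # so each pattern symbol must occur after the previously matched one.
    return all(ch in it for ch in pattern)


def in_sp_language(s: str, forbidden: Iterable[str]) -> bool:
    """Membership in the SP language defined by a set of forbidden subsequences."""
    return not any(is_subsequence(f, s) for f in forbidden)


def sample_dataset(alphabet: str, forbidden: List[str], length: int,
                   n: int, seed: int = 0) -> List[Tuple[str, int]]:
    """Hypothetical generator: n fixed-length strings, half positive (label 1) and
    half negative (label 0), drawn uniformly at random and kept by rejection sampling."""
    rng = random.Random(seed)
    n_pos, n_neg = n // 2, n - n // 2
    pos: List[Tuple[str, int]] = []
    neg: List[Tuple[str, int]] = []
    while len(pos) < n_pos or len(neg) < n_neg:
        s = "".join(rng.choice(alphabet) for _ in range(length))
        if in_sp_language(s, forbidden):
            if len(pos) < n_pos:
                pos.append((s, 1))
        elif len(neg) < n_neg:
            neg.append((s, 0))
    data = pos + neg
    rng.shuffle(data)
    return data
```

For example, `sample_dataset("abcd", ["ab"], length=20, n=1000)` would label a string negative whenever an `a` occurs anywhere before a `b`, however far apart they are. Raising `length` lets the two symbols of a forbidden subsequence sit further apart, which is one way to increase the degree of LDD. Plain rejection sampling becomes inefficient as `length` grows, because most long random strings contain some forbidden subsequence; a production generator would more likely sample from an automaton for the language.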
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
A number of studies have found that today's Visual Question Answering (VQA)
models are heavily driven by superficial correlations in the training data and
lack sufficient image grounding. To encourage the development of models that are
better grounded in the image, we propose a new setting for VQA in which, for every
question type, the train and test sets have different prior distributions of answers.
Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we
call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2
respectively). First, we evaluate several existing VQA models under this new
setting and show that their performance degrades significantly compared to the
original VQA setting. Second, we propose a novel Grounded Visual Question
Answering model (GVQA) that contains inductive biases and restrictions in the
architecture specifically designed to prevent the model from 'cheating' by
primarily relying on priors in the training data. Specifically, GVQA explicitly
disentangles the recognition of visual concepts present in the image from the
identification of the space of plausible answers for a given question, enabling
the model to generalize more robustly across different answer distributions.
GVQA is built on an existing VQA model, Stacked Attention Networks (SAN).
Our experiments demonstrate that GVQA significantly outperforms SAN on both the
VQA-CP v1 and VQA-CP v2 datasets. Interestingly, it also outperforms more
powerful VQA models such as Multimodal Compact Bilinear Pooling (MCB) in
several cases. GVQA offers strengths complementary to SAN when trained and
evaluated on the original VQA v1 and VQA v2 datasets. Finally, GVQA is more
transparent and interpretable than existing VQA models.
Comment: 15 pages, 10 figures. To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201
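The defining property of the VQA-CP splits is that, for every question type, the answer distribution in train differs from the one in test. As a hypothetical, simplified illustration (not the procedure the authors used to build VQA-CP), one could assign each (question type, answer) group wholly to either train or test, and then quantify the prior shift per question type with the total variation distance between the two answer distributions. The `(question_type, answer)` record format, the alternating assignment rule and all function names are assumptions made for this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

QA = Tuple[str, str]  # hypothetical record: (question_type, answer)


def changing_priors_split(records: List[QA]) -> Tuple[List[QA], List[QA]]:
    """Toy split: within each question type, rank answers by frequency and send
    every second answer group to test, so train and test answer priors differ."""
    by_type: Dict[str, Counter] = defaultdict(Counter)
    for qtype, ans in records:
        by_type[qtype][ans] += 1
    to_test: Set[QA] = set()
    for qtype, counts in by_type.items():
        ranked = [a for a, _ in counts.most_common()]
        to_test.update((qtype, a) for a in ranked[1::2])
    train = [r for r in records if r not in to_test]
    test = [r for r in records if r in to_test]
    return train, test


def answer_prior(records: List[QA], qtype: str) -> Dict[str, float]:
    """Empirical answer distribution P(answer | question type)."""
    counts = Counter(a for t, a in records if t == qtype)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()} if total else {}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance: half the L1 distance between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Because this toy rule keeps answer groups disjoint within a question type, the distance is maximal (1.0) for any question type that appears in both splits, and a type with a single answer ends up only in train. A real benchmark would have to balance the size of the prior shift against answer coverage and split sizes.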