Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures

Author: AS Reber
AW Smith
B Yoshua
G Jager
I Simon
J Rogers
JL Elman
M Casey
MP Marcus
N Chomsky
N Chomsky
S Hochreiter
WT Fitch
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

The presence of Long Distance Dependencies (LDDs) in sequential data poses significant challenges for computational models. Various recurrent neural architectures have been designed to mitigate this issue. In order to test these state-of-the-art architectures, there is a growing need for rich benchmarking datasets. However, one of the drawbacks of existing datasets is the lack of experimental control with regard to the presence and/or degree of LDDs. This lack of control limits the analysis of model performance in relation to the specific challenge posed by LDDs. One way to address this is to use synthetic data having the properties of subregular languages. The degree of LDDs within the generated data can be controlled through the k parameter, the length of the generated strings, and the choice of forbidden strings. In this paper, we explore the capacity of different RNN extensions to model LDDs by evaluating these models on a sequence of SPk synthesized datasets, where each subsequent dataset exhibits a greater degree of LDD. Even though SPk are simple languages, the presence of LDDs has a significant impact on the performance of recurrent neural architectures, thus making them prime candidates for benchmarking tasks. Comment: International Conference of Artificial Neural Networks (ICANN) 201
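The three controls named in the abstract (the k parameter, string length, and forbidden strings) can be illustrated with a minimal sketch. It assumes the usual reading of SPk languages, in which a string is grammatical when it contains none of the forbidden length-k subsequences. The function names, the alphabet, and the rejection-sampling approach are hypothetical choices for illustration, not the authors' generation procedure.

```python
import random

def is_subsequence(pattern, string):
    """True if `pattern` occurs in `string` as a (possibly non-contiguous) subsequence."""
    remaining = iter(string)
    # Each membership test consumes the iterator up to the matched symbol,
    # so symbols of `pattern` must appear in order.
    return all(symbol in remaining for symbol in pattern)

def in_sp_language(string, forbidden):
    """Assumed SPk membership test: no forbidden subsequence may occur."""
    return not any(is_subsequence(f, string) for f in forbidden)

def sample_strings(alphabet, forbidden, length, n, seed=0):
    """Rejection-sample `n` grammatical strings of a fixed length (illustrative only)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if in_sp_language(candidate, forbidden):
            samples.append(candidate)
    return samples

# k = 2: the forbidden subsequence "ab" means no 'a' may precede a 'b',
# however far apart the two symbols are.
print(in_sp_language("bccca", ["ab"]))  # True
print(in_sp_language("acccb", ["ab"]))  # False
print(sample_strings("abcd", ["ab"], length=12, n=3))
```

In this sketch, longer strings allow the two symbols of a forbidden pair to sit further apart, which is one way the span of the dependency a model must track can be increased.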

arXiv.org e-Print Archive

Crossref

Arrow@TUDublin

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

Author: Agrawal Aishwarya
Batra Dhruv
Kembhavi Aniruddha
Parikh Devi
Publication date: 03/06/2018

A number of studies have found that today's Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every question type, train and test sets have different prior distributions of answers. Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively). First, we evaluate several existing VQA models under this new setting and show that their performance degrades significantly compared to the original VQA setting. Second, we propose a novel Grounded Visual Question Answering model (GVQA) that contains inductive biases and restrictions in the architecture specifically designed to prevent the model from 'cheating' by primarily relying on priors in the training data. Specifically, GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of the plausible answer space for a given question, enabling the model to generalize more robustly across different distributions of answers. GVQA is built on an existing VQA model, Stacked Attention Networks (SAN). Our experiments demonstrate that GVQA significantly outperforms SAN on both VQA-CP v1 and VQA-CP v2 datasets. Interestingly, it also outperforms more powerful VQA models such as Multimodal Compact Bilinear Pooling (MCB) in several cases. GVQA offers strengths complementary to SAN when trained and evaluated on the original VQA v1 and VQA v2 datasets. Finally, GVQA is more transparent and interpretable than existing VQA models. Comment: 15 pages, 10 figures. To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201
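The effect of changing answer priors between train and test can be illustrated with a toy sketch. It is not the VQA-CP split procedure or the GVQA model; the data, the question type, and the function names below are made up for illustration. It shows why a baseline that ignores the image and predicts the most common training answer per question type degrades when the test prior differs.

```python
from collections import Counter, defaultdict

def answer_priors(examples):
    """Map each question type to its empirical answer distribution."""
    counts = defaultdict(Counter)
    for question_type, answer in examples:
        counts[question_type][answer] += 1
    return {
        qtype: {ans: n / sum(c.values()) for ans, n in c.items()}
        for qtype, c in counts.items()
    }

def prior_only_accuracy(train, test):
    """Accuracy of a baseline that always predicts the most common
    training answer for each question type (no image is used)."""
    counts = defaultdict(Counter)
    for question_type, answer in train:
        counts[question_type][answer] += 1
    guess = {q: c.most_common(1)[0][0] for q, c in counts.items()}
    correct = sum(guess.get(q) == a for q, a in test)
    return correct / len(test)

# Toy (question type, answer) pairs, invented for this example.
train = [("what color", "white")] * 8 + [("what color", "red")] * 2
same_prior_test = [("what color", "white")] * 4 + [("what color", "red")] * 1
changed_prior_test = [("what color", "white")] * 1 + [("what color", "red")] * 4

print(answer_priors(train))                            # {'what color': {'white': 0.8, 'red': 0.2}}
print(prior_only_accuracy(train, same_prior_test))     # 0.8
print(prior_only_accuracy(train, changed_prior_test))  # 0.2
```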

arXiv.org e-Print Archive

Crossref