Visual Dialogue without Vision or Dialogue
We characterise some of the quirks and shortcomings in the exploration of
Visual Dialogue - a sequential question-answering task where the questions and
corresponding answers are related through given visual stimuli. To do so, we
develop an embarrassingly simple method based on Canonical Correlation Analysis
(CCA) that, on the standard dataset, achieves near state-of-the-art performance
on mean rank (MR). In direct contrast to current complex and over-parametrised
architectures that are both compute and time intensive, our method ignores the
visual stimuli, ignores the sequencing of dialogue, does not need gradients,
uses off-the-shelf feature extractors, has at least an order of magnitude fewer
parameters, and learns in practically no time. We argue that these results are
indicative of issues in current approaches to Visual Dialogue and conduct
analyses to highlight implicit dataset biases and effects of over-constrained
evaluation metrics. Our code is publicly available.

Comment: 2018 NeurIPS Workshop on Critiquing and Correcting Trends in Machine Learning