Hard to Cheat: A Turing Test based on Answering Questions about Images
Progress in language and image understanding by machines has sparked the
interest of the research community in more open-ended, holistic tasks, and
refueled an old AI dream of building intelligent machines. We discuss a few
prominent challenges that characterize such holistic tasks and argue for
"question answering about images" as a particularly appealing instance of such
a holistic task. In particular, we point out that it is a version of a Turing
Test that is likely to be more robust to over-interpretations, and contrast it
with tasks like grounding and generation of descriptions. Finally, we discuss
tools to measure progress in this field.
Comment: Presented in AAAI-15 Workshop: Beyond the Turing Test
Learning Visual Reasoning Without Strong Priors
Achieving artificial visual reasoning - the ability to answer image-related
questions which require a multi-step, high-level process - is an important step
towards artificial general intelligence. This multi-modal task requires
learning a question-dependent, structured reasoning process over images from
language. Standard deep learning approaches tend to exploit biases in the data
rather than learn this underlying structure, while leading methods learn to
visually reason successfully but are hand-crafted for reasoning. We show that a
general-purpose, Conditional Batch Normalization approach achieves
state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%
error rate. We outperform the next best end-to-end method (4.5%) and even
methods that use extra supervision (3.1%). We probe our model to shed light on
how it reasons, showing it has learned a question-dependent, multi-step
process. Previous work has operated under the assumption that visual reasoning
calls for a specialized architecture, but we show that a general architecture
with proper conditioning can learn to visually reason effectively.
Comment: Full AAAI 2018 paper is at arXiv:1709.07871. Presented at ICML 2017's
Machine Learning in Speech and Language Processing Workshop. Code is at
http://github.com/ethanjperez/fil
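The core idea summarized in this abstract, modulating normalized visual features with question-dependent scale and shift parameters, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation; in the actual method, `gamma` and `beta` would be predicted per example by a question encoder, which is omitted here.

```python
import numpy as np

def conditional_batch_norm(features, gamma, beta, eps=1e-5):
    """Apply question-conditioned scale/shift to batch-normalized features.

    features: (batch, channels, height, width) activation maps
    gamma, beta: (batch, channels) parameters, assumed to come from a
                 question encoder (hypothetical stand-in here)
    """
    # Standard batch-norm statistics, computed per channel over the
    # batch and spatial dimensions.
    mean = features.mean(axis=(0, 2, 3), keepdims=True)
    var = features.var(axis=(0, 2, 3), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    # Conditioning step: instead of fixed learned gamma/beta, each
    # example gets its own modulation derived from the question.
    return gamma[:, :, None, None] * normalized + beta[:, :, None, None]

# Example: random feature maps, identity modulation (gamma=1, beta=0)
x = np.random.randn(2, 3, 8, 8)
gamma = np.ones((2, 3))
beta = np.zeros((2, 3))
out = conditional_batch_norm(x, gamma, beta)
```

With identity modulation this reduces to ordinary batch normalization; the reasoning behavior the abstract describes comes from letting the question choose a different `gamma`/`beta` per example, so the same convolutional backbone computes a question-dependent function of the image.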
A Survey of Current Datasets for Vision and Language Research
Integrating vision and language has long been a dream in work on artificial
intelligence (AI). In the past two years, we have witnessed an explosion of
work that brings together vision and language from images to videos and beyond.
The available corpora have played a crucial role in advancing this area of
research. In this paper, we propose a set of quality metrics for evaluating
and analyzing vision & language datasets and categorize them accordingly. Our
analyses show that the most recent datasets use more complex language and more
abstract concepts; however, each has different strengths and weaknesses.
Comment: To appear in EMNLP 2015, short proceedings. Dataset analysis and
discussion expanded, including an initial examination into reporting bias for
one of them. F.F. and N.M. contributed equally to this work