Oracle Performance for Visual Captioning

Publication venue: 'British Machine Vision Association and Society for Pattern Recognition'
Publication date: 01/01/2016

Semantic bottleneck for computer vision tasks

Author: Gabriëlle Ras
LA Hendricks
MD Zeiler
T-Y Lin
X Lin
Publication date: 06/11/2018

This paper introduces a novel method for the representation of images that is semantic by nature, addressing the question of computation intelligibility in computer vision tasks. More specifically, our proposition is to introduce what we call a semantic bottleneck in the processing pipeline, which is a crossing point in which the representation of the image is entirely expressed with natural language, while retaining the efficiency of numerical representations. We show that our approach is able to generate semantic representations that give state-of-the-art results on semantic content-based image retrieval and also perform very well on image classification tasks. Intelligibility is evaluated through user-centered experiments for failure detection.

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

GuessWhat?! Visual object discovery through multi-modal dialogue

Author: Chandar Sarath
Courville Aaron
de Vries Harm
Larochelle Hugo
Pietquin Olivier
Strub Florian
Publication date: 01/01/2017

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks. Comment: 23 pages; CVPR 2017 submission; see https://guesswhat.a

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

PolyPublie

Hal-Diderot