A Survey of Current Datasets for Vision and Language Research

Authors: Devlin Jacob; Ferraro Francis; Galley Michel; Huang Ting-Hao; Mitchell Margaret; Mostafazadeh Nasrin; Vanderwende Lucy
Publication date: 01/01/2015

Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language, from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing vision & language datasets and categorize the datasets accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts; however, each has different strengths and weaknesses. Comment: To appear in EMNLP 2015, short proceedings. Dataset analysis and discussion expanded, including an initial examination into reporting bias for one of them. F.F. and N.M. contributed equally to this work.

arXiv.org e-Print Archive

Crossref

Object Referring in Videos with Language and Human Gaze

Authors: Dai Dengxin; Van Gool Luc; Vasudevan Arun Balajee
Publication date: 04/04/2018

We investigate the problem of object referring (OR), i.e., localizing a target object in a visual scene given a language description. Humans perceive the world more as continuous video snippets than as static images, and describe objects not only by their appearance but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing work on OR mostly focuses on static images only, which lack many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated with their descriptions and gaze. We further propose a novel network model for OR in videos that integrates appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context, and that it outperforms previous OR methods. For the dataset and code, please refer to https://people.ee.ethz.ch/~arunv/ORGaze.html. Comment: Accepted to CVPR 2018, 10 pages, 6 figures.

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref


Teaching archive skills: a pedagogical journey with impact

Authors: Pattrick Kirsty; Watson Karen
Publication venue: Informa UK Limited
Publication date: 17/03/2020

This article considers the pedagogical practice of the Special Collections staff team at the University of Sussex and the impact of the group visit experience on student learning. It addresses our current group visit teaching offer to students at the University of Sussex and our move to a more student-led, active learning approach. It considers the use of the ‘pedagogical toolkit’, including technology within the classroom, and the creation of a document identification form to encourage critical thinking. Our aim for any group visit is to provide a positive first experience and get students enthused about using archives. In 2017 we undertook our own impact study, detailed within the article, to follow the student journey with the intention of finding out whether students returned to use archives for their studies as a result of their group visit. Moving forward, this article considers our future activities in response to the impact study and institutional initiatives.

Sussex Research Online

Learning interaction patterns using diagrams varying in level and type of interactivity

Authors: du Boulay Benedict; Otero Nuno; Rogers Yvonne
Publication venue: American Association for Artificial Intelligence
Publication date: 01/01/2005

An experiment was conducted to investigate the differences between learners when using computer-based learning environments (CBLEs) that incorporated different levels of interactivity in diagrams. Four CBLEs were created with combinations of the following two interactivity properties: (a) the possibility to rotate the whole diagram, and (b) the possibility to move individual elements of the diagram in order to apprehend the relationships between them. We present and discuss the qualitative findings from the study in terms of the learners’ interaction patterns and their relevance for understanding performance scores. These findings support our previous quantitative analysis, which showed an interaction between cognitive abilities and interactivity. Based on our findings, we reflect on the possibilities of informing CBLEs with relevant information regarding learners’ cognitive abilities and representational preferences.

UCL Discovery

Sussex Research Online