A Survey of Current Datasets for Vision and Language Research
Integrating vision and language has long been a dream in work on artificial
intelligence (AI). In the past two years, we have witnessed an explosion of
work that brings together vision and language from images to videos and beyond.
The available corpora have played a crucial role in advancing this area of
research. In this paper, we propose a set of quality metrics for evaluating and
analyzing the vision & language datasets and categorize them accordingly. Our
analyses show that the most recent datasets have been using more complex
language and more abstract concepts; however, each has different strengths and
weaknesses.
Comment: To appear in EMNLP 2015, short proceedings. Dataset analysis and
discussion expanded, including an initial examination into reporting bias for
one of them. F.F. and N.M. contributed equally to this work.
Object Referring in Videos with Language and Human Gaze
We investigate the problem of object referring (OR), i.e., localizing a target
object in a visual scene that comes with a language description. Humans perceive
the world more as continuous video snippets than as static images, and describe
objects not only by their appearance, but also by their spatio-temporal context
and motion features. Humans also gaze at the object when they issue a referring
expression. Existing work on OR focuses mostly on static images, which lack
many such cues. This paper addresses OR in videos with
language and human gaze. To that end, we present a new video dataset for OR,
with 30,000 objects over 5,000 stereo video sequences annotated for their
descriptions and gaze. We further propose a novel network model for OR in
videos, by integrating appearance, motion, gaze, and spatio-temporal context
into one network. Experimental results show that our method effectively
utilizes motion cues, human gaze, and spatio-temporal context. Our method
outperforms previous OR methods. For the dataset and code, please refer to
https://people.ee.ethz.ch/~arunv/ORGaze.html.
Comment: Accepted to CVPR 2018, 10 pages, 6 figures
Teaching archive skills: a pedagogical journey with impact
This article considers the pedagogical practice of the Special Collections staff team at the University of Sussex and the impact of the group visit experience on student learning. It addresses our current group visit teaching offer to students at the University of Sussex and our move to a more student-led, active learning approach. It considers the use of the ‘pedagogical toolkit’, including technology within the classroom, and the creation of a document identification form to encourage critical thinking. Our aim for any group visit is to provide a positive first experience and get students enthused about using archives. In 2017 we undertook our own impact study, detailed within the article, which followed the student journey to find out whether students returned to use archives for their studies as a result of their group visit. Moving forward, this article considers our future activities in response to the impact study and to institutional initiatives.
Learning interaction patterns using diagrams varying in level and type of interactivity
An experiment was conducted to investigate the differences between learners when using computer-based learning environments (CBLEs) that incorporated different levels of interactivity in diagrams. Four CBLEs were created with combinations of the following two interactivity properties: (a) the possibility to rotate the whole diagram, and (b) the possibility to move individual elements of the diagram in order to apprehend the relationships between them. We present and discuss the qualitative findings from the study in terms of the learners’ interaction patterns and their relevance for understanding performance scores. These findings support our previous quantitative analysis, which showed an interaction between cognitive abilities and interactivity. Based on our findings, we reflect on the possibilities of informing CBLEs with relevant information about learners’ cognitive abilities and representational preferences.