
    A Survey of Current Datasets for Vision and Language Research

    Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing the vision & language datasets and categorize them accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts; however, there are different strengths and weaknesses in each. Comment: To appear in EMNLP 2015, short proceedings. Dataset analysis and discussion expanded, including an initial examination into reporting bias for one of them. F.F. and N.M. contributed equally to this work.

    Object Referring in Videos with Language and Human Gaze

    We investigate the problem of object referring (OR), i.e., to localize a target object in a visual scene that comes with a language description. Humans perceive the world more as continued video snippets than as static images, and describe objects not only by their appearance, but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing works for OR mostly focus on static images only, which fall short in providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated for their descriptions and gaze. We further propose a novel network model for OR in videos, by integrating appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context. Our method outperforms previous OR methods. For dataset and code, please refer to https://people.ee.ethz.ch/~arunv/ORGaze.html. Comment: Accepted to CVPR 2018, 10 pages, 6 figures.

    Learning interaction patterns using diagrams varying in level and type of interactivity

    An experiment was conducted to investigate the differences between learners when using computer-based learning environments (CBLEs) that incorporated different levels of interactivity in diagrams. Four CBLEs were created with combinations of the following two interactivity properties: (a) the possibility to rotate the whole diagram and (b) the possibility to move individual elements of the diagram in order to apprehend the relationships between them. We present and discuss the qualitative findings from the study in terms of the learners’ interaction patterns and their relevance for the understanding of performance scores. This supports our previous quantitative analysis showing an interaction between cognitive abilities and interactivity. Based on our findings, we reflect on the possibilities to inform CBLEs with relevant information regarding learners’ cognitive abilities and representational preferences.