Object Referring in Videos with Language and Human Gaze
We investigate the problem of object referring (OR) i.e. to localize a target
object in a visual scene coming with a language description. Humans perceive
the world more as continued video snippets than as static images, and describe
objects not only by their appearance, but also by their spatio-temporal context
and motion features. Humans also gaze at the object when they issue a referring
expression. Existing works for OR mostly focus on static images only, which
fall short in providing many such cues. This paper addresses OR in videos with
language and human gaze. To that end, we present a new video dataset for OR,
with 30,000 objects over 5,000 stereo video sequences annotated for their
descriptions and gaze. We further propose a novel network model for OR in
videos, by integrating appearance, motion, gaze, and spatio-temporal context
into one network. Experimental results show that our method effectively
utilizes motion cues, human gaze, and spatio-temporal context. Our method
outperforms previous OR methods. For the dataset and code, please refer to
https://people.ee.ethz.ch/~arunv/ORGaze.html.

Comment: Accepted to CVPR 2018, 10 pages, 6 figures
Detection of moving point symbols on cartographic backgrounds
This paper presents an experimental cartographic study examining the minimum duration threshold required for the detection, by central vision, of a moving point symbol on cartographic backgrounds. The threshold is investigated using backgrounds with discriminable levels of information. The experimental process is based on the collection (under free-viewing conditions) and analysis of eye movement recordings. Fixation-derived statistical metrics allow the calculation of the examined threshold as well as the study of map users' general visual reaction. The critical duration threshold calculated in the present study corresponds to a time span of around 400 ms. The results of the analysis provide meaningful evidence on these issues, and the suggested approach can be applied to the examination of perception thresholds related to changes occurring in dynamic stimuli.
Where Do You Look? Relating Visual Attention to Learning Outcomes and URL Parsing
Visual behavior provides a dynamic trail of where attention is directed. It is considered the behavioral interface between engagement and gaining information, and researchers have used it for several decades to study users' behavior. This thesis focuses on employing visual attention to understand users' behavior in two contexts: 3D learning and gauging URL safety. Such understanding is valuable for improving interactive tools and interface designs. In the first chapter, we present results from studying learners' visual behavior while engaging with tangible and virtual 3D representations of objects. This is a replication of a recent study, which we extended using eye tracking. By analyzing the visual behavior, we confirmed the original study's results and added quantitative explanations for the corresponding learning outcomes. Among other things, our results indicated that users allocate similar visual attention while analyzing virtual and tangible learning material. In the next chapter, we present the outcomes of a user study in which participants were instructed to classify a set of URLs while wearing an eye tracker. Much effort is spent on teaching users how to detect malicious URLs, but there has been significantly less focus on understanding exactly how and why users routinely fail to vet URLs properly. This user study aims to fill that void by shedding light on the underlying processes that users employ to gauge a URL's trustworthiness at the time of scanning. Our findings suggest that users have a cap on the amount of cognitive resources they are willing to expend on vetting a URL. They also tend to believe that the presence of www in the domain name indicates that the URL is safe.