Learning to Find Eye Region Landmarks for Remote Gaze Estimation in Unconstrained Settings
Conventional feature-based and model-based gaze estimation methods have
proven to perform well in settings with controlled illumination and specialized
cameras. In unconstrained real-world settings, however, such methods are
surpassed by recent appearance-based methods due to difficulties in modeling
factors such as illumination changes and other visual artifacts. We present a
novel learning-based method for eye region landmark localization that enables
conventional methods to be competitive with the latest appearance-based methods.
Despite having been trained exclusively on synthetic data, our method exceeds
the state of the art for iris localization and eye shape registration on
real-world imagery. We then use the detected landmarks as input to iterative
model-fitting and lightweight learning-based gaze estimation methods. Our
approach outperforms existing model-fitting and appearance-based methods in the
context of person-independent and personalized gaze estimation.
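As a rough illustration of how detected eye-region landmarks can drive a model-based estimator, the sketch below recovers a gaze direction from a spherical-eyeball model; the function name, the pixel-space inputs, and the sign conventions are our assumptions, not the paper's implementation.

```python
import numpy as np

def gaze_from_landmarks(iris_center, eyeball_center, eyeball_radius):
    """Estimate a 3D gaze direction from 2D eye-region landmarks.

    Treats the eyeball as a sphere with a known projected radius; the
    iris center's offset from the eyeball center gives the pitch and yaw
    of the gaze vector (all inputs in pixels; image y points down).
    """
    dx, dy = np.asarray(iris_center) - np.asarray(eyeball_center)
    # Gaze angles from the spherical eyeball model (clamped for safety).
    theta = np.arcsin(np.clip(dy / eyeball_radius, -1.0, 1.0))               # pitch
    phi = np.arcsin(np.clip(dx / (eyeball_radius * np.cos(theta)), -1.0, 1.0))  # yaw
    # Unit gaze vector in camera coordinates (negative z into the scene).
    return np.array([np.cos(theta) * np.sin(phi),
                     np.sin(theta),
                     -np.cos(theta) * np.cos(phi)])

# Example: iris shifted 3 px right and 1 px up of the eyeball center.
print(gaze_from_landmarks((103.0, 49.0), (100.0, 50.0), eyeball_radius=12.0))
```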
Object Referring in Videos with Language and Human Gaze
We investigate the problem of object referring (OR), i.e., localizing a target
object in a visual scene given a language description. Humans perceive
the world more as continuous video snippets than as static images, and describe
objects not only by their appearance, but also by their spatio-temporal context
and motion features. Humans also gaze at the object when they issue a referring
expression. Existing work on OR mostly focuses on static images, which
fall short in providing many such cues. This paper addresses OR in videos with
language and human gaze. To that end, we present a new video dataset for OR,
with 30,000 objects over 5,000 stereo video sequences annotated for their
descriptions and gaze. We further propose a novel network model for OR in
videos, by integrating appearance, motion, gaze, and spatio-temporal context
into one network. Experimental results show that our method effectively
utilizes motion cues, human gaze, and spatio-temporal context. Our method
outperforms previous OR methods. For the dataset and code, please refer to
https://people.ee.ethz.ch/~arunv/ORGaze.html. (Accepted to CVPR 2018, 10 pages, 6 figures.)
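To make the fusion idea concrete, here is a minimal late-fusion scorer in the spirit of the described network; the feature dimensions, the module name, and the MLP head are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiCueScorer(nn.Module):
    """Score candidate object tracks against a language query by fusing
    appearance, motion, gaze, and spatio-temporal context features."""

    def __init__(self, app_dim=512, motion_dim=128, gaze_dim=64,
                 ctx_dim=128, lang_dim=300, hidden=256):
        super().__init__()
        fused = app_dim + motion_dim + gaze_dim + ctx_dim + lang_dim
        self.head = nn.Sequential(
            nn.Linear(fused, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))  # one matching score per candidate

    def forward(self, app, motion, gaze, ctx, lang):
        # Late fusion: concatenate all cues with the language embedding.
        x = torch.cat([app, motion, gaze, ctx, lang], dim=-1)
        return self.head(x).squeeze(-1)

# Example: score 8 candidate tracks against a sentence embedding
# (repeated per candidate). The highest score picks the referred object.
scorer = MultiCueScorer()
scores = scorer(torch.randn(8, 512), torch.randn(8, 128), torch.randn(8, 64),
                torch.randn(8, 128), torch.randn(8, 300))
print(scores.shape)  # torch.Size([8])
```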
Mobile localization: approach and applications
Localization is critical to a number of wireless network applications. In many situations GPS is not suitable. This dissertation (i) develops novel localization schemes for wireless networks by explicitly incorporating mobility information and (ii) applies localization to physical analytics, i.e., understanding shoppers' behavior within retail spaces by leveraging inertial sensors, Wi-Fi, and vision enabled by smart glasses.

More specifically, we first focus on multi-hop mobile networks, analyze real mobility traces, and observe that they exhibit temporal stability and low-rank structure. Motivated by these observations, we develop novel localization algorithms to effectively capture and also adapt to different degrees of these properties. Using extensive simulations and testbed experiments, we demonstrate the accuracy and robustness of our new schemes.

Second, we focus on localizing a single mobile node, which may not be connected with multiple nodes (e.g., without network connectivity or only connected with an access point). We propose trajectory-based localization using Wi-Fi or magnetic field measurements. We show that these measurements have the potential to uniquely identify a trajectory. We then develop a novel approach that leverages multi-level wavelet coefficients to first identify the trajectory and then localize to a point on the trajectory. We show that this approach is highly accurate and power efficient using indoor and outdoor experiments.

Finally, localization is a critical step in enabling many applications; an important one is physical analytics. Physical analytics has the potential to provide deep insight into shoppers' interests and activities and therefore better advertisements, recommendations, and a better shopping experience. To enable physical analytics, we build the ThirdEye system, which first achieves zero-effort localization by leveraging emergent devices like Google Glass to build AutoLayout, which fuses video, Wi-Fi, and inertial sensor data to simultaneously localize shoppers while also constructing and updating the product layout in a virtual coordinate space. Further, ThirdEye comprises a range of schemes that use a combination of vision and inertial sensing to study mobile users' behavior while shopping, namely walking, dwelling, gazing, and reaching out. We show the effectiveness of ThirdEye through an evaluation in two large retail stores in the United States.
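As a simplified sketch of the trajectory-identification step, the code below compresses a sensor trace into coarse multi-level wavelet coefficients (via PyWavelets) and matches it against stored trajectories by nearest neighbour; the function names, the db4 wavelet choice, and the assumption that traces are resampled to a common length are ours, not the dissertation's.

```python
import numpy as np
import pywt

def wavelet_signature(trace, wavelet="db4", level=3):
    """Compress a 1-D sensor trace (Wi-Fi RSSI or magnetic-field magnitude
    along a path) into coarse multi-level wavelet coefficients.
    Traces are assumed resampled to a common length beforehand."""
    coeffs = pywt.wavedec(np.asarray(trace, dtype=float), wavelet, level=level)
    # Keep the approximation and coarsest detail bands as the fingerprint;
    # fine-scale bands mostly carry noise.
    return np.concatenate(coeffs[:2])

def identify_trajectory(query, database):
    """Return the name of the stored trajectory whose wavelet signature
    is nearest (L2) to the query trace's signature."""
    q = wavelet_signature(query)
    return min(database,
               key=lambda name: np.linalg.norm(wavelet_signature(database[name]) - q))
```

Once the trajectory is identified, the same coefficients can be compared over a sliding window to localize the node to a point along it.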
The Stare-In-The-Crowd Effect: Phenomenology, Psychophysiology, And Relations To Psychopathology
The eyes are a valuable source of information for a range of social processes. The stare-in-the-crowd effect describes the ability to detect self-directed gaze. Impairment in gaze detection mechanisms, such as the stare-in-the-crowd effect, has implications for social interactions and the development of social relationships. Given the frequency with which humans utilize gaze detection in interactions, there is a need to better characterize the stare-in-the-crowd effect. This study utilized a previously validated dynamic visual paradigm to capture the stare-in-the-crowd effect. We compared typically developing (TD) young adults and young adults with Autism Spectrum Disorder (ASD) on multiple measures of psychophysiology, including eye tracking and heart rate monitoring. Four conditions of visual stimuli were presented: averted gaze, mutual gaze, catching another staring, and getting caught staring. Eye tracking outcomes and arousal (pupil size and heart rate variability) were compared by diagnosis (TD or ASD) and condition (averted, mutual, catching another staring, getting caught staring) using repeated-measures ANOVA. A significant interaction of diagnosis and condition was found for interest-area (IA) dwell time, IA fixation count, and IA second fixation duration. Hierarchical regression was used to assess how dimensional behavioral measures predicted eye tracking outcomes and arousal; only two models with advanced theory of mind as a predictor were significant. Overall, we demonstrated that individuals with ASD do respond differently to various gaze conditions, in patterns similar to those of TD individuals but to a lesser extent. This offers potential targets for social interventions to capitalize on this present but underdeveloped response to gaze. Implications and future directions are discussed.
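For readers who want to reproduce this kind of analysis, a mixed-design ANOVA (between-subjects diagnosis, within-subjects condition) can be run with the pingouin package; the column names and data file below are hypothetical, and the original study may well have used different software.

```python
import pandas as pd
import pingouin as pg

# Hypothetical tidy data: one row per participant per condition, with
#   subject   - participant id
#   diagnosis - "TD" or "ASD" (between-subjects factor)
#   condition - "averted", "mutual", "catch_other", "get_caught" (within)
#   dwell     - interest-area dwell time for that condition
df = pd.read_csv("gaze_outcomes.csv")

# Mixed-design ANOVA: does the diagnosis x condition interaction
# predict dwell time?
aov = pg.mixed_anova(data=df, dv="dwell", within="condition",
                     subject="subject", between="diagnosis")
print(aov.round(3))
```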
Customer Gaze Estimation in Retail Using Deep Learning
At present, intelligent computing applications are widely used in different domains, including retail stores. The analysis of customer behaviour has become crucial for the benefit of both customers and retailers. In this regard, remote gaze estimation using deep learning has shown promising results in analyzing customer behaviour in retail due to its scalability, robustness, low cost, and uninterrupted nature. This study presents a three-stage, three-attention-based deep convolutional neural network for remote gaze estimation in retail using image data. In the first stage, we design a mechanism to estimate the 3D gaze of the subject using image data and monocular depth estimation. The second stage presents a novel three-attention mechanism to estimate the gaze in the wild from field-of-view, depth-range, and object-channel attentions. The third stage generates the gaze saliency heatmap from the output attention map of the second stage. We train and evaluate the proposed model using the benchmark GOO-Real dataset and compare results with baseline models. Further, we adapt our model to real retail environments by introducing a novel Retail Gaze dataset. Extensive experiments demonstrate that our approach significantly improves remote gaze target estimation performance on the GOO-Real and Retail Gaze datasets.
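As an illustrative sketch of how a gaze saliency heatmap can be produced from a 3D gaze estimate and a monocular depth map, the snippet below back-projects each pixel and scores it by its angular distance to the gaze ray; the function signature and the Gaussian-cone scoring are our assumptions, not the paper's attention mechanism.

```python
import numpy as np

def gaze_saliency_heatmap(depth, gaze_origin, gaze_dir, K, sigma_deg=10.0):
    """Render a gaze saliency heatmap from a 3D gaze ray and a depth map.

    Each pixel is back-projected to 3D using the depth map and camera
    intrinsics K, then scored by the angle between (point - gaze_origin)
    and the gaze direction: a soft cone around the line of sight.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to camera-space 3D points.
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    pts = np.stack([x, y, depth], axis=-1) - gaze_origin
    pts /= np.linalg.norm(pts, axis=-1, keepdims=True) + 1e-8
    g = gaze_dir / np.linalg.norm(gaze_dir)
    ang = np.degrees(np.arccos(np.clip(pts @ g, -1.0, 1.0)))
    return np.exp(-0.5 * (ang / sigma_deg) ** 2)  # Gaussian falloff in angle
```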
The scene superiority effect: object recognition in the context of natural scenes
Four experiments investigate the effect of background scene semantics on object recognition. Although past research has found that semantically consistent scene backgrounds can facilitate recognition of a target object, these claims have been challenged as the result of post-perceptual response bias rather than the perceptual processes of object recognition itself. The current study takes advantage of a paradigm from linguistic processing known as the Word Superiority Effect. Humans can better discriminate letters (e.g., D vs. K) in the context of a word (WORD vs. WORK) than in a non-word context (e.g., WROD vs. WROK), even when the context is non-predictive of the target identity. We apply this paradigm to objects in natural scenes, having subjects discriminate between objects presented in scene contexts. Because the target objects were equally semantically consistent with any given scene and could appear in either semantically consistent or inconsistent contexts with equal probability, response bias could not lead to an apparent improvement in object recognition. The current study found a benefit to object recognition from semantically consistent backgrounds, and the effect appeared to be modulated by awareness of background scene semantics.
Efficient human annotation schemes for training object class detectors
A central task in computer vision is detecting object classes such as cars and horses
in complex scenes. Training an object class detector typically requires a large set of
images labeled with tight bounding boxes around every object instance. Obtaining
such data requires human annotation, which is very expensive and time-consuming.
Alternatively, researchers have tried to train models in a weakly supervised setting (i.e.,
given only image-level labels), which is much cheaper but leads to weaker detectors.
In this thesis, we propose new and efficient human annotation schemes for training
object class detectors that bypass the need for drawing bounding boxes and reduce the
annotation cost while still obtaining high quality object detectors.
First, we propose to train object class detectors from eye tracking data. Instead
of drawing tight bounding boxes, the annotators only need to look at the image and
find the target object. We track the eye movements of annotators while they perform
this visual search task and we propose a technique for deriving object bounding boxes
from these eye fixations. To validate our idea, we augment an existing object detection
dataset with eye tracking data.
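A minimal sketch of one way to derive a box from fixations, assuming fixations cluster on the target object: cluster the fixation points, keep the largest cluster, and pad its bounding box. The thesis learns this mapping from data; the DBSCAN heuristic and padding factor below are our simplification.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def box_from_fixations(fixations, eps=40.0, min_samples=3, pad=0.15):
    """Derive a rough object bounding box from eye fixations (x, y) in pixels.

    Clusters fixations with DBSCAN, keeps the largest cluster (assumed to
    lie on the target object), and returns its padded bounding box.
    """
    pts = np.asarray(fixations, dtype=float)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    valid = labels[labels >= 0]
    if valid.size == 0:                    # no dense cluster: use all points
        cluster = pts
    else:
        cluster = pts[labels == np.bincount(valid).argmax()]
    x0, y0 = cluster.min(axis=0)
    x1, y1 = cluster.max(axis=0)
    mx, my = pad * (x1 - x0), pad * (y1 - y0)  # fixations undershoot extents
    return (x0 - mx, y0 - my, x1 + mx, y1 + my)
```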
Second, we propose a scheme for training object class detectors, which only requires
annotators to verify bounding-boxes produced automatically by the learning
algorithm. Our scheme introduces human verification as a new step into a standard
weakly supervised framework, which typically iterates between re-training object detectors
and re-localizing objects in the training images. We use the verification signal
to improve both re-training and re-localization.
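The loop below sketches where the verification step sits in such a framework; retrain, relocalize, and verify are placeholders for the detector-specific pieces, passed in as callables rather than taken from the thesis.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]

def train_with_verification(
    images: List[str],
    retrain: Callable[[Dict[str, Box]], object],
    relocalize: Callable[[object, str], Box],
    verify: Callable[[str, Box], bool],
    rounds: int = 5,
):
    """Weakly supervised loop augmented with human yes/no verification.

    Each round: re-localize objects with the current detector, ask the
    annotator to verify the proposed boxes, keep accepted boxes as ground
    truth, and retrain on everything accepted so far.
    """
    accepted: Dict[str, Box] = {}
    detector = retrain(accepted)
    for _ in range(rounds):
        for img in images:
            if img in accepted:
                continue
            box = relocalize(detector, img)   # current best localization
            if verify(img, box):              # human confirms the box is tight
                accepted[img] = box
        detector = retrain(accepted)          # verified boxes strengthen training
    return detector, accepted
```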
Third, we propose another scheme where annotators are asked to click on the center
of an imaginary bounding box, which tightly encloses the object. We then incorporate
these clicks into a weakly supervised object localization technique, to jointly localize
object bounding boxes over all training images. Both our center-clicking and human
verification schemes deliver detectors performing almost as well as those trained in a
fully supervised setting.
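One simple way to exploit such clicks, shown below, is to re-rank candidate boxes with a Gaussian prior on the distance between each box center and the click; the thesis integrates clicks into a joint weakly supervised localization objective, so treat this as an illustrative simplification.

```python
import numpy as np

def rescore_with_center_click(proposals, scores, click, sigma=20.0):
    """Combine detector scores with a center-click prior.

    proposals: (N, 4) boxes as (x0, y0, x1, y1); scores: (N,) detector
    scores; click: the annotator's click on the imagined box center.
    Boxes whose centers are far from the click are down-weighted.
    """
    boxes = np.asarray(proposals, dtype=float)
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    d = np.linalg.norm(centers - np.asarray(click, dtype=float), axis=1)
    prior = np.exp(-0.5 * (d / sigma) ** 2)   # Gaussian click prior
    return np.asarray(scores) * prior          # re-ranked localization scores
```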
Finally, we propose extreme clicking. We ask the annotator to click on four physical
points on the object: the top, bottom, left- and right-most points. This task is more
natural than the traditional way of drawing boxes and these points are easy to find. Our
experiments show that annotating objects with extreme clicking is 5× faster than the traditional way of drawing boxes and that it leads to boxes of the same quality as the original ground-truth boxes drawn the traditional way. Moreover, we use the resulting extreme points to obtain more accurate segmentations than those derived from bounding boxes.
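Deriving the box from the four clicks is immediate, since the extreme points directly bound the object; the helper below is a trivial sketch of that step.

```python
def box_from_extreme_clicks(top, bottom, left, right):
    """Turn four extreme-point clicks into a tight bounding box.

    Each click is an (x, y) pixel; the box spans the left/right x's and
    the top/bottom y's. Because the clicks lie on the object boundary,
    no padding is needed, unlike fixation- or center-based schemes.
    """
    return (left[0], top[1], right[0], bottom[1])

# Example: clicks on an object's topmost, bottommost, left- and right-most points.
print(box_from_extreme_clicks((320, 110), (300, 260), (210, 180), (430, 190)))
```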