672 research outputs found
Appearance-Based Gaze Estimation in the Wild
Appearance-based gaze estimation is believed to work well in real-world
settings, but existing datasets have been collected under controlled laboratory
conditions and methods have been not evaluated across multiple datasets. In
this work we study appearance-based gaze estimation in the wild. We present the
MPIIGaze dataset that contains 213,659 images we collected from 15 participants
during natural everyday laptop use over more than three months. Our dataset is
significantly more variable than existing ones with respect to appearance and
illumination. We also present a method for in-the-wild appearance-based gaze
estimation using multimodal convolutional neural networks that significantly
outperforms state-of-the art methods in the most challenging cross-dataset
evaluation. We present an extensive evaluation of several state-of-the-art
image-based gaze estimation algorithms on three current datasets, including our
own. This evaluation provides clear insights and allows us to identify key
research challenges of gaze estimation in the wild
Probabilistic Approach to Robust Wearable Gaze Tracking
Creative Commons Attribution License (CC BY 4.0)This paper presents a method for computing the gaze point using camera data captured with a wearable gaze tracking device. The method utilizes a physical model of the human eye, ad- vanced Bayesian computer vision algorithms, and Kalman filtering, resulting in high accuracy and low noise. Our C++ implementation can process camera streams with 30 frames per second in realtime. The performance of the system is validated in an exhaustive experimental setup with 19 participants, using a self-made device. Due to the used eye model and binocular cam- eras, the system is accurate for all distances and invariant to device movement. We also test our system against a best-in-class commercial device which is outperformed for spatial accuracy and precision. The software and hardware instructions as well as the experimental data are pub- lished as open source.Peer reviewe
Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving
In this paper we present the first results of a pilot experiment in the
capture and interpretation of multimodal signals of human experts engaged in
solving challenging chess problems. Our goal is to investigate the extent to
which observations of eye-gaze, posture, emotion and other physiological
signals can be used to model the cognitive state of subjects, and to explore
the integration of multiple sensor modalities to improve the reliability of
detection of human displays of awareness and emotion. We observed chess players
engaged in problems of increasing difficulty while recording their behavior.
Such recordings can be used to estimate a participant's awareness of the
current situation and to predict ability to respond effectively to challenging
situations. Results show that a multimodal approach is more accurate than a
unimodal one. By combining body posture, visual attention and emotion, the
multimodal approach can reach up to 93% of accuracy when determining player's
chess expertise while unimodal approach reaches 86%. Finally this experiment
validates the use of our equipment as a general and reproducible tool for the
study of participants engaged in screen-based interaction and/or problem
solving
FaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual Reality
We introduce FaceVR, a novel method for gaze-aware facial reenactment in the Virtual Reality (VR) context. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD), as well as a new data-driven approach for eye tracking from monocular videos. In addition to these face reconstruction components, FaceVR incorporates photo-realistic re-rendering in real time, thus allowing artificial modifications of face and eye appearances. For instance, we can alter facial expressions, change gaze directions, or remove the VR goggles in realistic re-renderings. In a live setup with a source and a target actor, we apply these newly-introduced algorithmic components. We assume that the source actor is wearing a VR device, and we capture his facial expressions and eye movement in real-time. For the target video, we mimic a similar tracking process; however, we use the source input to drive the animations of the target video, thus enabling gaze-aware facial reenactment. To render the modified target video on a stereo display, we augment our capture and reconstruction process with stereo data. In the end, FaceVR produces compelling results for a variety of applications, such as gaze-aware facial reenactment, reenactment in virtual reality, removal of VR goggles, and re-targeting of somebody's gaze direction in a video conferencing call
Unobtrusive and pervasive video-based eye-gaze tracking
Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim to identify different research avenues that are being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.peer-reviewe
A Virtual Testbed for Fish-Tank Virtual Reality: Improving Calibration with a Virtual-in-Virtual Display
With the development of novel calibration techniques for multimedia projectors and curved projection surfaces, volumetric 3D displays are becoming easier and more affordable to build. The basic requirements include a display shape that defines the volume (e.g. a sphere, cylinder, or cuboid) and a tracking system to provide each user's location for the perspective corrected rendering. When coupled with modern graphics cards, these displays are capable of high resolution, low latency, high frame rate, and even stereoscopic rendering; however, like many previous studies have shown, every component must be precisely calibrated for a compelling 3D effect. While human perceptual requirements have been extensively studied for head-tracked displays, most studies featured seated users in front of a flat display. It remains unclear if results from these flat display studies are applicable to newer, walk-around displays with enclosed or curved shapes. To investigate these issues, we developed a virtual testbed for volumetric head-tracked displays that can measure calibration accuracy of the entire system in real-time. We used this testbed to investigate visual distortions of prototype curved displays, improve existing calibration techniques, study the importance of stereo to performance and perception, and validate perceptual calibration with novice users. Our experiments show that stereo is important for task performance, but requires more accurate calibration, and that novice users can make effective use of perceptual calibration tools. We also propose a novel, real-time calibration method that can be used to fine-tune an existing calibration using perceptual feedback. The findings from this work can be used to build better head-tracked volumetric displays with an unprecedented amount of 3D realism and intuitive calibration tools for novice users
Gaze-contingent perceptually enabled interactions in the operating theatre.
PURPOSE: Improved surgical outcome and patient safety in the operating theatre are constant challenges. We hypothesise that a framework that collects and utilises information -especially perceptually enabled ones-from multiple sources, could help to meet the above goals. This paper presents some core functionalities of a wider low-cost framework under development that allows perceptually enabled interaction within the surgical environment. METHODS: The synergy of wearable eye-tracking and advanced computer vision methodologies, such as SLAM, is exploited. As a demonstration of one of the framework's possible functionalities, an articulated collaborative robotic arm and laser pointer is integrated and the set-up is used to project the surgeon's fixation point in 3D space. RESULTS: The implementation is evaluated over 60 fixations on predefined targets, with distances between the subject and the targets of 92-212Â cm and between the robot and the targets of 42-193Â cm. The median overall system error is currently 3.98Â cm. Its real-time potential is also highlighted. CONCLUSIONS: The work presented here represents an introduction and preliminary experimental validation of core functionalities of a larger framework under development. The proposed framework is geared towards a safer and more efficient surgical theatre
AFFECT-PRESERVING VISUAL PRIVACY PROTECTION
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications other than security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasing used for observations in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concern, but they also obliterate important visual cues of affect and social behaviors that are crucial for the target applications. In this dissertation, we propose to balance the privacy protection and the utility of the data by preserving the privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding.
The Intellectual Merits of the dissertation include a novel framework for visual privacy protection by manipulating facial image and body shape of individuals, which: (1) is able to conceal the identity of individuals; (2) provide a way to preserve the utility of the data, such as expression and pose information; (3) balance the utility of the data and capacity of the privacy protection.
The Broader Impacts of the dissertation focus on the significance of privacy protection on visual data, and the inadequacy of current privacy enhancing technologies in preserving affect and behavioral attributes of the visual content, which are highly useful for behavior observation in educational and medical settings. This work in this dissertation represents one of the first attempts in achieving both goals simultaneously
EgoHumans: An Egocentric 3D Multi-Human Benchmark
We present EgoHumans, a new multi-view multi-human video benchmark to advance
the state-of-the-art of egocentric human 3D pose estimation and tracking.
Existing egocentric benchmarks either capture single subject or indoor-only
scenarios, which limit the generalization of computer vision algorithms for
real-world applications. We propose a novel 3D capture setup to construct a
comprehensive egocentric multi-human benchmark in the wild with annotations to
support diverse tasks such as human detection, tracking, 2D/3D pose estimation,
and mesh recovery. We leverage consumer-grade wearable camera-equipped glasses
for the egocentric view, which enables us to capture dynamic activities like
playing tennis, fencing, volleyball, etc. Furthermore, our multi-view setup
generates accurate 3D ground truth even under severe or complete occlusion. The
dataset consists of more than 125k egocentric images, spanning diverse scenes
with a particular focus on challenging and unchoreographed multi-human
activities and fast-moving egocentric views. We rigorously evaluate existing
state-of-the-art methods and highlight their limitations in the egocentric
scenario, specifically on multi-human tracking. To address such limitations, we
propose EgoFormer, a novel approach with a multi-stream transformer
architecture and explicit 3D spatial reasoning to estimate and track the human
pose. EgoFormer significantly outperforms prior art by 13.6% IDF1 on the
EgoHumans dataset.Comment: Accepted to ICCV 2023 (Oral
- …