17 research outputs found
A Differential Approach for Gaze Estimation
Non-invasive gaze estimation methods usually regress gaze directions directly
from a single face or eye image. However, due to significant variability in eye
shapes and inner eye structures across individuals, universal models achieve
limited accuracy, and their outputs usually exhibit high variance as well as
subject-dependent biases. Accuracy is therefore usually improved through
calibration, which allows gaze predictions for a subject to be mapped to their
actual gaze. In this paper, we introduce a novel image-differential method for
gaze estimation. We propose to directly train a differential convolutional
neural network to predict the gaze difference between two eye input images of
the same subject. Then, given a set of subject-specific calibration images, we
can use the inferred differences to predict the gaze direction of a novel eye
sample. The assumption is that by allowing the comparison between two eye
images, nuisance factors (alignment, eyelid closure, illumination
perturbations) that usually plague single-image prediction methods can be
greatly reduced, allowing better predictions altogether. Experiments on three
public datasets validate our approach, which consistently outperforms
state-of-the-art methods even when using only one calibration sample or when
the latter methods are followed by subject-specific gaze
adaptation.
Comment: Extension of our paper "A differential approach for gaze estimation
with calibration" (BMVC 2018). Submitted to PAMI on Aug. 7, 2018; accepted by
PAMI short in Dec. 2019, in IEEE Transactions on Pattern Analysis and Machine
Intelligence.
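The calibration-time inference described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `diff_net` is a hypothetical callable standing in for the trained differential CNN, returning the predicted gaze difference (query minus reference) for a pair of eye images of the same subject.

```python
import numpy as np

def predict_gaze(diff_net, query_eye, calib_images, calib_gazes):
    """Predict the gaze of a query eye image from subject-specific
    calibration samples: the differential network predicts the gaze
    difference between the query and each calibration image, the known
    calibration label is shifted by that difference, and the per-sample
    estimates are averaged. `diff_net` is a hypothetical stand-in for
    the trained differential CNN."""
    estimates = [
        gaze + diff_net(query_eye, ref)  # calibration label + predicted offset
        for ref, gaze in zip(calib_images, calib_gazes)
    ]
    return np.mean(estimates, axis=0)
```

With a single calibration sample this reduces to shifting that sample's label by one predicted difference, which matches the one-sample setting evaluated in the abstract.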
Automated Deception Detection from Videos: Using End-to-End Learning Based High-Level Features and Classification Approaches
Deception detection is an interdisciplinary field attracting researchers from
psychology, criminology, computer science, and economics. We propose a
multimodal approach combining deep learning and discriminative models for
automated deception detection. Using video modalities, we employ convolutional
end-to-end learning to analyze gaze, head pose, and facial expressions,
achieving promising results compared to state-of-the-art methods. Due to
limited training data, we also utilize discriminative models for deception
detection. Although sequence-to-class approaches are explored, discriminative
models outperform them due to data scarcity. Our approach is evaluated on five
datasets, including a new Rolling-Dice Experiment motivated by economic
factors. Results indicate that facial expressions outperform gaze and head
pose, and combining modalities with feature selection enhances detection
performance. Differences in expressed features across datasets emphasize the
importance of scenario-specific training data and the influence of context on
deceptive behavior. Cross-dataset experiments reinforce these findings. Despite
the challenges posed by low-stake datasets, including the Rolling-Dice
Experiment, deception detection performance exceeds chance levels. Our proposed
multimodal approach and comprehensive evaluation shed light on the potential of
automating deception detection from video modalities, opening avenues for
future research.
Comment: 29 pages, 17 figures (19 if counting subfigures).
Federated learning in gaze recognition (FLIGR)
The efficiency and generalizability of a deep learning model depend on the amount and diversity of its training data. Although huge amounts of data are being collected, these data are not stored in centralized servers for further processing. It is often infeasible to collect and share data in centralized servers due to various medical data regulations. This need for diversely distributed data, together with the infeasibility of centralized storage, calls for Federated Learning (FL). FL is a clever way of utilizing privately stored data for model building without the need for data sharing. The idea is to train several models locally, each with the same architecture, share the model weights between the collaborators, aggregate the weights, and use the resulting global weights to continue model building. FL is an iterative algorithm that repeats the above steps over a defined number of rounds. By doing so, we negate the need for centralized data sharing and avoid the regulations tied to it. In this work, federated learning is applied to gaze recognition, the task of identifying where a doctor is gazing. A global model is built by repeatedly aggregating local models trained on data from 8 institutions, using the FL algorithm for 4 federated rounds. The results show an increase in the performance of the global model over the federated rounds. The study also shows that, at the end of FL, the global model can be trained once more locally at each institution to fine-tune it to the local data.
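The aggregation step at the heart of the FL loop above can be sketched as a FedAvg-style weight average. This is a generic sketch under the assumption that each institution's model is represented as a list of numpy weight tensors with identical shapes (the same architecture, as the abstract requires); it is not the paper's exact implementation.

```python
import numpy as np

def federated_round(local_weights, num_samples=None):
    """One aggregation step of the federated scheme: each institution
    trains a model of the same architecture locally, then the collaborators
    average the weights (optionally weighted by local dataset size) to form
    the global model. `local_weights` is a list (one entry per institution)
    of lists of numpy arrays."""
    if num_samples is None:
        num_samples = [1] * len(local_weights)  # unweighted average by default
    total = sum(num_samples)
    # Average each weight tensor position-wise across the institutions.
    return [
        sum(n * w[i] for n, w in zip(num_samples, local_weights)) / total
        for i in range(len(local_weights[0]))
    ]
```

Repeating this for the stated 4 rounds, with local training between rounds, yields the global model; the final local fine-tuning pass then simply continues training from the aggregated weights at each institution.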
A Coarse-to-Fine Adaptive Network for Appearance-Based Gaze Estimation
Human gaze is essential for various appealing applications. Aiming at more
accurate gaze estimation, a series of recent works propose to utilize face and
eye images simultaneously. Nevertheless, in those works the face and eye
images serve only as independent or parallel feature sources; the intrinsic
correlation between their features is overlooked. In this paper we make the
following contributions: 1) We propose a coarse-to-fine strategy which
estimates a basic gaze direction from the face image and refines it with a
corresponding residual predicted from the eye images. 2) Guided by the
proposed strategy, we design a framework which introduces a bi-gram model to
bridge the gaze residual and the basic gaze direction, and an attention
component to adaptively acquire suitable fine-grained features. 3) Integrating
the above innovations, we construct a coarse-to-fine adaptive network named
CA-Net and achieve
state-of-the-art performance on MPIIGaze and EyeDiap.
Comment: 9 pages, 7 figures, AAAI-20.
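The coarse-to-fine strategy in contribution 1) can be sketched as follows. This is a structural illustration only: `face_net` and `eye_residual_net` are hypothetical callables standing in for the paper's sub-networks, and the bi-gram bridging and attention components are elided.

```python
import numpy as np

def coarse_to_fine_gaze(face_net, eye_residual_net, face_img, left_eye, right_eye):
    """Coarse-to-fine gaze estimation as described in the abstract:
    a basic gaze direction is first estimated from the face image, then a
    residual predicted from the eye images (conditioned on the basic
    direction, which the paper's bi-gram model bridges) refines it.
    Both sub-networks here are hypothetical stand-ins."""
    basic = face_net(face_img)                              # coarse gaze from face
    residual = eye_residual_net(left_eye, right_eye, basic)  # fine correction from eyes
    return basic + residual                                  # refined gaze direction
```

The design choice is that the face image carries enough context for a coarse estimate, while the higher-resolution eye crops supply only the small correction, which keeps the residual's target range narrow.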