Low-cost Geometry-based Eye Gaze Detection using Facial Landmarks Generated through Deep Learning
Introduction: In the realm of human-computer interaction and behavioral
research, accurate real-time gaze estimation is critical. Traditional methods
often rely on expensive equipment or large datasets, which are impractical in
many scenarios. This paper introduces a novel, geometry-based approach to
address these challenges, utilizing consumer-grade hardware for broader
applicability.

Methods: We leverage novel face landmark detection neural networks capable of
fast inference on consumer-grade chips to generate accurate and stable 3D
landmarks of the face and iris. From these, we derive a small set of
geometry-based descriptors, forming an 8-dimensional manifold representing the
eye and head movements. These descriptors are then used to formulate linear
equations for predicting eye-gaze direction.

Results: Our approach demonstrates the ability to predict gaze with an angular
error of less than 1.9 degrees, rivaling state-of-the-art systems while
operating in real-time and requiring negligible computational resources.

Conclusion: The developed method marks a significant step forward in gaze
estimation technology, offering a highly accurate, efficient, and accessible
alternative to traditional systems. It opens up new possibilities for real-time
applications in diverse fields, from gaming to psychological research.
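The linear-equation formulation described in the Methods can be sketched as a
least-squares fit from the 8-dimensional descriptors to gaze angles. This is a
minimal illustration with simulated data, not the paper's actual descriptors or
calibration procedure; the descriptor values, noise level, and bias term are
assumptions.

```python
import numpy as np

# Hypothetical sketch: fit a linear map from 8-D geometry descriptors
# (eye/head features) to 2-D gaze angles (yaw, pitch), assuming the
# abstract's "linear equations" formulation. Data here is simulated.
rng = np.random.default_rng(0)

X = rng.normal(size=(50, 8))               # 50 calibration samples, 8-D descriptors
true_W = rng.normal(size=(8, 2))           # unknown ground-truth linear map
Y = X @ true_W + rng.normal(scale=0.01, size=(50, 2))  # noisy gaze angles

# Ordinary least squares with an added bias column.
Xb = np.hstack([X, np.ones((50, 1))])
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

def predict_gaze(descriptor):
    """Predict (yaw, pitch) from an 8-D descriptor vector."""
    return np.append(descriptor, 1.0) @ W
```

Once fitted, `predict_gaze` is a single matrix-vector product, which is
consistent with the abstract's claim of negligible computational cost at
inference time.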
Using Variable Dwell Time to Accelerate Gaze-Based Web Browsing with Two-Step Selection
In order to avoid the "Midas Touch" problem, gaze-based interfaces for
selection often introduce a dwell time: a fixed amount of time the user must
fixate upon an object before it is selected. Past interfaces have used a
uniform dwell time across all objects. Here, we propose a gaze-based browser
using a two-step selection policy with variable dwell time. In the first step,
a command, e.g. "back" or "select", is chosen from a menu using a dwell time
that is constant across the different commands. In the second step, if the
"select" command is chosen, the user selects a hyperlink using a dwell time
that varies between different hyperlinks. We assign shorter dwell times to more
likely hyperlinks and longer dwell times to less likely hyperlinks. In order to
infer the likelihood each hyperlink will be selected, we have developed a
probabilistic model of natural gaze behavior while surfing the web. We have
evaluated a number of heuristic and probabilistic methods for varying the dwell
times using both simulation and experiment. Our results demonstrate that
varying dwell time improves the user experience in comparison with fixed dwell
time, resulting in fewer errors and increased speed. While all of the methods
for varying dwell time resulted in improved performance, the probabilistic
models yielded much greater gains than the simple heuristics. The best
performing model reduces error rate by 50% compared to 100ms uniform dwell time
while maintaining a similar response time. It reduces response time by 60%
compared to 300ms uniform dwell time while maintaining a similar error rate.

Comment: This is an Accepted Manuscript of an article published by Taylor &
Francis in the International Journal of Human-Computer Interaction on 30
March, 2018, available online:
http://www.tandfonline.com/10.1080/10447318.2018.1452351 . For an eprint of
the final published article, please access:
https://www.tandfonline.com/eprint/T9d4cNwwRUqXPPiZYm8Z/ful
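The core idea of the second step — shorter dwell times for more likely
hyperlinks — can be sketched as a mapping from selection probabilities to
per-link dwell times. The linear interpolation between a minimum and maximum
dwell used below is an illustrative choice, not the paper's actual
probabilistic policy, and the bounds `t_min`/`t_max` are assumed values.

```python
import numpy as np

# Hypothetical sketch of variable dwell time: map each hyperlink's
# estimated selection probability to a dwell time, so that likely
# links select faster and unlikely links require longer fixation.
def dwell_times(link_probs, t_min=0.1, t_max=0.5):
    """Map selection probabilities to per-link dwell times (seconds)."""
    p = np.asarray(link_probs, dtype=float)
    p = p / p.sum()                                   # normalize to a distribution
    # Rescale probabilities into [0, 1], then invert into [t_min, t_max]:
    # the most likely link gets t_min, the least likely gets t_max.
    scaled = (p - p.min()) / (p.max() - p.min() + 1e-12)
    return t_max - scaled * (t_max - t_min)

times = dwell_times([0.6, 0.3, 0.1])
```

A uniform dwell time is the special case where every link gets the same value;
the variable scheme trades a slightly longer dwell on rare links for much
faster selection of the links the model expects the user to want.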
Temporal-frequency-phase feature classification using 3D-convolutional neural networks for motor imagery and movement
Recently, convolutional neural networks (CNNs) have been widely applied in brain-computer interfaces (BCIs) based on electroencephalogram (EEG) signals. Due to the subject-specific nature of EEG signal patterns and the multi-dimensionality of EEG features, it is necessary to employ appropriate feature representation methods to enhance the decoding accuracy of EEG. In this study, we propose a method for representing EEG temporal, frequency, and phase features, aiming to preserve the multi-domain information of EEG signals. Specifically, we generate EEG temporal segments using a sliding window strategy. Then, temporal, frequency, and phase features are extracted from the different temporal segments and stacked into 3D feature maps, namely temporal-frequency-phase features (TFPF). Furthermore, we design a compact 3D-CNN model to extract these multi-domain features efficiently. Considering the inter-individual variability in EEG data, we conducted individual testing for each subject. The proposed model achieved average accuracies of 89.86%, 78.85%, and 63.55% for 2-class, 3-class, and 4-class motor imagery (MI) classification tasks, respectively, on the PhysioNet dataset. On the GigaDB dataset, the average accuracy for 2-class MI classification was 91.91%. For the comparison between MI and real movement (ME) tasks, the average accuracies for the 2-class task were 87.66% and 80.13% on the PhysioNet and GigaDB datasets, respectively. Overall, the method presented in this paper has obtained good results in MI/ME tasks and has good application prospects for the development of BCI systems based on MI/ME.
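The TFPF construction described above — sliding windows, per-window feature
extraction, and stacking into a 3D map — can be sketched as follows. The
specific features chosen here (signal variance for the temporal domain,
mu-band power for the frequency domain, and mean FFT phase in the mu band for
the phase domain) and the window/step sizes are illustrative assumptions, not
the paper's exact feature definitions.

```python
import numpy as np

# Hypothetical sketch of temporal-frequency-phase feature (TFPF) stacking:
# slide a window over multi-channel EEG, extract one temporal, one
# frequency, and one phase feature per window and channel, and stack
# them into a 3D feature map suitable as 3D-CNN input.
def tfpf_map(eeg, fs=160, win=160, step=80):
    """eeg: (channels, samples) array -> (n_windows, channels, 3) map."""
    n_ch, n_s = eeg.shape
    feats = []
    for s in range(0, n_s - win + 1, step):
        seg = eeg[:, s:s + win]
        spec = np.fft.rfft(seg, axis=1)
        freqs = np.fft.rfftfreq(win, d=1.0 / fs)
        mu = (freqs >= 8) & (freqs <= 13)              # mu band (8-13 Hz)
        temporal = seg.var(axis=1)                     # temporal-domain feature
        frequency = (np.abs(spec[:, mu]) ** 2).mean(axis=1)  # band power
        phase = np.angle(spec[:, mu]).mean(axis=1)     # phase-domain feature
        feats.append(np.stack([temporal, frequency, phase], axis=1))
    return np.stack(feats)                             # (windows, channels, 3)

# 64-channel recording, 4 s at 160 Hz (PhysioNet-like dimensions, assumed).
x = np.random.default_rng(0).normal(size=(64, 640))
fmap = tfpf_map(x)
```

The resulting array has one axis for time (windows), one for space (channels),
and one for feature domain, which is what makes a 3D convolution a natural fit
for jointly exploiting all three.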