
    Low-cost Geometry-based Eye Gaze Detection using Facial Landmarks Generated through Deep Learning

    Introduction: In the realm of human-computer interaction and behavioral research, accurate real-time gaze estimation is critical. Traditional methods often rely on expensive equipment or large datasets, which are impractical in many scenarios. This paper introduces a novel, geometry-based approach to address these challenges, utilizing consumer-grade hardware for broader applicability. Methods: We leverage novel face landmark detection neural networks capable of fast inference on consumer-grade chips to generate accurate and stable 3D landmarks of the face and iris. From these, we derive a small set of geometry-based descriptors, forming an 8-dimensional manifold representing the eye and head movements. These descriptors are then used to formulate linear equations for predicting eye-gaze direction. Results: Our approach demonstrates the ability to predict gaze with an angular error of less than 1.9 degrees, rivaling state-of-the-art systems while operating in real-time and requiring negligible computational resources. Conclusion: The developed method marks a significant step forward in gaze estimation technology, offering a highly accurate, efficient, and accessible alternative to traditional systems. It opens up new possibilities for real-time applications in diverse fields, from gaming to psychological research.
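The core idea of mapping an 8-dimensional geometric descriptor to gaze direction through linear equations can be sketched as an ordinary least-squares fit over calibration samples. The descriptor values, calibration setup, and noise model below are illustrative assumptions, not the paper's actual data or procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration data: N samples of hypothetical 8-D geometric
# descriptors and the corresponding known gaze angles (yaw, pitch).
N = 200
descriptors = rng.normal(size=(N, 8))                 # 8-D manifold coordinates
true_W = rng.normal(size=(8, 2))                      # unknown linear map (for the mock data)
gaze = descriptors @ true_W + rng.normal(scale=0.1, size=(N, 2))

# Augment with a bias column and solve the linear system in the
# least-squares sense, as one would during a calibration phase.
X = np.hstack([descriptors, np.ones((N, 1))])
W, *_ = np.linalg.lstsq(X, gaze, rcond=None)

# Predict gaze angles for a new descriptor vector.
d_new = rng.normal(size=8)
pred = np.hstack([d_new, 1.0]) @ W
```

Because the model is a small linear system rather than a deep network, both calibration and per-frame prediction are essentially free computationally, which is consistent with the paper's real-time, low-resource claim.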

    Using Variable Dwell Time to Accelerate Gaze-Based Web Browsing with Two-Step Selection

    In order to avoid the "Midas Touch" problem, gaze-based interfaces for selection often introduce a dwell time: a fixed amount of time the user must fixate upon an object before it is selected. Past interfaces have used a uniform dwell time across all objects. Here, we propose a gaze-based browser using a two-step selection policy with variable dwell time. In the first step, a command, e.g. "back" or "select", is chosen from a menu using a dwell time that is constant across the different commands. In the second step, if the "select" command is chosen, the user selects a hyperlink using a dwell time that varies between different hyperlinks. We assign shorter dwell times to more likely hyperlinks and longer dwell times to less likely hyperlinks. In order to infer the likelihood each hyperlink will be selected, we have developed a probabilistic model of natural gaze behavior while surfing the web. We have evaluated a number of heuristic and probabilistic methods for varying the dwell times using both simulation and experiment. Our results demonstrate that varying dwell time improves the user experience in comparison with fixed dwell time, resulting in fewer errors and increased speed. While all of the methods for varying dwell time resulted in improved performance, the probabilistic models yielded much greater gains than the simple heuristics. The best performing model reduces error rate by 50% compared to 100 ms uniform dwell time while maintaining a similar response time. It reduces response time by 60% compared to 300 ms uniform dwell time while maintaining a similar error rate.
    Comment: This is an Accepted Manuscript of an article published by Taylor & Francis in the International Journal of Human-Computer Interaction on 30 March, 2018, available online: http://www.tandfonline.com/10.1080/10447318.2018.1452351 . For an eprint of the final published article, please access: https://www.tandfonline.com/eprint/T9d4cNwwRUqXPPiZYm8Z/ful
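The policy of giving likely hyperlinks shorter dwells and unlikely ones longer dwells can be sketched as a simple mapping from a link's selection probability to a clamped dwell time. The logarithmic mapping, the bounds, and the probability values below are illustrative assumptions, not the formula or model from the paper:

```python
import math

def dwell_time_ms(p, t_min=100.0, t_max=600.0, k=75.0):
    """Map selection probability p to a dwell time in milliseconds.

    Unlikely links (small p) get longer dwells via -log(p); the result
    is clamped to [t_min, t_max]. All constants here are hypothetical.
    """
    return min(t_max, t_min + k * -math.log(max(p, 1e-9)))

# Probabilities as they might come from a gaze-behavior model over the
# hyperlinks on the current page (hypothetical values).
link_probs = {"home": 0.5, "news": 0.3, "archive": 0.05}
dwells = {name: dwell_time_ms(p) for name, p in link_probs.items()}
```

Under this sketch, a link the model considers half-likely triggers after roughly 150 ms, while a rarely chosen link requires a fixation several times longer, which is the qualitative behavior the abstract describes.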

    Temporal-frequency-phase feature classification using 3D-convolutional neural networks for motor imagery and movement

    Recently, convolutional neural networks (CNNs) have been widely applied in brain-computer interfaces (BCIs) based on electroencephalogram (EEG) signals. Due to the subject-specific nature of EEG signal patterns and the multi-dimensionality of EEG features, it is necessary to employ appropriate feature representation methods to enhance the decoding accuracy of EEG. In this study, we proposed a method for representing EEG temporal, frequency, and phase features, aiming to preserve the multi-domain information of EEG signals. Specifically, we generated EEG temporal segments using a sliding window strategy. Then, temporal, frequency, and phase features were extracted from different temporal segments and stacked into 3D feature maps, namely temporal-frequency-phase features (TFPF). Furthermore, we designed a compact 3D-CNN model to extract these multi-domain features efficiently. Considering the inter-individual variability in EEG data, we conducted individual testing for each subject. The proposed model achieved average accuracies of 89.86%, 78.85%, and 63.55% for 2-class, 3-class, and 4-class motor imagery (MI) classification tasks, respectively, on the PhysioNet dataset. On the GigaDB dataset, the average accuracy for 2-class MI classification was 91.91%. For the comparison between MI and real movement (ME) tasks, the average accuracies for the 2-class task were 87.66% and 80.13% on the PhysioNet and GigaDB datasets, respectively. Overall, the method presented in this paper has obtained good results in MI/ME tasks and has good application prospects in the development of BCI systems based on MI/ME.
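The sliding-window segmentation and stacking of temporal, frequency, and phase features into a 3D map can be sketched as follows. The window/step sizes, the FFT-based magnitude and phase features, and the mock 64-channel trial are illustrative assumptions; the paper's exact extraction pipeline may differ:

```python
import numpy as np

def tfpf_map(eeg, win=160, step=80):
    """Build a mock TFPF-style map from an EEG trial.

    eeg: array of shape (channels, samples).
    Returns an array of shape (segments, 3, channels, win), where the
    second axis stacks temporal, frequency-magnitude, and phase features.
    """
    n_ch, n_samp = eeg.shape
    segments = []
    for start in range(0, n_samp - win + 1, step):
        seg = eeg[:, start:start + win]      # temporal feature
        spec = np.fft.fft(seg, axis=1)
        freq = np.abs(spec)                  # frequency feature (magnitude)
        phase = np.angle(spec)               # phase feature
        segments.append(np.stack([seg, freq, phase]))
    return np.stack(segments)

rng = np.random.default_rng(1)
trial = rng.normal(size=(64, 640))           # hypothetical 64-channel trial
maps = tfpf_map(trial)
```

A stack shaped like this (segments × feature-type × channels × time) is a natural input for a 3D-CNN, since the convolution kernels can then span the temporal, frequency, and phase planes jointly.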