Investigating Attention Modeling Differences between Older and Younger Drivers
As in-vehicle technologies (IVTs) grow in both popularity and complexity, the question of whether they improve or hinder driver performance has gained attention. Predicting when a driver will be looking at the road rather than at a display on the dashboard or center console is crucial to understanding how the recent tech-heavy trend in car design affects safety, and the extent to which IVTs compete with the primary driving task for visual resources. The SEEV model of visual attention has been shown to predict the probability of attending to an area of interest (AOI) while driving, based on the salience (SEEV-S) of visual stimuli, the effort (SEEV-Ef) required to shift attention between locations, the expectancy (SEEV-Ex) that information will be found at a specific location within the visual field, and the value (SEEV-V) of the information found at that location relative to the task(s) being performed. This study compared SEEV models for older and younger adults, fitted to eye-tracking data collected during a series of simulated driving scenarios that varied the effort, expectancy, and value placed on the primary driving task (maintaining lane position and speed) and on a secondary in-vehicle task performed on the center console. No significant effect of the effort variable was found, likely because the cues used in our experiment did not require head or torso rotation to access. Good model fits were obtained for both older and younger adults, with younger adults placing greater weight on the dashboard AOI than older adults when the driving task was prioritized.
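For readers unfamiliar with SEEV, the model is usually expressed as a weighted linear combination in which effort lowers the predicted probability of attending an AOI. The sketch below is a minimal illustration of that structure; the coefficients, AOI definitions, and 0-1 codings are hypothetical and are not taken from this study.

```python
# Minimal sketch of a SEEV-style attention predictor.
# Weights and AOI codings are illustrative only, not the study's fitted values.

def seev_score(salience, effort, expectancy, value,
               w_s=1.0, w_ef=1.0, w_ex=1.0, w_v=1.0):
    """Weighted SEEV combination: effort enters with a negative sign."""
    return w_s * salience - w_ef * effort + w_ex * expectancy + w_v * value

def attention_probabilities(aois):
    """Normalize non-negative SEEV scores across AOIs into dwell probabilities."""
    scores = {name: max(seev_score(**params), 0.0) for name, params in aois.items()}
    total = sum(scores.values()) or 1.0
    return {name: s / total for name, s in scores.items()}

# Hypothetical 0-1 codings for a driving scenario with an in-vehicle task.
aois = {
    "road":           dict(salience=0.6, effort=0.1, expectancy=0.9, value=1.0),
    "dashboard":      dict(salience=0.4, effort=0.2, expectancy=0.5, value=0.6),
    "center_console": dict(salience=0.5, effort=0.4, expectancy=0.3, value=0.4),
}
print(attention_probabilities(aois))
```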
Robust Validation of Visual Focus of Attention using Adaptive Fusion of Head and Eye Gaze patterns
We propose a framework for inferring a person's focus of attention using information from both head rotation and eye-gaze estimation. To this end, we use fuzzy logic to estimate the confidence that a person's gaze is directed towards a specific point, and results are compared to human annotation. For head pose we propose Bayesian modality fusion of local and holistic information, while for eye gaze we propose a methodology that estimates gaze directionality, removing the influence of head rotation, using a simple camera. Local information is based on feature positions, while holistic information makes use of the face region and Convolutional Neural Networks, which have been shown to be robust to small translations and distortions of test data. This is vital for an application in an unconstrained environment, where background noise should be expected. The ability of the system to estimate focus of attention towards specific areas, for unknown users, is evaluated at the end of the paper.
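As a rough picture of what modality fusion of this kind involves, the sketch below combines a local, feature-based head-yaw estimate with a holistic, CNN-based one by precision weighting, assuming each cue yields a Gaussian estimate. The variable names, Gaussian assumption, and numbers are ours for illustration; the paper's actual Bayesian fusion model may differ.

```python
# Minimal sketch of fusing two head-pose cues under a Gaussian assumption.
# Not the paper's exact model; values are illustrative.

def fuse_gaussian(mu_local, var_local, mu_holistic, var_holistic):
    """Precision-weighted fusion of two independent Gaussian estimates of yaw."""
    precision = 1.0 / var_local + 1.0 / var_holistic
    mu = (mu_local / var_local + mu_holistic / var_holistic) / precision
    return mu, 1.0 / precision

# Hypothetical example: local features say 12 deg (noisy), holistic CNN says 8 deg (more reliable).
yaw, var = fuse_gaussian(12.0, 25.0, 8.0, 9.0)
print(f"fused yaw: {yaw:.1f} deg (variance {var:.1f})")
```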
Multimodal Multisensor attention modelling
Introduction: Sustaining attention is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to tracking student engagement involve periodic human observations that are subject to inter-rater reliability issues. Our solution uses real-time Multimodal Multisensor data, labeled by objective performance outcomes, to track the attention of students.
Method: The study involved four students with a combined diagnosis of cerebral palsy and a learning disability, who took part in a 3-month trial over 59 sessions. Multimodal Multisensor data were collected while they participated in a Continuous Performance Test (CPT). Eye-gaze, electroencephalogram, body pose, and interaction data were used to create a model of student attention, with objective labels derived from the CPT outcomes. To achieve this, a new type of continuous performance test, the Seek-X type, is introduced. Nine features were extracted, including high-level handpicked compound features (HLCF). Using leave-one-out cross-validation, a series of machine learning approaches were evaluated.
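For context, a leave-one-out evaluation of the kind described above typically looks like the following sketch. The feature matrix, labels, and classifier settings are placeholders (scikit-learn is assumed), not the study's actual data or tuned models.

```python
# Sketch of leave-one-out cross-validation over multimodal features.
# Synthetic placeholder data: 59 sessions x 9 extracted features, binary attention labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(59, 9))        # one row per session, one column per feature
y = rng.integers(0, 2, size=59)     # attention (1) / inattention (0) from CPT outcomes

loo = LeaveOneOut()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=loo)   # accuracy on each held-out session
print(f"leave-one-out accuracy: {scores.mean():.3f}")  # ~chance here, since data is random
```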
Research questions:
RQ1: Can we create a model of attention for students with profound and multiple learning disabilities and cerebral palsy (PMLD/CP) using the CPT?
RQ2: What are the main correlations between the CPT outcomes and the Multimodal Multisensor data?
Results: Overall, the random forest classification approach achieved the best results: 84.8% classification accuracy for attention and 65.4% for inattention. We compared these results with outcomes from other models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that the multisensor approach achieved higher accuracy than features from any reduced set of sensors. Incorporating person-specific data improved the classification outcome compared with a participant-neutral model. We found that using high-level handpicked compound features (HLCF) improved classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature for classifying attention and inattention was shown to be eye-gaze. We have shown that we can accurately predict the level of attention of students with learning disabilities in real time, in a way that is not subject to inter-rater reliability or human observation and is not reliant on a single mode of sensor input.

In total, 2475 separate correlation tests were carried out over 55 data points using Pearson's correlation coefficient. Data points from the Signal Detection Theory (SDT) measures, CPT outcome measures, Multimodal Multisensor features, and participant characteristics were assessed longitudinally for cross-correlation significance. A strong positive correlation was found between participants' ability to maintain sustained and selective attention in the CPT (d′) and their academic progress in school, P < .01. Participants who showed more inhibition in tests had progressed further in their academic assessments, P < .01. The Seek-X type CPT also revealed specific physiological characteristics, including body movement range and eye-gaze, that were significant for P scales such as 'Reading' and 'Listening', P < .05. We found that participant bias was overall liberal (B″D < 0). Participants showed no significant bias change during the sessions, and we found no significant correlation between bias (B″D) and sensitivity (d′).
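As a reference for the sensitivity and bias measures quoted above, the standard signal-detection formulas are d′ = z(H) − z(F) and Donaldson's B″D = ((1−H)(1−F) − HF) / ((1−H)(1−F) + HF), where H is the hit rate and F the false-alarm rate. The sketch below computes both (SciPy assumed); the hit and false-alarm values are made up for illustration, not the study's data.

```python
# Sketch of the SDT measures referenced above: sensitivity d' and Donaldson's bias B''D.
from scipy.stats import norm

def d_prime(hit_rate, fa_rate):
    """d' = z(H) - z(F): higher values mean better discrimination (sustained attention)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

def b_double_prime_d(hit_rate, fa_rate):
    """Donaldson's B''D: negative values indicate a liberal response bias."""
    h, f = hit_rate, fa_rate
    return ((1 - h) * (1 - f) - h * f) / ((1 - h) * (1 - f) + h * f)

print(d_prime(0.85, 0.20))           # ~1.88 with these illustrative rates
print(b_double_prime_d(0.85, 0.20))  # ~-0.17, i.e. a liberal bias
```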
Conclusion: An approach to labeling Multimodal Multisensor data to train machine-learning algorithms to track the attention of students with profound and multiple disabilities has been presented. We posit that this approach can overcome the inter-rater variability that arises when standardized observation scales are used to track the emotional expression of students with such profound disabilities. The accuracy of our approach increases with multiple modes of sensor input, and our method is robust to sensor occlusion and fallout. Multiple sources of sensor input are supported, to accommodate a wide variety of users and their needs. Our model can reliably track the attention of students with profound disabilities, regardless of the sensors available. A system incorporating this model can help teachers design personalized interventions for a very heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. This approach could be used to identify those with the greatest learning challenges, to ensure that all students are supported to reach their full potential.
Keywords—Affective computing in education, affect detection, attention, continuous performance test, engagement, flow, HCI, interaction, learning disabilities, machine learning, multimodal, multisensor, physiological sensors, Signal Detection Theory, selective attention, sustained attention, student engagement