Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving
In this paper we present the first results of a pilot experiment in the
capture and interpretation of multimodal signals of human experts engaged in
solving challenging chess problems. Our goal is to investigate the extent to
which observations of eye-gaze, posture, emotion and other physiological
signals can be used to model the cognitive state of subjects, and to explore
the integration of multiple sensor modalities to improve the reliability of
detection of human displays of awareness and emotion. We observed chess players
engaged in problems of increasing difficulty while recording their behavior.
Such recordings can be used to estimate a participant's awareness of the
current situation and to predict their ability to respond effectively to
challenging situations. Results show that a multimodal approach is more
accurate than a unimodal one. By combining body posture, visual attention and
emotion, the multimodal approach reaches up to 93% accuracy when determining a
player's chess expertise, while a unimodal approach reaches 86%. Finally, this
experiment validates the use of our equipment as a general and reproducible
tool for the study of participants engaged in screen-based interaction and/or
problem solving.
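A minimal sketch of the unimodal-versus-multimodal comparison described in this abstract: per-modality feature vectors are classified separately and after feature-level fusion. The classifier choice (an SVM), the feature dimensions and the random placeholder data are assumptions made for illustration, not the authors' pipeline.

```python
# Sketch: compare unimodal classifiers with early (feature-level) fusion.
# The data below is a random placeholder standing in for per-trial features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials = 200

# Hypothetical per-modality features extracted for each trial.
gaze = rng.normal(size=(n_trials, 12))       # e.g. fixation/saccade statistics
posture = rng.normal(size=(n_trials, 8))     # e.g. body-lean and head-distance measures
emotion = rng.normal(size=(n_trials, 7))     # e.g. facial action-unit intensities
expertise = rng.integers(0, 2, size=n_trials)  # expert vs. novice label

def accuracy(features, labels):
    """Cross-validated accuracy of a simple classifier on one feature set."""
    return cross_val_score(SVC(kernel="rbf"), features, labels, cv=5).mean()

# Unimodal baselines versus fusion of all modalities by concatenation.
for name, feats in [("gaze", gaze), ("posture", posture), ("emotion", emotion),
                    ("fusion", np.hstack([gaze, posture, emotion]))]:
    print(f"{name:8s} accuracy: {accuracy(feats, expertise):.2f}")
```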
Information acquisition using eye-gaze tracking for person-following with mobile robots
In the effort to develop natural means of human-robot interaction (HRI), a significant amount of research has focused on Person-Following (PF) for mobile robots. PF, which generally consists of detecting, recognizing and following people, is believed to be one of the required functionalities for most future robots that share their environments with their human companions. Research in this field is mostly directed towards fully automating this functionality, which makes the challenge even more demanding, and focusing on it diverts research from other challenges that coexist in any PF system. A natural PF functionality consists of a number of tasks that need to be implemented in the system. However, in more realistic scenarios, not all of the tasks required for PF need to be automated. Instead, some of these tasks can be performed by human operators and therefore require natural means of interaction and information acquisition. In order to highlight all the tasks that are believed to exist in any PF system, this paper introduces a novel taxonomy for PF. In addition, to provide a natural means of HRI, TeleGaze is used for information acquisition in the implementation of the taxonomy; TeleGaze was previously developed by the authors as a means of natural HRI for teleoperation through eye-gaze tracking. Using TeleGaze to aid the development of PF systems is believed to demonstrate the feasibility of realistic information acquisition in a natural way.
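A small sketch of the idea that some person-following tasks can be automated while others are handed to a human operator through a gaze-based interface. The task names and the automated/operator split are hypothetical; the paper's actual taxonomy is not reproduced here.

```python
# Illustrative split of a PF pipeline into automated and operator-assisted tasks.
# Task names and assignments are assumptions, not the paper's taxonomy.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PFTask:
    name: str
    automated: bool                # True: handled by the robot's software
    handler: Callable[[], str]     # placeholder action

def auto(label):
    return lambda: f"[robot] {label} running autonomously"

def gaze_operated(label):
    return lambda: f"[operator] {label} resolved via gaze-based selection"

pipeline = [
    PFTask("detect person",       automated=True,  handler=auto("detection")),
    PFTask("recognize target",    automated=False, handler=gaze_operated("target selection")),
    PFTask("follow target",       automated=True,  handler=auto("following controller")),
    PFTask("recover lost target", automated=False, handler=gaze_operated("re-acquisition")),
]

for task in pipeline:
    print(f"{task.name:22s} -> {task.handler()}")
```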
Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction
The visual focus of attention (VFOA) has been recognized as a prominent
conversational cue. We are interested in estimating and tracking the VFOAs
associated with multi-party social interactions. We note that in this type of
situation the participants either look at each other or at an object of
interest; therefore their eyes are not always visible. Consequently, neither
gaze nor VFOA estimation can be based on eye detection and tracking. We propose a
method that exploits the correlation between eye gaze and head movements. Both
VFOA and gaze are modeled as latent variables in a Bayesian switching
state-space model. The proposed formulation leads to a tractable learning
procedure and to an efficient algorithm that simultaneously tracks gaze and
visual focus. The method is tested and benchmarked using two publicly available
datasets that contain typical multi-party human-robot and human-human
interactions.
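To illustrate the switching idea, the toy filter below tracks a discrete VFOA target from noisy head-direction observations, standing in for the correlation between head movements and the unobserved gaze. The Gaussian observation model, the target angles and the noise level are assumptions; the paper's full Bayesian switching state-space model and its learning procedure are not reproduced here.

```python
# Toy forward filter over a discrete VFOA variable (the "switch"), driven by
# noisy head-direction observations. All numbers below are illustrative.
import numpy as np

targets = {"person A": -30.0, "person B": 10.0, "object": 45.0}  # expected head angles (deg)
names = list(targets)
prior = np.full(len(names), 1.0 / len(names))         # uniform prior over VFOA
transition = np.full((3, 3), 0.1) + 0.7 * np.eye(3)    # sticky switching dynamics
obs_std = 12.0                                          # head-direction noise (deg)

def gaussian(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def filter_vfoa(head_angles):
    """Yield the posterior over VFOA targets after each head-direction observation."""
    belief = prior.copy()
    for angle in head_angles:
        belief = transition.T @ belief                                     # predict (switch)
        belief *= [gaussian(angle, targets[n], obs_std) for n in names]    # update
        belief /= belief.sum()
        yield dict(zip(names, belief.round(2)))

for posterior in filter_vfoa([-28.0, -25.0, 8.0, 12.0, 44.0]):
    print(posterior)
```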
Multimodal Polynomial Fusion for Detecting Driver Distraction
Distracted driving is deadly, claiming 3,477 lives in the U.S. in 2015 alone.
Although there has been a considerable amount of research on modeling the
distracted behavior of drivers under various conditions, accurate automatic
detection using multiple modalities and especially the contribution of using
the speech modality to improve accuracy has received little attention. This
paper introduces a new multimodal dataset for distracted driving behavior and
discusses automatic distraction detection using features from three modalities:
facial expression, speech and car signals. Detailed multimodal feature analysis
shows that adding more modalities monotonically increases the predictive
accuracy of the model. Finally, a simple and effective multimodal fusion
technique using a polynomial fusion layer shows superior distraction detection
results compared to the baseline SVM and neural network models.
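A minimal sketch of a second-degree polynomial fusion of three modality embeddings (facial expression, speech and car signals): first-order terms are concatenated with pairwise interaction terms before a classifier head. The embedding sizes and the exact form of the expansion are assumptions; the paper's polynomial fusion layer may be parameterized differently.

```python
# Sketch of second-order polynomial fusion over three modality embeddings.
# Embedding sizes, weights and the expansion itself are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
face, speech, car = rng.normal(size=16), rng.normal(size=8), rng.normal(size=4)

def polynomial_fusion(vectors):
    """First-order terms plus all pairwise (outer-product) interaction terms."""
    first_order = np.concatenate(vectors)
    interactions = [np.outer(a, b).ravel()
                    for i, a in enumerate(vectors)
                    for b in vectors[i + 1:]]
    return np.concatenate([first_order] + interactions)

fused = polynomial_fusion([face, speech, car])

# Linear classifier head on the fused representation (placeholder weights;
# in practice they would be learned end to end with the rest of the model).
w = rng.normal(size=fused.shape[0])
distraction_score = 1.0 / (1.0 + np.exp(-w @ fused))   # sigmoid -> probability
print(fused.shape, round(float(distraction_score), 3))
```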