
    The Role of Multiple Articulatory Channels of Sign-Supported Speech Revealed by Visual Processing

    Purpose: The use of sign-supported speech (SSS) in the education of deaf students has recently been discussed in relation to its usefulness for deaf children using cochlear implants. To clarify the benefits of SSS for comprehension, two eye-tracking experiments aimed to detect the extent to which signs are actively processed in this mode of communication. Method: Participants were 36 deaf adolescents, including cochlear implant users and native deaf signers. Experiment 1 attempted to shift observers' foveal attention to the linguistic source in SSS from which most information is extracted, lip movements or signs, by magnifying the face area, thus modifying the perceptual accessibility of lip movements (magnified condition), and by constraining the visual field to either the face or the sign through a moving-window paradigm (gaze-contingent condition). Experiment 2 aimed to explore the reliance on signs in SSS by occasionally producing a mismatch between sign and speech; participants were required to concentrate on the orally transmitted message. Results: In Experiment 1, analyses revealed a greater number of fixations toward the signs and a reduction in accuracy in the gaze-contingent condition across all participants. Fixations toward signs also increased in the magnified condition. In Experiment 2, results indicated lower accuracy in the mismatching condition across all participants, and participants looked more at the sign when it was inconsistent with speech. Conclusions: All participants, even those with residual hearing, rely on signs when attending to SSS, either peripherally or through overt attention, depending on the perceptual conditions. Funding: European Union, Grant Agreement 31674
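
    As an illustration of the gaze-contingent moving-window manipulation described in Experiment 1, the sketch below (not the authors' experimental code; the frame source, gaze coordinates, and window radius are illustrative assumptions) masks a video frame outside a window centred on the current gaze position, so that only one region, such as the face or the sign space, remains visible.

```python
import numpy as np

def gaze_contingent_window(frame: np.ndarray,
                           gaze_xy: tuple[int, int],
                           radius: int = 120,
                           fill_value: int = 0) -> np.ndarray:
    """Mask a video frame outside a circular window centred on the gaze point.

    frame      : H x W x 3 uint8 image (one frame of the SSS recording)
    gaze_xy    : (x, y) gaze position in pixel coordinates from the eye tracker
    radius     : radius of the visible window in pixels (hypothetical value)
    fill_value : grey level used for the masked area
    """
    h, w = frame.shape[:2]
    ys, xs = np.ogrid[:h, :w]
    gx, gy = gaze_xy
    # Boolean mask: True inside the circular window around the gaze point.
    inside = (xs - gx) ** 2 + (ys - gy) ** 2 <= radius ** 2
    masked = np.full_like(frame, fill_value)
    masked[inside] = frame[inside]
    return masked

# Hypothetical usage: frame and gaze would come from the video stream and
# the eye tracker; here a random frame and a fixed gaze point stand in.
frame = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
visible = gaze_contingent_window(frame, gaze_xy=(320, 240))
```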

    Robust Modeling of Epistemic Mental States

    This work identifies and advances several research challenges in analyzing facial features and their temporal dynamics in relation to epistemic mental states in dyadic conversations. The epistemic states considered are Agreement, Concentration, Thoughtful, Certain, and Interest. In this paper, we perform a number of statistical analyses and simulations to identify the relationship between facial features and epistemic states. Non-linear relations are found to be more prevalent, and temporal features derived from the original facial features show a strong correlation with intensity changes. We then propose a novel prediction framework that takes facial features and their non-linear relation scores as input and predicts the different epistemic states in videos. Prediction of the epistemic states is boosted when the classification of emotion-change regions, such as rising, falling, or steady state, is incorporated with the temporal features. The proposed predictive models predict the epistemic states with significantly improved accuracy: correlation coefficient (CoERR) of 0.827 for Agreement, 0.901 for Concentration, 0.794 for Thoughtful, 0.854 for Certain, and 0.913 for Interest. Comment: Accepted for publication in Multimedia Tools and Applications, Special Issue: Socio-Affective Technologies
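
    A minimal sketch of such a per-state prediction pipeline is shown below, assuming per-frame facial features and annotated state intensities as inputs. Frame-to-frame deltas stand in for the paper's temporal features, and a generic gradient-boosted regressor per state stands in for the proposed models; the feature dimensions, data, and estimator choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import pearsonr

def temporal_features(facial: np.ndarray) -> np.ndarray:
    """Append simple temporal dynamics (frame-to-frame deltas) to raw facial features.

    facial : n_frames x n_features array, e.g. action-unit intensities per frame.
    """
    deltas = np.diff(facial, axis=0, prepend=facial[:1])
    return np.hstack([facial, deltas])

def fit_epistemic_models(X: np.ndarray, y: np.ndarray, state_names: list[str]) -> dict:
    """Fit one regressor per epistemic state and report an in-sample Pearson correlation."""
    models = {}
    for i, name in enumerate(state_names):
        model = GradientBoostingRegressor(random_state=0)
        model.fit(X, y[:, i])
        r, _ = pearsonr(y[:, i], model.predict(X))  # in-sample; use held-out data in practice
        models[name] = (model, r)
    return models

# Hypothetical usage with random stand-in data (real inputs would be extracted
# facial features and annotated state intensities per video segment).
rng = np.random.default_rng(0)
facial = rng.normal(size=(200, 17))   # e.g. 17 facial action units over 200 frames
X = temporal_features(facial)
y = rng.uniform(size=(200, 5))        # annotated intensities for the 5 states
states = ["Agreement", "Concentration", "Thoughtful", "Certain", "Interest"]
models = fit_epistemic_models(X, y, states)
```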

    Digging Deeper into Egocentric Gaze Prediction

    This paper digs deeper into the factors that influence egocentric gaze. Instead of training deep models for this purpose in a blind manner, we propose to inspect the factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed against strong spatial-prior baselines. Task-specific cues such as the vanishing point, manipulation point, and hand regions are analyzed as representatives of top-down information. We also look into the contribution of these factors by investigating a simple recurrent neural model for egocentric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We also propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better than traditional features, (4) as opposed to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, the vanishing point and, in particular, the manipulation point yields the best gaze prediction accuracy over egocentric videos, (6) knowledge transfer works best when the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) more emphasis should be placed on hand-object interaction and (2) the egocentric vision community should consider larger datasets including diverse stimuli and more subjects. Comment: Presented at WACV 2019
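
    The recurrent model outlined in the abstract (per-frame deep features integrated by a gated recurrent unit to predict the next fixation) might look roughly like the following PyTorch sketch; the feature dimensionality, hidden size, and output normalization are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class RecurrentGazePredictor(nn.Module):
    """Minimal sketch: per-frame deep features -> GRU over time -> (x, y) of the next fixation.
    Layer sizes are illustrative assumptions, not the paper's reported architecture."""

    def __init__(self, feat_dim: int = 2048, hidden_dim: int = 256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # normalized (x, y) fixation coordinates

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim), e.g. CNN features of each video frame
        out, _ = self.gru(feats)
        # Use the last time step to predict the next fixation, squashed to [0, 1].
        return torch.sigmoid(self.head(out[:, -1]))

# Hypothetical usage: 4 clips of 16 frames with 2048-d features per frame.
feats = torch.randn(4, 16, 2048)
pred_xy = RecurrentGazePredictor()(feats)  # shape (4, 2)
```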