
26,612 research outputs found

How visual cues to speech rate influence speech perception

Author: Bosker H.
Holler J.
Peeters D.
Publication venue: 'SAGE Publications'
Publication date: 01/01/2020

Spoken words are highly variable and therefore listeners interpret speech sounds relative to the surrounding acoustic context, such as the speech rate of a preceding sentence. For instance, a vowel midway between short /ɑ/ and long /a:/ in Dutch is perceived as short /ɑ/ in the context of preceding slow speech, but as long /a:/ if preceded by a fast context. Despite the well-established influence of visual articulatory cues on speech comprehension, it remains unclear whether visual cues to speech rate also influence subsequent spoken word recognition. In two ‘Go Fish’-like experiments, participants were presented with audio-only (auditory speech + fixation cross), visual-only (mute videos of talking head), and audiovisual (speech + videos) context sentences, followed by ambiguous target words containing vowels midway between short /ɑ/ and long /a:/. In Experiment 1, target words were always presented auditorily, without visual articulatory cues. Although the audio-only and audiovisual contexts induced a rate effect (i.e., more long /a:/ responses after fast contexts), the visual-only condition did not. When, in Experiment 2, target words were presented audiovisually, rate effects were observed in all three conditions, including visual-only. This suggests that visual cues to speech rate in a context sentence influence the perception of following visual target cues (e.g., duration of lip aperture), which at an audiovisual integration stage bias participants’ target categorization responses. These findings contribute to a better understanding of how what we see influences what we hear.

MPG.PuRe

Tilburg University Repository

Explorations in engagement for humans and robots

Author: Kidd Cory
Lee Christopher
Lesh Neal
Rich Charles
Sidner Candace L.
Publication date: 01/01/2005

This paper explores the concept of engagement, the process by which individuals in an interaction start, maintain and end their perceived connection to one another. The paper reports on one aspect of engagement among human interactors--the effect of tracking faces during an interaction. It also describes the architecture of a robot that can participate in conversational, collaborative interactions with engagement gestures. Finally, the paper reports on findings of experiments with human participants who interacted with a robot when it either performed or did not perform engagement gestures. Results of the human-robot studies indicate that people become engaged with robots: they direct their attention to the robot more often in interactions where engagement gestures are present, and they find interactions more appropriate when engagement gestures are present than when they are not. Comment: 31 pages, 5 figures, 3 tables

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking

Author: Guha Tanaya
Sharma Gaurav
Sharma Rahul
Publication date: 21/07/2017

Public speaking is an important aspect of human communication and interaction. The majority of computational work on public speaking concentrates on analyzing the spoken content, and the verbal behavior of the speakers. While the success of public speaking largely depends on the content of the talk, and the verbal behavior, non-verbal (visual) cues, such as gestures and physical appearance also play a significant role. This paper investigates the importance of visual cues by estimating their contribution towards predicting the popularity of a public lecture. For this purpose, we constructed a large database of more than 1800 TED talk videos. As a measure of popularity of the TED talks, we leverage the corresponding (online) viewers' ratings from YouTube. Visual cues related to facial and physical appearance, facial expressions, and pose variations are extracted from the video frames using convolutional neural network (CNN) models. Thereafter, an attention-based long short-term memory (LSTM) network is proposed to predict the video popularity from the sequence of visual features. The proposed network achieves state-of-the-art prediction accuracy indicating that visual cues alone contain highly predictive information about the popularity of a talk. Furthermore, our network learns a human-like attention mechanism, which is particularly useful for interpretability, i.e. how attention varies with time, and across different visual cues by indicating their relative importance.

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

What speakers do and what addressees look at: Visual attention to gestures in human interaction live and on video

Author: Gullberg Marianne
Holmqvist Kenneth
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2006

This study investigates whether addressees visually attend to speakers’ gestures in interaction and whether attention is modulated by changes in social setting and display size. We compare a live face-to-face setting to two video conditions. In all conditions, the face dominates as a fixation target and only a minority of gestures draw fixations. The social and size parameters affect gaze mainly when combined, and in the opposite direction from that predicted, with fewer gestures fixated on video than live. Gestural holds and speakers’ gaze at their own gestures reliably attract addressees’ fixations in all conditions. The attraction force of holds is unaffected by changes in social and size parameters, suggesting a bottom-up response, whereas speaker-fixated gestures draw significantly less attention in both video conditions, suggesting a social effect for overt gaze-following and visual joint attention. The study provides and validates a video-based paradigm enabling further experimental but ecologically valid explorations of cross-modal information processing.

Lund University Publications

MPG.PuRe

GazeDrone: Mobile Eye-Based Interaction in Public Space Without Augmenting the User

Author: Amos B.
Camera H.
Drewes H.
Majaranta P.
Plus H.
Risko E. F.
Sugioka A.
Telecommunications N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Gaze interaction holds a lot of promise for seamless human-computer interaction. At the same time, current wearable mobile eye trackers require user augmentation that negatively impacts natural user behavior while remote trackers require users to position themselves within a confined tracking range. We present GazeDrone, the first system that combines a camera-equipped aerial drone with a computational method to detect sidelong glances for spontaneous (calibration-free) gaze-based interaction with surrounding pervasive systems (e.g., public displays). GazeDrone does not require augmenting each user with on-body sensors and allows interaction from arbitrary positions, even while moving. We demonstrate that drone-supported gaze interaction is feasible and accurate for certain movement types. It is well-perceived by users, in particular while interacting from a fixed position as well as while moving orthogonally or diagonally to a display. We present design implications and discuss opportunities and challenges for drone-supported gaze interaction in public.

Crossref

Enlighten

MPG.PuRe

The role of beat gesture and pitch accent in semantic processing : An ERP study

Author: Chu Mingyuan
Wang Lin
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Peer reviewed. Publisher PDF

Aberdeen University Research

Institute of Psychology,Chinese Academy Of Sciences

MPG.PuRe

Seeing touches early in life

Author: A Kurjak
A Lew
A Montagu
A Rossetti
AL Woodward
AN Meltzoff
C Keysers
Chiara Turati
CV Macchi
E Kuehn
E Longhi
E Nagy
E Valenza
Elena Longhi
G Butterworth
I Bufalari
Irene Senna
JW Sparling
K Meyer
L Craighero
LB Cohen
LE Bahrick
M Myowa-Yamakoshi
M Schaefer
Marcello Costantini
Margaret Addabbo
MH Johnson
MJ Hertenstein
ML Filippetti
N Bolognini
N Reissland
N Zmyj
Nadia Bolognini
P Rochat
Paolo Tagliabue
R Saxe
S Biro
S Zoia
SJ Blakemore
SJ Ebisch
T Field
U Castiello
V Macchi Cassia
Viola Macchi Cassia
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

Addabbo M, Longhi E, Bolognini N, et al. Seeing touches early in life. PLoS ONE. 2015;10(9): e0134549

Crossref

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

FigShare

Robust Modeling of Epistemic Mental States

Author: Anam ASM Iftekhar
Rahman AKM Mahbubur
Yeasin Mohammed
Publication date: 28/05/2020

This work identifies and advances some research challenges in the analysis of facial features and their temporal dynamics with epistemic mental states in dyadic conversations. Epistemic states are: Agreement, Concentration, Thoughtful, Certain, and Interest. In this paper, we perform a number of statistical analyses and simulations to identify the relationship between facial features and epistemic states. Non-linear relations are found to be more prevalent, while temporal features derived from original facial features have demonstrated a strong correlation with intensity changes. Then, we propose a novel prediction framework that takes facial features and their nonlinear relation scores as input and predicts different epistemic states in videos. The prediction of epistemic states is boosted when the classification of emotion-changing regions, such as rising, falling, or steady-state, is incorporated with the temporal features. The proposed predictive models can predict the epistemic states with significantly improved accuracy: correlation coefficient (CoERR) for Agreement is 0.827, for Concentration 0.901, for Thoughtful 0.794, for Certain 0.854, and for Interest 0.913. Comment: Accepted for Publication in Multimedia Tools and Applications, Special Issue: Socio-Affective Technologies

arXiv.org e-Print Archive

University of Memphis Digital Commons

EyeScout: Active Eye Tracking for Position and Movement Independent Gaze Interaction with Large Public Displays

Author: Alt Florian
Bulling Andreas
Hoesl Axel
Khamis Mohamed
Klimczak Alexander
Reiss Martin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/10/2017

While gaze holds a lot of promise for hands-free interaction with public displays, remote eye trackers with their confined tracking box restrict users to a single stationary position in front of the display. We present EyeScout, an active eye tracking system that combines an eye tracker mounted on a rail system with a computational method to automatically detect and align the tracker with the user's lateral movement. EyeScout addresses key limitations of current gaze-enabled large public displays by offering two novel gaze-interaction modes for a single user: In "Walk then Interact" the user can walk up to an arbitrary position in front of the display and interact, while in "Walk and Interact" the user can interact even while on the move. We report on a user study that shows that EyeScout is well perceived by users, extends a public display's sweet spot into a sweet line, and reduces gaze interaction kick-off time to 3.5 seconds -- a 62% improvement over state-of-the-art solutions. We discuss sample applications that demonstrate how EyeScout can enable position and movement-independent gaze interaction with large public displays.

Crossref

Enlighten