2,352 research outputs found
Sympathy Begins with a Smile, Intelligence Begins with a Word: Use of Multimodal Features in Spoken Human-Robot Interaction
Recognition of social signals, from human facial expressions or prosody of
speech, is a popular research topic in human-robot interaction studies. There
is also a long line of research in the spoken dialogue community that
investigates user satisfaction in relation to dialogue characteristics.
However, very little research relates a combination of multimodal social
signals and language features detected during spoken face-to-face human-robot
interaction to the resulting user perception of a robot. In this paper we show
how different emotional facial expressions of human users, in combination with
prosodic characteristics of human speech and features of human-robot dialogue,
correlate with users' impressions of the robot after a conversation. We find
that happiness in the user's recognised facial expression strongly correlates
with likeability of a robot, while dialogue-related features (such as number of
human turns or number of sentences per robot utterance) correlate with
perceiving a robot as intelligent. In addition, we show that facial expression,
emotional features, and prosody are better predictors of human ratings related
to perceived robot likeability and anthropomorphism, while linguistic and
non-linguistic features more often predict perceived robot intelligence and
interpretability. As such, these characteristics may in future be used as an
online reward signal for in-situ Reinforcement Learning based adaptive
human-robot dialogue systems.Comment: Robo-NLP workshop at ACL 2017. 9 pages, 5 figures, 6 table
Predicting continuous conflict perception with Bayesian Gaussian processes
Conflict is one of the most important phenomena of social life, but it is still largely neglected by the computing community. This work proposes an approach
that detects common conversational social signals (loudness, overlapping speech,
etc.) and predicts the conflict level perceived by human observers in continuous,
non-categorical terms. The proposed regression approach is fully Bayesian and it
adopts Automatic Relevance Determination to identify the social signals that influence most the outcome of the prediction. The experiments are performed over the SSPNet Conflict Corpus, a publicly available collection of 1430 clips extracted from televised political debates (roughly 12 hours of material for 138 subjects in total). The results show that it is possible to achieve a correlation close to 0.8 between actual and predicted conflict perception
Recommended from our members
Acoustic and Prosodic Correlates of Social Behavior
We describe acoustic/prosodic and lexical correlates of social variables annotated on a large corpus of task-oriented spontaneous speech. We employ Amazon Mechanical Turk to label the corpus with a large number of social behaviors, examining results of three of these here. We find significant differences between male and female speakers for perceptions of attempts to be liked, likeability, speech planning, that also differ depending upon the gender of their conversational partners
A Study of Accomodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications
Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each to that of the other(s). Implementation of thisbehavior in spoken dialogue systems is desirable as an improvement on the naturalness of humanmachine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitativedescription of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or humancomputer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments.In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speakercontributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turntaking” behaviour. Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude ofperceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems
Automatic Recognition of Prosodic Patterns in Semantic Verbal Fluency Tests - an Animal Naming Task for Edutainment Applications
This paper automatically detects prosodic patterns in the domain of semantic fluency tests. Verbal fluency tests aim at evaluating the spontaneous production of words under constrained conditions. Mostly used for assessing cognitive impairment, they can be used in a plethora of domains, as edutainment applications or games with educational purposes. This work discriminates between list effects, disfluencies, and other linguistic events in an animal naming task. Recordings from 42 Portuguese speakers were automatically recognized and AuToBI was applied in order to detect prosodic patterns, using both European Portuguese and English models. Both models allowed to differentiate list effects from the other events, mostly represented by the tunes: L* H/L(-%) (English models) or L*+H H/L(-%) (Portuguese models). However, English models proved to be more suitable because they rely in substantial more training material.info:eu-repo/semantics/publishedVersio
- …