7 research outputs found

    Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition

    Full text link
    Acoustic and linguistic analysis for elderly emotion recognition is an under-studied and challenging research direction, but it is essential for the creation of digital assistants for the elderly, as well as for unobtrusive telemonitoring of the elderly in their residences for mental-healthcare purposes. This paper presents our contribution to the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE) Elderly Emotion Sub-Challenge, which comprises two ternary classification tasks for arousal and valence recognition. We propose a bi-modal framework in which these tasks are modeled using state-of-the-art acoustic and linguistic features, respectively. In this study, we demonstrate that exploiting task-specific dictionaries and resources can boost the performance of linguistic models when the amount of labeled data is small. Observing a high mismatch between the development- and test-set performances of various models, we also propose alternative training and decision-fusion strategies to better estimate and improve generalization performance.
    Comment: 5 pages, 1 figure, Interspeech 2020
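    A common form of the decision fusion mentioned in this abstract is late fusion: each modality's classifier outputs class posteriors, and the posteriors are combined before taking the argmax. The sketch below is illustrative only (the function name, weights, and toy posteriors are assumptions, not the paper's actual method), assuming two classifiers that each emit a (samples x classes) probability matrix for the three arousal/valence levels.

```python
import numpy as np

def late_fusion(acoustic_probs, linguistic_probs, weight=0.5):
    """Weighted-average (late) decision fusion of two classifiers'
    class-posterior matrices of shape (n_samples, n_classes).
    Returns the fused predicted class index per sample."""
    acoustic_probs = np.asarray(acoustic_probs, dtype=float)
    linguistic_probs = np.asarray(linguistic_probs, dtype=float)
    fused = weight * acoustic_probs + (1.0 - weight) * linguistic_probs
    return fused.argmax(axis=1)

# Toy ternary (low/neutral/high) posteriors for two utterances:
acoustic = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
linguistic = [[0.3, 0.5, 0.2], [0.1, 0.2, 0.7]]
print(late_fusion(acoustic, linguistic, weight=0.5))  # -> [0 2]
```

    The `weight` parameter would typically be tuned on the development set, which is exactly where the development/test mismatch noted in the abstract makes careful validation strategies necessary.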

    Mobile Phones and Social Signal Processing for Analysis and Understanding of Dyadic Conversations

    Get PDF
    Social Signal Processing is the domain aimed at bridging the social intelligence gap between humans and machines via the modeling, analysis, and synthesis of nonverbal behavior in social interactions. One of the main challenges of the domain is to sense the behavior of social interaction participants unobtrusively, a key condition for preserving the spontaneity and naturalness of the interactions under examination. In this respect, mobile devices offer a major opportunity because they are equipped with a wide array of sensors that capture the behavior of their users with unprecedented depth while remaining effectively invisible. This is particularly important because mobile devices are part of the everyday life of a large number of individuals and, hence, can be used to investigate and sense natural and spontaneous scenarios.

    Non-verbal Signals in Oral History Archives

    Get PDF
    Oral History Archives (OHA) are a rich source of emotional narratives, encapsulating the personal stories of people across different demographics, historical periods, and cultures. Computational technologies have transformed the oral history archival field by facilitating the transcription and verbal content analysis of interview collections where manual inspection is too time-consuming. However, these methods fail to capture the subjective part of the archives. In this project, we explore the potential of automatic analysis of breathing patterns and non-verbal cues in OHA interviews to gain new insights into individual and collective emotional responses across different demographics. The proposed framework will investigate whether automatic breathing-signal prediction enhances the performance of speech emotion recognition models and whether a cross-dataset learning approach for breathing-signal prediction and paralinguistic analysis works in OHA. Next, we will use the gathered emotional information to study cultural differences in narrating traumatic experiences, focusing on different OHA collections. Lastly, to enhance our research and the literature, we will also design emotion elicitation experiments to create new emotional speech-breathing datasets.

    Non-Native Differences in Prosodic-Construction Use

    Get PDF
    Many language learners never acquire truly native-sounding prosody. Previous work has suggested that this involves skill deficits in the dialog-related uses of prosody and may be attributable to weaknesses with specific prosodic constructions. Using semi-automated methods, we identified 32 of the most common prosodic constructions in English dialog. Examining 90 minutes of conversation in English by six advanced native-Spanish learners, we found differences, notably regarding swift turn-taking, alignment, and empathy, but overall their uses of prosodic constructions were largely similar to those of native speakers.

    Computational modeling of turn-taking dynamics in spoken conversations

    Get PDF
    The study of human interaction dynamics has been at the center of multiple research disciplines, including computer and social sciences, conversation analysis, and psychology, for decades. Recent interest has focused on designing computational models to improve human-machine interaction systems and to support humans in their decision-making processes. Turn-taking is one of the key aspects of conversational dynamics in dyadic conversations and is an integral part of human-human and human-machine interaction systems. It is used for the discourse organization of a conversation by means of explicit phrasing, intonation, and pausing, and it involves intricate timing. In verbal (e.g., telephone) conversation, turn transitions are facilitated by inter- and intra-speaker silences and overlaps. Early turn-taking research in the speech community studied the durational aspects of turns, cues for turn-yielding intention, and turn-transition modeling for spoken dialog agents. Compared to the study of turn transitions, very little work has addressed the classification of overlap discourse, especially competitive overlaps and the function of silences.
    Given the limitations of the current state of the art, this dissertation focuses on two aspects of conversational dynamics: 1) designing automated computational models for analyzing turn-taking behavior in a dyadic conversation, and 2) predicting the outcome of the conversation, i.e., observed user satisfaction, using turn-taking descriptors. These two aspects are then combined to design a conversational profile for each speaker based on turn-taking behavior and conversation outcome. The analysis, experiments, and evaluation were conducted on a large dataset of Italian call-center spoken conversations in which customers and agents are engaged in real problem-solving tasks.
    Toward this research goal, the challenges include automatically segmenting and aligning the speakers' channels from the speech signal and identifying and labeling turn types and their functional aspects. The task becomes more challenging due to the presence of overlapping speech: to model turn-taking behavior, the intention behind these overlapping turns must be considered. The most critical question, however, is how to model observed user satisfaction in a dyadic conversation and which properties of turn-taking behavior can represent and predict the outcome. The computational models for analyzing turn-taking dynamics in this dissertation therefore include automatic segmentation and labeling of turn types, categorization of competitive vs. non-competitive overlaps, silences (e.g., lapses, pauses), and functions of turns in terms of dialog acts. The novel contributions of the work presented here are:
    1. the design of a fully automated turn segmentation and labeling system (e.g., agent vs. customer turns, lapses within a speaker, and overlaps);
    2. the design of annotation guidelines for segmenting and annotating speech overlaps with competitive and non-competitive labels;
    3. a demonstration of how different channels of information, such as acoustic, linguistic, and psycholinguistic feature sets, perform in the classification of competitive vs. non-competitive overlaps;
    4. a study of the role of speakers and context (i.e., agents' and customers' speech) in conveying competitiveness, for each individual feature set and their combinations;
    5. an investigation of the function of long silences in the information flow of a dyadic conversation.
    The extracted turn-taking cues are then used to automatically predict the outcome of the conversation, modeled from continuous manifestations of emotion. The contributions here include:
    1. modeling observed user satisfaction in terms of the final emotional manifestation of the customer (i.e., the user);
    2. analyzing and modeling turn-taking properties to show how each turn type influences user satisfaction;
    3. studying how turn-taking behavior changes within each emotional state.
    Based on these studies, it is demonstrated that turn-taking behavior, especially the competitiveness of overlaps, is more than an organizational tool in daily human interactions: it carries information that can predict the outcome of the conversation in terms of satisfaction vs. non-satisfaction. Combining turn-taking behavior and conversation outcome, the final goal is to design a conversational profile for each speaker. Such profile information would be useful not only to domain experts but also to call-center agents in real time. The systems are fully automated and require no human intervention. The findings are potentially relevant to research on overlapping speech and the automatic analysis of human-human and human-machine interactions.
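    The segmentation step described above — finding overlaps and lapses from the two speakers' channels — can be sketched as interval arithmetic over per-channel voice-activity segments. The following is a minimal illustrative sketch, not the dissertation's actual pipeline: the function name, the assumption that voice activity is already available as (start, end) intervals per channel, and the toy timings are all hypothetical.

```python
def turn_events(agent, customer):
    """Given per-speaker voice-activity intervals [(start, end), ...] in
    seconds, return (overlaps, silences): intervals where both speakers
    talk at once, and gaps (lapses) where neither speaker is active."""
    def merge(intervals):
        # Merge sorted, possibly touching intervals into disjoint spans.
        out = []
        for s, e in sorted(intervals):
            if out and s <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        return out

    a, c = merge(agent), merge(customer)
    # Overlaps: pairwise intersections of the two speakers' spans.
    overlaps = []
    for s1, e1 in a:
        for s2, e2 in c:
            lo, hi = max(s1, s2), min(e1, e2)
            if lo < hi:
                overlaps.append((lo, hi))
    # Silences: gaps in the union of both speakers' activity.
    both = merge(a + c)
    silences = [(e_prev, s_next)
                for (_, e_prev), (s_next, _) in zip(both, both[1:])
                if s_next > e_prev]
    return overlaps, silences

agent = [(0.0, 2.0), (5.0, 7.0)]
customer = [(1.5, 3.0), (8.0, 9.0)]
print(turn_events(agent, customer))
# -> ([(1.5, 2.0)], [(3.0, 5.0), (7.0, 8.0)])
```

    A real system would obtain the input intervals from a voice-activity detector on each telephone channel, and the competitive vs. non-competitive labeling of each detected overlap would then be a separate classification step over acoustic, linguistic, and psycholinguistic features, as the contributions above describe.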

    Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum

    Get PDF

    Voice and speech analysis in search of states and traits

    No full text