2,357 research outputs found
The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism
The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and picks up on autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of twelve emotional states overall. In this paper, we describe these four Sub-Challenges, the Challenge conditions, the baselines, and a new feature set computed with the openSMILE toolkit and provided to the participants.
\em Bj\"orn Schuller, Stefan Steidl, Anton Batliner, Alessandro Vinciarelli, Klaus Scherer}\\
{\em Fabien Ringeval, Mohamed Chetouani, Felix Weninger, Florian Eyben, Erik Marchi, }\\
{\em Hugues Salamin, Anna Polychroniou, Fabio Valente, Samuel Kim
Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data
Part 1: Fundamental Issues. This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our target for this workshop was to bring technologies currently used in speech recognition and synthesis to a new level, namely to use them as the core of a new HMM-based mapping system. The idea of statistical mapping has been investigated, more precisely how to use Gaussian Mixture Models and Hidden Markov Models for realtime, reactive generation of new trajectories from input labels and for realtime regression in a continuous-to-continuous use case. As a result, we have developed several proofs of concept, including an incremental speech synthesiser, software for exploring stylistic spaces for gait and facial motion in realtime, a reactive audiovisual laughter synthesiser, and a prototype demonstrating the realtime reconstruction of lower-body gait motion strictly from upper-body motion, with conservation of its stylistic properties. This project has been the opportunity to formalise HMM-based mapping, integrate several of these innovations into the Mage library, and explore the development of a realtime gesture recognition tool.
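The continuous-to-continuous mapping mentioned in this abstract (e.g. regressing lower-body motion features from upper-body motion features) can be illustrated with GMM-based regression: fit a joint Gaussian mixture over stacked input/output features, then take the conditional expectation of the outputs given the inputs. The sketch below is a minimal, generic illustration using scikit-learn and SciPy; the feature arrays, helper names, and component count are assumptions, not the workshop system itself.

    import numpy as np
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture

    def fit_joint_gmm(x, y, n_components=8):
        """Fit a joint GMM over stacked input/output feature vectors."""
        return GaussianMixture(n_components=n_components,
                               covariance_type="full").fit(np.hstack([x, y]))

    def gmm_regression(gmm, x, dim_x):
        """Conditional expectation E[y | x] under the joint GMM."""
        mu_x = gmm.means_[:, :dim_x]
        mu_y = gmm.means_[:, dim_x:]
        S_xx = gmm.covariances_[:, :dim_x, :dim_x]
        S_yx = gmm.covariances_[:, dim_x:, :dim_x]
        # posterior responsibility of each component given the input only
        log_r = np.stack([np.log(gmm.weights_[k])
                          + multivariate_normal.logpdf(x, mu_x[k], S_xx[k])
                          for k in range(gmm.n_components)], axis=1)
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        y_hat = np.zeros((x.shape[0], mu_y.shape[1]))
        for k in range(gmm.n_components):
            reg = S_yx[k] @ np.linalg.inv(S_xx[k])
            y_hat += r[:, [k]] * (mu_y[k] + (x - mu_x[k]) @ reg.T)
        return y_hat

In such a setup the mapping runs frame by frame at synthesis time, which is what makes reactive, realtime generation possible.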
Fusion for Audio-Visual Laughter Detection
Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus.
Audio-visual laughter detection is performed by combining (fusing) the results of separate audio and video classifiers at the decision level. The video classifier uses features based on the principal components of 20 tracked facial points; for audio we use the widely used PLP and RASTA-PLP features. Our results indicate that RASTA-PLP features outperform PLP features for laughter detection in audio.
We compared classifiers based on hidden Markov models (HMMs), Gaussian mixture models (GMMs), and support vector machines (SVMs), and found that RASTA-PLP features combined with a GMM gave the best performance for the audio modality. The video features classified with an SVM gave the best single-modality performance. Fusion at the decision level resulted in laughter detection with significantly better performance than single-modality classification.
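As a purely illustrative sketch of decision-level fusion (not the paper's exact classifiers or weighting), per-segment laughter scores from the audio and video models can be combined with a weighted sum and thresholded; the weight alpha and the threshold below are assumptions.

    import numpy as np

    def fuse_decisions(p_audio, p_video, alpha=0.5, threshold=0.5):
        """Weighted-sum fusion of per-segment laughter probabilities."""
        p_fused = alpha * np.asarray(p_audio) + (1.0 - alpha) * np.asarray(p_video)
        return (p_fused >= threshold).astype(int), p_fused

    # Example: three segments scored by the audio (GMM) and video (SVM) models
    labels, scores = fuse_decisions([0.9, 0.2, 0.6], [0.7, 0.1, 0.4], alpha=0.6)

The fusion weight would typically be tuned on a development set so that the more reliable modality dominates the final decision.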
Laugh machine
The Laugh Machine project aims at endowing virtual agents with the capability to laugh naturally, at the right moment and with the correct intensity, when interacting with human participants. In this report we present the technical development and evaluation of such an agent in one specific scenario: watching TV along with a participant. The agent must be able to react to both the video and the participant's behaviour. A full processing chain has been implemented, integrating components to sense the human behaviours, decide when and how to laugh and, finally, synthesize audiovisual laughter animations. The system was evaluated on its capability to enhance the affective experience of naive participants, with the help of pre- and post-experiment questionnaires. Three interaction conditions have been compared: laughter-enabled or not, reacting to the participant's behaviour or not. Preliminary results (the number of experiments is currently too small to obtain statistically significant differences) show that the interactive, laughter-enabled agent is positively perceived and increases the emotional dimension of the experiment.
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
We present JVNV, a Japanese emotional speech corpus with verbal content and nonverbal vocalizations whose scripts are generated by a large-scale language model. Existing emotional speech corpora lack not only proper emotional scripts but also nonverbal vocalizations (NVs), which are essential expressions in spoken language for conveying emotions. We propose an automatic script generation method that produces emotional scripts by providing seed words with sentiment polarity and phrases of nonverbal vocalizations to ChatGPT using prompt engineering. We select 514 scripts with balanced phoneme coverage from the generated candidate scripts with the assistance of emotion confidence scores and language fluency scores. We demonstrate the effectiveness of JVNV by showing that it has better phoneme coverage and emotion recognizability than previous Japanese emotional speech corpora. We then benchmark JVNV on emotional text-to-speech synthesis using discrete codes to represent NVs. We show that there still exists a gap between the performance of synthesizing read-aloud speech and emotional speech, and that adding NVs makes the task even harder, which brings new challenges for this task and makes JVNV a valuable resource for future work. To the best of our knowledge, JVNV is the first speech corpus whose scripts are generated automatically using large language models.
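The prompt-based script generation idea can be illustrated with a short sketch: seed words carrying sentiment polarity and NV phrases are placed into a prompt sent to ChatGPT, and the returned candidates are later filtered by emotion confidence and fluency scores. The prompt wording, model name, helper function, and example seed words below are assumptions for illustration, not the exact JVNV recipe.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def generate_candidate_scripts(emotion, seed_words, nv_phrases, n=5):
        """Ask ChatGPT for n candidate emotional scripts containing NVs."""
        prompt = (
            f"Write {n} short Japanese utterances expressing {emotion}. "
            f"Use some of these seed words: {', '.join(seed_words)}. "
            f"Insert one of these nonverbal vocalizations where natural: "
            f"{', '.join(nv_phrases)}."
        )
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Example: candidate scripts for 'anger' with illustrative NV phrases
    print(generate_candidate_scripts("anger", ["最悪", "裏切り"], ["ああもう", "くっ"]))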