A survey on perceived speaker traits: personality, likability, pathology, and the first challenge
The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state of the art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks.
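The challenges in this series are conventionally scored with unweighted average recall (UAR), the mean of per-class recalls, so that a majority-class guesser cannot profit from class imbalance. A minimal sketch (the function name is ours, not from the challenge toolkit):

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; chance level is 1/n_classes
    regardless of how imbalanced the classes are."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Imbalanced binary example: always guessing the majority class
# scores well on accuracy (0.8) but only 0.5 on UAR.
truth = ["O", "O", "O", "O", "N"]
pred  = ["O", "O", "O", "O", "O"]
print(unweighted_average_recall(truth, pred))  # 0.5
```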
Feature Learning from Spectrograms for Assessment of Personality Traits
Several methods have recently been proposed to analyze speech and automatically infer the personality of the speaker. These methods often rely on prosodic and other hand-crafted speech processing features extracted with off-the-shelf toolboxes. To achieve high accuracy, numerous features are typically extracted using complex and highly parameterized algorithms. In this paper, a new method based on feature learning and spectrogram analysis is proposed to simplify the feature extraction process while maintaining a high level of accuracy. The proposed method learns a dictionary of discriminant features from patches extracted from the spectrogram representations of training speech segments. Each speech segment is then encoded using the dictionary, and the resulting feature set is used to perform classification of personality traits. Experiments indicate that the proposed method achieves state-of-the-art results with a significant reduction in complexity when compared to the most recent reference methods. The number of features, and the difficulties linked to the feature extraction process, are greatly reduced, as only one type of descriptor is used, whose six parameters can be tuned automatically. In contrast, the simplest reference method uses four types of descriptors to which six functionals are applied, resulting in over 20 parameters to be tuned.
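The patch-dictionary pipeline described above can be sketched with plain k-means as a simple stand-in for the dictionary learner (the abstract does not specify the learning algorithm); patch size, dictionary size, and the toy spectrogram below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(spec, patch=8, n=200):
    """Sample n square patches from a (freq x time) spectrogram."""
    f = rng.integers(0, spec.shape[0] - patch, n)
    t = rng.integers(0, spec.shape[1] - patch, n)
    return np.stack([spec[i:i+patch, j:j+patch].ravel() for i, j in zip(f, t)])

def kmeans_dictionary(patches, k=16, iters=10):
    """Learn k atoms with k-means, a simple stand-in for dictionary learning."""
    atoms = patches[rng.choice(len(patches), k, replace=False)]
    for _ in range(iters):
        dist = ((patches[:, None, :] - atoms[None]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for c in range(k):
            if (labels == c).any():
                atoms[c] = patches[labels == c].mean(0)
    return atoms

def encode(spec, atoms, patch=8, n=200):
    """Histogram of nearest-atom assignments: a fixed-length segment feature
    that a downstream classifier can consume."""
    p = extract_patches(spec, patch, n)
    dist = ((p[:, None, :] - atoms[None]) ** 2).sum(-1)
    return np.bincount(dist.argmin(1), minlength=len(atoms)) / n

spec = rng.random((64, 100))                      # toy spectrogram
atoms = kmeans_dictionary(extract_patches(spec))
feat = encode(spec, atoms)
print(feat.shape)  # (16,)
```

Each speech segment thus maps to one fixed-length vector regardless of duration, which is what makes the representation usable for standard classifiers.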
Acoustic-prosodic automatic personality trait assessment for adults and children
This paper investigates the use of heterogeneous speech corpora for automatic assessment of personality traits in terms of the Big Five OCEAN dimensions. The motivation for this work is twofold: the need to develop methods to overcome the lack of children's speech corpora, particularly severe when targeting personality traits, and the interest in cross-age comparisons of acoustic-prosodic features to build robust paralinguistic detectors. For this purpose, we devise an experimental setup with age mismatch, utilizing the Interspeech 2012 Personality Sub-challenge, containing adult speech, as training data. As test data, we use a corpus of children's European Portuguese speech. We investigate various feature sets such as the Sub-challenge baseline features, the recently introduced eGeMAPS features and our own knowledge-based features. The preliminary results bring insights into cross-age and cross-language detection of personality traits in spontaneous speech, pointing to a stable set of acoustic-prosodic features for Extraversion and Agreeableness in both adult and child speech.
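Acoustic-prosodic feature sets of this kind typically apply statistical functionals to frame-level contours such as energy or pitch. A hedged numpy sketch (frame and hop sizes assume 16 kHz audio; the chosen statistics are illustrative, not the exact eGeMAPS definitions):

```python
import numpy as np

def frame_energy(signal, frame=400, hop=160):
    """Short-time log energy over 25 ms frames at a 10 ms hop (16 kHz assumed)."""
    n = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i*hop : i*hop + frame] for i in range(n)])
    return np.log((frames ** 2).mean(axis=1) + 1e-10)

def functionals(contour):
    """Summarise a variable-length contour into fixed utterance-level features."""
    return {"mean": contour.mean(), "std": contour.std(),
            "range": contour.max() - contour.min(),
            "p20": np.percentile(contour, 20), "p80": np.percentile(contour, 80)}

rng = np.random.default_rng(1)
sig = rng.standard_normal(16000)          # 1 s of toy audio
feats = functionals(frame_energy(sig))
print(sorted(feats))  # ['mean', 'p20', 'p80', 'range', 'std']
```

The same functionals applied to an F0 contour would give the pitch-based counterpart of these features.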
Simulating dysarthric speech for training data augmentation in clinical speech applications
Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications, where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training. We evaluate the efficacy of our approach using both objective and subjective criteria. We present the transformed samples to five experienced speech-language pathologists (SLPs) and ask them to identify the samples as healthy or dysarthric. The results reveal that the SLPs identify the transformed speech as dysarthric 65% of the time. In a pilot classification experiment, we show that by using the simulated speech samples to balance an existing dataset, the classification accuracy improves by about 10% after data augmentation.
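The balancing step in the pilot experiment can be sketched as follows; the helper, labels, and sampling policy are illustrative assumptions, not the authors' code:

```python
import random

def balance_with_simulated(real, simulated, minority="dysarthric"):
    """Top up the minority class with simulated samples until the
    class counts match (or the simulated pool runs out)."""
    counts = {}
    for _, label in real:
        counts[label] = counts.get(label, 0) + 1
    deficit = max(0, counts.get("healthy", 0) - counts.get(minority, 0))
    extra = [(s, minority)
             for s in random.sample(simulated, min(deficit, len(simulated)))]
    return real + extra

# Toy imbalanced corpus: 10 healthy vs. 4 dysarthric recordings,
# plus a pool of 20 simulated dysarthric samples.
real = [(f"h{i}", "healthy") for i in range(10)] + \
       [(f"d{i}", "dysarthric") for i in range(4)]
simulated = [f"s{i}" for i in range(20)]
balanced = balance_with_simulated(real, simulated)
print(len(balanced))  # 20
```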
The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism
The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for social signals such as laughter in speech. It further introduces conflict in group discussions as a new task and picks up on autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of twelve emotional states overall. In this paper, we describe these four Sub-Challenges, the Challenge conditions, the baselines, and a new feature set by the openSMILE toolkit, provided to the participants.
Björn Schuller, Stefan Steidl, Anton Batliner, Alessandro Vinciarelli, Klaus Scherer, Fabien Ringeval, Mohamed Chetouani, Felix Weninger, Florian Eyben, Erik Marchi, Hugues Salamin, Anna Polychroniou, Fabio Valente, Samuel Kim
Affective analysis of customer service calls
This paper presents an affective and acoustic-prosodic analysis of a call-center corpus (700 phone calls with corresponding customer satisfaction levels). Our main goal is to understand how customers' satisfaction correlates with the acoustic-prosodic and affective information (emotions and personality traits) of the interactions. A subset of 30 calls was manually annotated with emotions (frustrated vs. neutral) and personality traits (Big Five model). Results on automatic satisfaction prediction from acoustic-prosodic features show a number of very informative linguistic knowledge-based features, especially pitch and energy ranges. The affective analysis also provides encouraging results, relating low/high satisfaction levels with the presence/absence of customer frustration. Concerning personality, customers tend to express signs of anxiety and nervousness, while agents are generally perceived as extroverted and open.
A Literature Review on Emotion Recognition Using Various Methods
Emotion recognition is an important area of work for improving the interaction between humans and machines. The complexity of emotion makes the acquisition task difficult. Earlier works proposed capturing emotion through unimodal mechanisms, such as facial expressions alone or vocal input alone. More recently, the introduction of multimodal emotion recognition has increased the detection accuracy of machines. Moreover, deep learning techniques with neural networks have extended machines' success in emotion recognition. Recent deep learning works have used different kinds of human behavioral input, such as audio-visual recordings, facial expressions, body gestures, and EEG signals with related brainwaves. Still, many aspects of this area need further work to build a robust system that detects and classifies emotions more accurately. In this paper, we explore the relevant significant works, their techniques, the effectiveness of their methods, and the scope for improving the results.
The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing
Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods, allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and to eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size.
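As a rough illustration of the "minimalistic" design, the low-level descriptors fall into a few families, each summarised by a small number of functionals such as the arithmetic mean and the coefficient of variation. The grouping and statistics below are our paraphrase for illustration, not the set's exact contents:

```python
# Illustrative grouping of low-level descriptor (LLD) families
# (paraphrased; consult the GeMAPS paper/openSMILE for the exact list).
LLD_FAMILIES = {
    "frequency": ["pitch (F0)", "jitter", "formant frequencies"],
    "energy_amplitude": ["loudness", "shimmer", "harmonics-to-noise ratio"],
    "spectral": ["alpha ratio", "Hammarberg index", "spectral slope"],
}

def minimalistic_functionals(contour):
    """Summarise one smoothed LLD contour with two statistics:
    arithmetic mean and coefficient of variation (std / mean)."""
    m = sum(contour) / len(contour)
    sd = (sum((x - m) ** 2 for x in contour) / len(contour)) ** 0.5
    return {"mean": m, "coeff_var": sd / m if m else float("inf")}

f = minimalistic_functionals([1.0, 2.0, 3.0])
print(round(f["coeff_var"], 3))  # 0.408
```

Keeping the functional count this small is what holds the total feature dimensionality orders of magnitude below the brute-force challenge sets.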