73 research outputs found
Speech features for discriminating stress using branch and bound wrapper search
Stress detection from speech is a less explored field than Automatic Emotion Recognition, and it is still not clear which features are the better stress discriminants. VOCE aims at classifying speech as stressed or not stressed in real time, using acoustic-prosodic features only. We therefore look for the best discriminating feature subsets from a set of 6285 features: 6125 features extracted with the openSMILE toolkit and 160 Teager Energy Operator (TEO) features. We use a mutual information filter and a branch-and-bound wrapper heuristic with an SVM classifier to perform feature selection. Since many feature sets are selected, we analyse them in terms of the chosen features and classifier performance, considering also true-positive and false-positive rates. The results show that the best feature types for our application case are Audio Spectral, MFCC, PCM and TEO. We reached results as high as 70.36% for generalisation accuracy.
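The two-stage selection scheme described above, a mutual-information filter followed by a wrapper search around an SVM, can be sketched as follows. This is a minimal illustration on synthetic data: it substitutes a greedy forward wrapper for the paper's branch-and-bound heuristic, and all sizes and parameter choices are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for an acoustic-prosodic feature matrix.
X, y = make_classification(n_samples=200, n_features=40,
                           n_informative=5, random_state=0)

# Stage 1: mutual-information filter keeps the top-k features.
k = 10
mi = mutual_info_classif(X, y, random_state=0)
candidates = list(np.argsort(mi)[-k:])

# Stage 2: greedy forward wrapper with an SVM classifier
# (simplified stand-in for the branch-and-bound heuristic).
selected, best = [], 0.0
improved = True
while improved and candidates:
    improved = False
    for f in list(candidates):
        trial = selected + [f]
        score = cross_val_score(SVC(kernel="rbf"), X[:, trial], y, cv=3).mean()
        if score > best:
            best, best_f = score, f
            improved = True
    if improved:
        selected.append(best_f)
        candidates.remove(best_f)

print(selected, round(best, 3))
```

The filter stage makes the wrapper tractable: the wrapper only evaluates subsets of the k features the filter retained, instead of all 6285.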
Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review
Stress during public speaking is common and adversely affects performance and self-confidence. Extensive research has been carried out to develop models that recognise emotional states; however, minimal research has been conducted on detecting stress during public speaking in real time using voice analysis. In this context, the current review showed that the application of such algorithms has not been properly explored, and it helped identify the main obstacles to creating a suitable testing environment, accounting for current complexities and limitations. In this paper, we present our main idea and propose a stress detection computational algorithmic model that could be integrated into a Virtual Reality (VR) application to create an intelligent virtual audience for improving public speaking skills. The developed model, when integrated with VR, will be able to detect excessive stress in real time by analysing voice features correlated with physiological parameters indicative of stress, and help users gradually control excessive stress and improve public speaking performance.
Comment: 41 pages, 7 figures, 4 tables
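As a concrete illustration of the kind of frame-level voice analysis such a real-time system would run, the sketch below computes two simple features often correlated with vocal arousal, RMS energy and zero-crossing rate, over a sliding window. The feature choices, window sizes, and signal are assumptions for illustration, not the reviewed systems' actual pipelines.

```python
import numpy as np

def frame_features(signal, sr=16000, frame_ms=25, hop_ms=10):
    """Per-frame RMS energy and zero-crossing rate over a sliding window."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    feats = []
    for start in range(0, len(signal) - frame + 1, hop):
        w = signal[start:start + frame]
        rms = np.sqrt(np.mean(w ** 2))                      # loudness proxy
        zcr = np.mean(np.abs(np.diff(np.sign(w)))) / 2      # noisiness proxy
        feats.append((rms, zcr))
    return np.array(feats)

# One second of synthetic "voice": a 200 Hz tone plus noise.
t = np.linspace(0, 1, 16000, endpoint=False)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 200 * t) + 0.05 * rng.normal(size=t.size)
F = frame_features(x)
print(F.shape)  # (98, 2): 98 frames, 2 features each
```

A real-time variant would run the same per-frame computation on a live audio buffer and feed the feature stream to a classifier, which is what makes the 10 ms hop size matter for latency.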
Physiologically-Motivated Feature Extraction Methods for Speaker Recognition
Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and the complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, represent unique characteristics of speech production not represented in current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms: cross-lingual speaker identification, cross song-type avian speaker identification, and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically-focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.
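The source-related features named above (RPCC, GLFCC, TPCC) are cepstral in nature. As a rough illustration of the general cepstral computation they build on, and not the dissertation's exact feature definitions, a real cepstrum of a windowed frame can be computed as:

```python
import numpy as np

def real_cepstrum(frame, n_coeffs=13):
    """First n_coeffs real-cepstrum coefficients of a speech frame.
    Source-based features like RPCC apply this kind of transform to a
    source-related signal (e.g. an LPC residual) rather than the raw frame."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # log-magnitude spectrum
    cep = np.fft.irfft(log_mag)                  # inverse FFT -> cepstrum
    return cep[:n_coeffs]

rng = np.random.default_rng(0)
frame = rng.normal(size=400)          # synthetic 25 ms frame at 16 kHz
c = real_cepstrum(frame)
print(c.shape)  # (13,)
```

The low-order coefficients summarise the spectral envelope; applying the same transform to the glottal residual instead of the raw frame is what shifts the representation from phonetic content toward speaker physiology.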