Speech features for discriminating stress using branch and bound wrapper search
Stress detection from speech is less explored than Automatic Emotion Recognition, and it is still unclear which features are the best stress discriminants. VOCE aims to classify speech as stressed or not stressed in real time, using acoustic-prosodic features only. We therefore look for the most discriminative feature subsets from a set of 6285 features: 6125 features extracted with the openSMILE toolkit and 160 Teager Energy Operator (TEO) features. We perform feature selection with a mutual information filter followed by a branch and bound wrapper heuristic using an SVM classifier. Since many feature sets are selected, we analyse them in terms of the chosen features and classifier performance, including true positive and false positive rates. The results show that the best feature types for our application are Audio Spectral, MFCC, PCM and TEO. We reached a generalisation accuracy of up to 70.36%.
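The abstract names the two selection stages but not their internals, so the following is only a minimal sketch of a filter-then-wrapper pipeline under stated assumptions. Mutual information is estimated after equal-width binning. A leave-one-out nearest-centroid classifier stands in for the SVM to keep the example dependency-free. The pruning rule is a hypothetical heuristic bound, not necessarily the one used in VOCE: a branch is abandoned when adding a feature does not raise accuracy. All function names are illustrative.

```python
import math
from collections import Counter


def mutual_information(feature, labels, bins=4):
    """Estimate I(X;Y) in bits after equal-width binning of one continuous feature."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0  # constant feature: avoid division by zero
    binned = [min(int((x - lo) / width), bins - 1) for x in feature]
    n = len(labels)
    joint, px, py = Counter(zip(binned, labels)), Counter(binned), Counter(labels)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on a feature subset.

    Assumption: stands in for the SVM used in the paper.
    """
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, Counter()
        for j in range(len(X)):
            if j == i:
                continue  # hold sample i out of training
            acc = sums.setdefault(y[j], [0.0] * len(subset))
            for k, f in enumerate(subset):
                acc[k] += X[j][f]
            counts[y[j]] += 1
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(
            sums,
            key=lambda lab: sum(
                (X[i][f] - sums[lab][k] / counts[lab]) ** 2 for k, f in enumerate(subset)
            ),
        )
        correct += pred == y[i]
    return correct / len(X)


def branch_and_bound(X, y, candidates, max_size):
    """Depth-first search over subsets of `candidates` (kept in MI order) up to `max_size`.

    Hypothetical bound: prune a branch as soon as adding a feature does not improve
    leave-one-out accuracy over its parent subset.
    """
    best = ((), 0.0)

    def expand(subset, score, start):
        nonlocal best
        if score > best[1]:
            best = (subset, score)
        if len(subset) == max_size:
            return
        for idx in range(start, len(candidates)):
            child = subset + (candidates[idx],)
            child_score = loo_accuracy(X, y, child)
            if child_score > score:  # otherwise prune this branch
                expand(child, child_score, idx + 1)

    expand((), 0.0, 0)
    return best


def select_features(X, y, top_k, max_size):
    """MI filter keeps the top_k features; the wrapper then searches subsets of them."""
    n_features = len(X[0])
    mi = [mutual_information([row[f] for row in X], y) for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: mi[f], reverse=True)
    return branch_and_bound(X, y, ranked[:top_k], max_size)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noisy, feature 2 is constant.
    X = [[0.0, 0.3, 5.0], [0.1, 0.9, 5.0], [0.2, 0.1, 5.0],
         [1.0, 0.8, 5.0], [1.1, 0.2, 5.0], [0.9, 0.5, 5.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(select_features(X, y, top_k=2, max_size=2))  # → ((0,), 1.0)
```

Because wrapper accuracy is not monotonic in subset size, a bound of this kind can prune good subsets. That is why such a search is a heuristic rather than an exact branch and bound.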
A semi-supervised learning approach for automatic personality classification based on acoustic-prosodic cues
Automatic personality analysis has gained considerable attention in recent years as a fundamental dimension of human-machine interaction. However, the development of this technology in some domains, such as the classification of children's personality, has been hindered by the small number and size of available speech corpora, owing to ethical concerns about collecting such corpora. To circumvent the lack of data, we investigated a semi-supervised training approach that makes use of heterogeneous (mismatched in age and language) and partially unlabelled data sets. Namely, preliminary personality models trained on a small labelled data set of French-speaking adults are iteratively refined using a larger unlabelled set of Portuguese children's speech, while a labelled corpus of Portuguese children is used for evaluation. We also investigated speech representations based on prior linguistic knowledge of acoustic-prosodic cues for personality classification and analysed their relevance to the assessment of each personality trait. The results point to the potential of semi-supervised learning with heterogeneous data sets to overcome the lack of labelled data in under-resourced domains, and to the existence of acoustic-prosodic cues shared by speakers of different languages and ages, which allows personality to be classified independently of these variables.
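The iterative refinement described above is commonly implemented as self-training with pseudo-labels. The abstract does not state the classifier or the confidence criterion, so this is only a sketch under assumptions. The scheme is self-training, the model is a nearest-centroid classifier, and confidence is the gap between the two closest centroids compared against a hypothetical `threshold`. `fit_centroids`, `predict_with_margin` and `self_train` are illustrative names.

```python
import math
from statistics import mean


def fit_centroids(X, y):
    """Per-class mean feature vector (assumption: stands in for the unspecified model)."""
    return {
        label: [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == label])]
        for label in set(y)
    }


def predict_with_margin(centroids, x):
    """Return (label, margin): margin is the distance gap between the two closest centroids."""
    dists = sorted((math.dist(x, c), label) for label, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else math.inf
    return dists[0][1], margin


def self_train(X_lab, y_lab, X_unlab, threshold, max_iter=10):
    """Iteratively add confidently pseudo-labelled samples to the training set and refit."""
    X_train, y_train, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        model = fit_centroids(X_train, y_train)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(model, x)
            if margin >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new is confident enough: stop refining
        for x, label in confident:
            X_train.append(x)
            y_train.append(label)
        pool = remaining
    # Return the refined model and how many pseudo-labelled samples were absorbed.
    return fit_centroids(X_train, y_train), len(X_train) - len(X_lab)


if __name__ == "__main__":
    X_lab, y_lab = [[0.0, 0.0], [10.0, 10.0]], ["low", "high"]
    X_unlab = [[1.0, 1.0], [9.0, 9.0], [5.0, 5.0]]
    model, n_added = self_train(X_lab, y_lab, X_unlab, threshold=2.0)
    print(n_added, predict_with_margin(model, [2.0, 2.0])[0])  # → 2 low
```

The ambiguous point `[5.0, 5.0]` is never absorbed because it stays equidistant from both centroids. Leaving low-confidence samples out in this way limits the drift that mismatched unlabelled data (here, a different age group and language) could otherwise cause.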
Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review
Stress during public speaking is common and adversely affects performance and self-confidence. Extensive research has been carried out to develop models that recognize emotional states. However, little research has addressed detecting stress during public speaking in real time using voice analysis. In this context, this review shows that the application of algorithms has not been properly explored, and it identifies the main obstacles to creating a suitable testing environment given current complexities and limitations. In this paper, we present our main idea and propose a computational stress detection model that could be integrated into a Virtual Reality (VR) application to create an intelligent virtual audience for improving public speaking skills. When integrated with VR, the developed model will be able to detect excessive stress in real time by analysing voice features correlated with physiological parameters indicative of stress, helping users gradually control excessive stress and improve their public speaking performance. Comment: 41 pages, 7 figures, 4 tables
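The review does not specify which voice features its model analyses. As a hypothetical illustration of frame-by-frame, real-time analysis, the sketch below applies the Teager Energy Operator to each frame of a sliding window over incoming samples. TEO is one of the feature types the VOCE study above reports as discriminative. Each frame's mean Teager energy is compared with a fixed `threshold`, which is only a placeholder for the trained stress classifier such a system would actually use. Function names and parameters are illustrative.

```python
import math
from collections import deque


def teager_energy(frame):
    """Discrete Teager Energy Operator: psi[n] = x[n]**2 - x[n-1] * x[n+1] (needs len >= 3)."""
    return [frame[n] ** 2 - frame[n - 1] * frame[n + 1] for n in range(1, len(frame) - 1)]


def stream_frames(samples, frame_len, hop):
    """Yield overlapping frames as samples arrive one at a time (simulated real-time input)."""
    buf = deque(maxlen=frame_len)  # oldest samples drop out automatically
    since_last = 0
    for s in samples:
        buf.append(s)
        since_last += 1
        if len(buf) == frame_len and since_last >= hop:
            yield list(buf)
            since_last = 0


def monitor(samples, frame_len, hop, threshold):
    """One flag per frame: True if mean Teager energy exceeds the hypothetical threshold."""
    flags = []
    for frame in stream_frames(samples, frame_len, hop):
        psi = teager_energy(frame)
        flags.append(sum(psi) / len(psi) > threshold)
    return flags


if __name__ == "__main__":
    # A quiet tone followed by a louder one at the same frequency.
    samples = [0.1 * math.cos(0.3 * n) for n in range(100)]
    samples += [math.cos(0.3 * n) for n in range(100, 200)]
    flags = monitor(samples, frame_len=20, hop=10, threshold=0.01)
    print(flags[0], flags[-1])  # → False True
```

For a pure tone A cos(Ωn + φ) the operator returns the constant A² sin²(Ω), so it tracks amplitude and frequency jointly, using only three neighbouring samples per output value. This is what makes it cheap enough for a streaming setting.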
Prosody, variation and automatic processing
In this chapter we present an overview of prosodic variation and its interface with the field of automatic speech processing. Drawing mainly on research on European Portuguese using corpora of spontaneous and prepared speech, in expository and interactive contexts, and particularly in the standard variety spoken in Lisbon, we analyse intonation variation in declarative and interrogative contexts and discuss the pragmatic-discursive functions that can also be associated with other prosodic parameters. Building on comparative studies across styles (with a greater or lesser degree of spontaneity and planning, and a more interactive or more expository nature) and across speakers (geographical area, gender, age group/status), we highlight the role of stylistic and sociolinguistic variation in European Portuguese prosody. We also show the role of variation in the automatic processing of prosodic prominence, punctuation, disfluencies and emotions.