Feature Learning from Spectrograms for Assessment of Personality Traits
Several methods have recently been proposed to analyze speech and
automatically infer the personality of the speaker. These methods often rely on
prosodic and other hand-crafted speech processing features extracted with
off-the-shelf toolboxes. To achieve high accuracy, numerous features are
typically extracted using complex and highly parameterized algorithms. In this
paper, a new method based on feature learning and spectrogram analysis is
proposed to simplify the feature extraction process while maintaining a high
level of accuracy. The proposed method learns a dictionary of discriminant
features from patches extracted in the spectrogram representations of training
speech segments. Each speech segment is then encoded using the dictionary, and
the resulting feature set is used to perform classification of personality
traits. Experiments indicate that the proposed method achieves state-of-the-art
results with a significant reduction in complexity when compared to the most
recent reference methods. The number of features and the difficulties linked to
the feature extraction process are greatly reduced, as only one type of
descriptor is used, whose 6 parameters can be tuned automatically. In
contrast, the simplest reference method uses 4 types of descriptors to which 6
functionals are applied, resulting in over 20 parameters to tune.
Comment: 12 pages, 3 figures
The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism
The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and picks up on autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of twelve emotional states overall. In this paper, we describe these four Sub-Challenges, the Challenge conditions, the baselines, and a new feature set by the openSMILE toolkit, provided to the participants.
Björn Schuller, Stefan Steidl, Anton Batliner, Alessandro Vinciarelli, Klaus Scherer, Fabien Ringeval, Mohamed Chetouani, Felix Weninger, Florian Eyben, Erik Marchi, Hugues Salamin, Anna Polychroniou, Fabio Valente, Samuel Kim
In search of the role’s footprints in client-therapist dialogues
The goal of this research is to identify a speaker's role via machine learning of broad acoustic parameters, in order to understand how an occupation, or a role, affects voice characteristics. The examined corpus consists of recordings taken under the same psychological paradigm (Process Work). Four interns took part in four genuine client-therapist treatment sessions, where each individual had to practice her therapeutic skills on a colleague who, in turn, participated as a client. This uniform setting provided a unique opportunity to examine how role affects a speaker's prosody. Using a collection of machine learning algorithms, we tested automatic classification of the role across sessions. Results based on the acoustic properties show high classification rates, suggesting that there are acoustic features discriminative of a speaker's role as either a therapist or a client.
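As an illustration of the kind of pipeline the abstract describes, the sketch below summarizes each recording with a few coarse acoustic statistics and classifies them with a nearest-centroid rule. The study's actual feature set and algorithms are not specified here, so every choice below (frame sizes, the two statistics, the classifier) is an assumption:

```python
import numpy as np

def broad_acoustic_features(signal, frame=400, hop=160):
    """Summarize a mono signal with coarse prosodic statistics:
    per-frame RMS energy and zero-crossing rate, reduced to
    mean and standard deviation. A stand-in for the study's
    'broad acoustic parameters'."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame, hop)]
    energy = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f)))) / 2 for f in frames])
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])

class NearestCentroid:
    """Minimal classifier: one centroid per role (therapist / client)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(0) for c in self.classes_])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None]) ** 2).sum(-1)
        return self.classes_[d.argmin(1)]
```

Cross-session evaluation, as in the paper, would fit the classifier on some sessions and predict roles in held-out ones.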
Corrective Focus Detection in Italian Speech Using Neural Networks
Corrective focus is a particular kind of prosodic prominence with which the speaker intends to correct or emphasize a concept. This work develops an Artificial Cognitive System (ACS) based on Recurrent Neural Networks that analyzes suitable features of the audio channel in order to automatically identify corrective focus in speech signals. Two different approaches to building the ACS have been developed. The first addresses the detection of focused syllables within a given Intonational Unit (IU), whereas the second identifies a whole IU as focused or not. The experimental evaluation over an Italian corpus has shown the ability of the Artificial Cognitive System to identify the focus in the speaker's IUs. This ability can lead to further important improvements in human-machine communication. The addressed problem is a good example of the synergies between humans and Artificial Cognitive Systems. The research leading to these results was conducted in the project EMPATHIC (Grant No. 769872), which received funding from the European Union's Horizon 2020 research and innovation programme. Additionally, this work has been partially funded by the Spanish Ministry of Science under grants TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R, by the Basque Government under grant PRE_2017_1_0357, and by the University of the Basque Country UPV/EHU under grant PIF17/310.
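A minimal sketch of the recurrent core of such a system, for the syllable-level variant: a plain tanh recurrent layer emits one focus probability per step. The weights, input dimensionality, and per-syllable feature vectors are all hypothetical here; the paper's actual architecture is not reproduced:

```python
import numpy as np

def rnn_focus_scores(x, Wx, Wh, b, w_out, b_out):
    """Forward pass of a vanilla RNN over a sequence of syllable
    feature vectors x of shape (T, d); weights are assumed
    pre-trained. Returns one focus probability per syllable."""
    h = np.zeros(Wh.shape[0])
    out = []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ Wx + h @ Wh + b)            # recurrent state update
        out.append(1.0 / (1.0 + np.exp(-(h @ w_out + b_out))))  # sigmoid score
    return np.array(out)
```

The IU-level variant described in the abstract would instead pool the hidden states into a single focused / not-focused decision per Intonational Unit.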
On Automatic Diagnosis of Alzheimer's Disease based on Spontaneous Speech Analysis and Emotional Temperature
Alzheimer's disease is the most prevalent form of progressive degenerative dementia and has a high socio-economic impact in Western countries; it is therefore one of the most active research areas today. Alzheimer's is sometimes diagnosed by excluding other dementias, and definitive confirmation is only obtained through a post-mortem study of the brain tissue of the patient. The work presented here is part of a larger study that aims to identify novel technologies and biomarkers for early Alzheimer's disease detection, and it focuses on evaluating the suitability of a new approach for early diagnosis of Alzheimer's disease by non-invasive methods. The purpose is to examine, in a pilot study, the potential of applying Machine Learning algorithms to speech features obtained from suspected Alzheimer's sufferers in order to help diagnose this disease and determine its degree of severity. Two human capabilities relevant to communication have been analyzed for feature selection: Spontaneous Speech and Emotional Response. The experimental results obtained were very satisfactory and promising for the early diagnosis and classification of Alzheimer's disease patients.
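As a sketch of the final step such a pilot study implies, applying a machine-learning algorithm to per-speaker speech feature vectors, the snippet below trains a logistic regression by gradient descent. The paper does not specify its classifiers or features, so this particular model and the synthetic two-class setup are stand-ins:

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500):
    """Tiny logistic-regression trainer: X is (n, d) speech-feature
    vectors, y is 0/1 labels (e.g. control vs. suspected patient)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        g = p - y                               # gradient of log-loss wrt logits
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def predict(X, w, b):
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
```

Severity grading, as mentioned in the abstract, would replace the binary labels with ordinal ones and a correspondingly richer model.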