7 research outputs found

    Modelling emotional valence and arousal of non-linguistic utterances for sound design support

    Non-Linguistic Utterances (NLUs), produced for popular media, computers, robots, and public spaces, can quickly and wordlessly convey emotional characteristics of a message. They have been studied in terms of their ability to convey affect in robot communication. The objective of this research is to develop a model that correctly infers the emotional Valence and Arousal of an NLU. On a 7-point Likert scale, 17 subjects evaluated the relative Valence and Arousal of 560 sounds collected from popular movies, TV shows, and video games, including NLUs and other character utterances. Three audio feature sets were used to extract features including spectral energy, spectral spread, zero-crossing rate (ZCR), Mel Frequency Cepstral Coefficients (MFCCs), and audio chroma, as well as pitch, jitter, formant, shimmer, loudness, and Harmonics-to-Noise Ratio, among others. After feature reduction by Factor Analysis, the best-performing models inferred average Valence with a Mean Absolute Error (MAE) of 0.107 and Arousal with an MAE of 0.097 on audio samples held out from training. These results suggest the model infers the Valence and Arousal of most NLUs with an error smaller than the spacing between successive points on the 7-point Likert scale (0.14). This inference system is applicable to the development of novel NLUs to augment robot-human communication or to the design of sounds for other systems, machines, and settings.
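    A minimal sketch of the pipeline this abstract describes (frame-level acoustic features, Factor Analysis reduction, then regression scored by MAE), assuming librosa and scikit-learn; the feature set, factor count, and regressor below are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: frame-level acoustic features -> Factor Analysis ->
# regression of Valence/Arousal, scored by MAE on a held-out split.
# Feature set, factor count, and regressor are illustrative assumptions.
import numpy as np
import librosa
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

def extract_features(path):
    """Mean-pool frame-level features for one clip (a common simplification)."""
    y, sr = librosa.load(path, sr=None)
    feats = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),        # MFCCs
        librosa.feature.zero_crossing_rate(y),              # ZCR
        librosa.feature.spectral_bandwidth(y=y, sr=sr),     # spectral spread
        librosa.feature.chroma_stft(y=y, sr=sr),            # audio chroma
    ]
    return np.concatenate([f.mean(axis=1) for f in feats])

def fit_valence_arousal(paths, ratings, n_factors=10):
    """paths: audio clips; ratings: (n, 2) array of [valence, arousal]."""
    X = np.vstack([extract_features(p) for p in paths])
    X = FactorAnalysis(n_components=n_factors).fit_transform(X)
    X_tr, X_te, y_tr, y_te = train_test_split(X, ratings, test_size=0.2)
    model = RandomForestRegressor().fit(X_tr, y_tr)
    return model, mean_absolute_error(y_te, model.predict(X_te))
```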

    Affect Recognition in Human Emotional Speech using Probabilistic Support Vector Machines

    The problem of automatically inferring human emotional state from speech has become one of the central problems in Man-Machine Interaction (MMI). Though Support Vector Machines (SVMs) have been used in several works on emotion recognition from speech, the potential of probabilistic SVMs for this task has not been explored. The emphasis of the current work is on how to use probabilistic SVMs for the efficient recognition of emotions from speech. Emotional speech corpora for two Dravidian languages, Telugu and Tamil, were constructed for assessing the recognition accuracy of probabilistic SVMs. The recognition accuracy of the proposed model is analyzed on both the Telugu and Tamil emotional speech corpora and compared with three existing works. Experimental results indicate that the proposed model performs significantly better than the existing methods.
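    The "probabilistic SVM" idea can be sketched with scikit-learn, whose SVC fits Platt scaling when probability=True and so returns per-class posteriors rather than bare decisions; the features and kernel below are assumptions, not the paper's setup.

```python
# Hedged sketch: scikit-learn's SVC with probability=True fits Platt
# scaling over the SVM decision values, yielding per-class posterior
# probabilities. Features and kernel settings are assumptions, not the
# paper's exact setup.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_probabilistic_svm(X, labels):
    """X: (n, d) acoustic feature vectors; labels: one emotion per utterance."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    return clf.fit(X, labels)

# clf.predict_proba(X_new) then returns a posterior over emotion classes,
# so low-confidence utterances can be flagged or rejected.
```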

    Emotion recognition from syllabic units using k-nearest-neighbor classification and energy distribution

    In this article, we present an automatic technique for recognizing emotional states from speech signals. The main focus of this paper is to present an efficient, reduced set of acoustic features that allows us to recognize four basic human emotions (anger, sadness, joy, and neutral). The proposed feature vector is composed of twenty-eight measurements: standard acoustic features such as formants and fundamental frequency (obtained with the Praat software), as well as new features based on the energies in specific frequency bands and their distributions (computed with MATLAB code). The measurements are extracted from consonant/vowel (CV) syllabic units derived from the Moroccan Arabic dialect emotional database (MADED) corpus. The collected data are then used to train a k-nearest-neighbor (KNN) classifier to perform the automated recognition phase. The results reach 64.65% accuracy for multi-class classification and 94.95% for classification between positive and negative emotions.
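    A minimal sketch of the band-energy features, assuming NumPy and scikit-learn: the spectrum is split into fixed bands, each band's share of total energy becomes one feature, and a k-NN classifier is trained on the resulting vectors. The band edges are illustrative, and the Praat-derived formant and F0 measurements are omitted.

```python
# Hedged sketch of the band-energy features: split the spectrum into fixed
# bands, take each band's share of total energy as a feature, classify with
# k-NN. Band edges are illustrative; the Praat-derived formant/F0 features
# are omitted.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def band_energy_features(signal, sr, edges=(0, 500, 1000, 2000, 4000, 8000)):
    spec = np.abs(np.fft.rfft(signal)) ** 2                 # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    energies = np.array([spec[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    return energies / (energies.sum() or 1.0)               # energy distribution

def train_knn(feature_rows, labels, k=5):
    return KNeighborsClassifier(n_neighbors=k).fit(feature_rows, labels)
```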

    Time-Distributed Attention-Layered Convolution Neural Network with Ensemble Learning using Random Forest Classifier for Speech Emotion Recognition

    Speech Emotion Recognition (SER) is the task of identifying human emotions from speech utterances, which combine linguistic and non-linguistic information. Non-linguistic SER provides a generalized solution for human-computer interaction applications because it overcomes the language barrier. Machine learning and deep learning techniques have previously been proposed for classifying emotions using handpicked features. To achieve effective and generalized SER, feature extraction can instead be performed with deep neural networks and classification with ensemble learning. The proposed model employs a time-distributed attention-layered convolution neural network (TDACNN) to extract spatiotemporal features in the first stage and a random forest (RF) ensemble classifier for efficient, generalized classification of emotions in the second stage. The proposed model was implemented on the RAVDESS and IEMOCAP corpora and compared with CNN-SVM and CNN-RF models for SER. The TDACNN-RF model exhibited test classification accuracies of 92.19 percent on RAVDESS and 90.27 percent on IEMOCAP. The experimental results show that the proposed model is effective at extracting spatiotemporal features from time-series speech signals and classifies emotions with good accuracy. Class confusion among emotions was reduced on both corpora, indicating that the model generalizes well.
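    The two-stage design can be sketched with Keras and scikit-learn: a TimeDistributed CNN with a simple additive attention layer pools segment embeddings into one utterance vector, which then trains a Random Forest in place of a softmax head. The input shape, filter counts, and attention form below are assumptions, not the paper's exact TDACNN.

```python
# Hedged sketch of the two-stage design: a TimeDistributed CNN with simple
# additive attention pools segment embeddings into one utterance vector,
# which then trains a Random Forest instead of a softmax head. Shapes,
# filter counts, and the attention form are assumptions.
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

def build_extractor(segments=10, frames=32, n_mels=64):
    inp = tf.keras.Input(shape=(segments, frames, n_mels, 1))
    x = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv2D(32, 3, activation="relu"))(inp)
    x = tf.keras.layers.TimeDistributed(
        tf.keras.layers.GlobalMaxPooling2D())(x)             # (batch, segments, 32)
    scores = tf.keras.layers.Dense(1, activation="tanh")(x)  # attention scores
    weights = tf.keras.layers.Softmax(axis=1)(scores)        # over the segment axis
    emb = tf.keras.layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])
    return tf.keras.Model(inp, emb)

def fit_two_stage(extractor, X, y):
    """Stage 2: Random Forest on the (pre-trained, frozen) CNN embeddings."""
    return RandomForestClassifier(n_estimators=300).fit(extractor.predict(X), y)
```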

    Internet and Biometric Web Based Business Management Decision Support

    Internet and Biometric Web Based Business Management Decision Support. MICROBE MOOC material prepared under IO1/A5, "Development of the MICROBE personalized MOOCs content and teaching materials". Prepared by: A. Kaklauskas, A. Banaitis, I. Ubarte, Vilnius Gediminas Technical University, Lithuania. Project No: 2020-1-LT01-KA203-07810

    KEER2022

    Half title: KEER2022. Diversities. Resource description: 25 July 202