    Speech Based Machine Learning Models for Emotional State Recognition and PTSD Detection

    Recognition of emotional state and diagnosis of trauma-related illnesses such as posttraumatic stress disorder (PTSD) from speech signals have been active research topics over the past decade. A typical emotion recognition system consists of three components: speech segmentation, feature extraction, and emotion identification. Various speech features have been developed for emotional state recognition, which can be divided into three categories: excitation, vocal tract, and prosodic. However, the capabilities of different feature categories and advanced machine learning techniques have not been fully explored for emotion recognition and PTSD diagnosis. For PTSD assessment, clinical diagnosis through structured interviews is a widely accepted means of diagnosis, but patients are often embarrassed to be diagnosed at clinics. A speech-signal-based system is a recently developed alternative. Unfortunately, PTSD speech corpora are limited in size, which presents difficulties in training complex diagnostic models. This dissertation proposes sparse coding methods and deep belief network models for emotional state identification and PTSD diagnosis, together with a transfer learning strategy for PTSD diagnosis. Deep belief networks are complex models that cannot be trained effectively on data as small as the PTSD speech database, so a transfer learning strategy was adopted to mitigate the small-data problem. Transfer learning aims to extract knowledge from one or more source tasks and apply it to a target task with the intention of improving learning; it has proved useful when the target task has limited high-quality training data. We evaluated the proposed methods on the Speech Under Simulated and Actual Stress (SUSAS) database for emotional state recognition and on two PTSD speech databases for PTSD diagnosis. Experimental results and statistical tests showed that the proposed models outperform most state-of-the-art methods in the literature and are potentially efficient models for emotional state recognition and PTSD diagnosis.
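    As an illustration of the transfer strategy described above, here is a minimal sketch in PyTorch. It is not the dissertation's implementation: a generic MLP stands in for the deep belief network, and the layer sizes, class counts, and choice of which layers to freeze are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpeechEmotionNet(nn.Module):
    """Generic stand-in for a pre-trained speech emotion model."""
    def __init__(self, n_features=120, n_classes=4):
        super().__init__()
        # "Body": feature layers learned on the large source task
        # (e.g. an emotional speech corpus such as SUSAS).
        self.body = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
        )
        # Task-specific "head".
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        return self.head(self.body(x))

# Source task: assume the model was trained on plentiful emotion data.
model = SpeechEmotionNet(n_classes=4)

# Target task (PTSD diagnosis, binary): keep the transferred body,
# replace the head, and freeze the body so only the head is fine-tuned
# on the small PTSD speech dataset.
model.head = nn.Linear(64, 2)
for p in model.body.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
# ...fine-tuning loop over the small PTSD dataset goes here.
```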

    A vector quantized masked autoencoder for speech emotion recognition

    Recent years have seen remarkable progress in speech emotion recognition (SER), thanks to advances in deep learning techniques. However, the limited availability of labeled data remains a significant challenge in the field. Self-supervised learning has recently emerged as a promising solution to address this challenge. In this paper, we propose the vector quantized masked autoencoder for speech (VQ-MAE-S), a self-supervised model that is fine-tuned to recognize emotions from speech signals. The VQ-MAE-S model is based on a masked autoencoder (MAE) that operates in the discrete latent space of a vector-quantized variational autoencoder. Experimental results show that the proposed VQ-MAE-S model, pre-trained on the VoxCeleb2 dataset and fine-tuned on emotional speech data, outperforms an MAE working on the raw spectrogram representation and other state-of-the-art methods in SER. Project page: https://samsad35.github.io/VQ-MAE-Speech
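    The core mechanism, masked prediction over discrete VQ indices, can be sketched as follows. This is a simplified stand-in rather than the authors' architecture: a real MAE encodes only the visible tokens and reconstructs with a separate decoder, and the codebook size, masking ratio, and model dimensions below are assumptions.

```python
import torch
import torch.nn as nn

CODEBOOK, DIM, SEQ = 512, 256, 100   # assumed sizes
MASK_ID = CODEBOOK                   # extra "[MASK]" token index

embed = nn.Embedding(CODEBOOK + 1, DIM)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True),
    num_layers=4,
)
to_logits = nn.Linear(DIM, CODEBOOK)

# Discrete indices produced by a (pre-trained) VQ-VAE for one spectrogram;
# random integers stand in here.
tokens = torch.randint(0, CODEBOOK, (1, SEQ))

# Mask a large fraction of the tokens, MAE-style, and predict the originals.
mask = torch.rand(1, SEQ) < 0.75
inputs = tokens.masked_fill(mask, MASK_ID)
logits = to_logits(encoder(embed(inputs)))
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
```

    After pre-training with this objective, the encoder would be fine-tuned on labeled emotional speech, e.g. by pooling its outputs into an utterance embedding and attaching a small classification head.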

    Analysis and Annotation of Emotional Traits on Audio Conversations in Real-time

    It is a challenging task for computers to recognize human emotions in conversation. This research therefore aims to analyze conversational audio data, label the speakers' emotions, and annotate and visualize the identified emotional traits of audio conversations in real time. To make the speech usable for emotion processing, the raw audio is converted from the time domain to the frequency domain, and speech emotion features are extracted as Mel-Frequency Cepstral Coefficients (MFCCs). For speech emotion recognition, a deep neural network and an extreme learning machine are used to predict emotional traits, and each trait is scored by its recognition precision. The dataset contains four emotional traits: sadness, happiness, neutral, and anger. The precision values of the four traits are normalized to sum to 1, and in this study the normalized precision serves as the relative intensity with which each emotional trait is labeled and displayed along with the conversation. For better visualization, a graphical user interface displays the waveform, spectrogram, and speech emotion prediction graphs of a given speech audio. The effect of a voice activity detection algorithm is also analyzed; its output provides the timestamps for emotion annotation.
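    Two of the steps above are simple enough to sketch: the MFCC feature extraction and the normalization of per-trait precision into relative intensities. The sketch assumes librosa for the feature extraction; the file path and the four trait scores are placeholders rather than the paper's data or models.

```python
import numpy as np
import librosa

# Time domain -> frequency domain features (MFCCs).
y, sr = librosa.load("conversation.wav", sr=16000)  # placeholder path
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

# Placeholder per-trait scores; in the paper these come from the
# deep neural network / extreme learning machine predictors.
scores = np.array([0.20, 0.90, 0.40, 0.10])

# Normalize so the four values sum to 1: the relative intensity
# of each emotional trait for this segment.
intensity = scores / scores.sum()
print(dict(zip(["sadness", "happiness", "neutral", "anger"], intensity)))
```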

    Linguistic Based Emotion Detection from Live Social Media Data Classification Using Metaheuristic Deep Learning Techniques

    A crucial area of research that can reveal numerous useful insights is emotion recognition. Emotion can be conveyed through several visible channels, including speech, gestures, written material, and facial expressions. Natural language processing (NLP) and deep learning concepts are utilised in the content-based categorization problem that is at the core of emotion recognition in text documents. This research proposes a novel technique for linguistic emotion detection from social media using metaheuristic deep learning architectures. The input is collected as live social media data and processed with noise removal, smoothing, and dimensionality reduction. Features are then extracted from the processed data and classified using metaheuristic swarm regressive adversarial kernel component analysis. Experimental analysis is carried out in terms of precision, accuracy, recall, F1 score, RMSE, and MAP on various social media datasets.
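    The named method cannot be reconstructed from the abstract alone, so the sketch below substitutes a deliberately generic text-emotion pipeline (TF-IDF features, truncated-SVD dimensionality reduction, logistic regression) purely to illustrate the noise-removal / dimensionality-reduction / classification flow; every component and the toy corpus are assumptions, not the paper's algorithm.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

# Toy stand-in corpus; the real input would be live social media posts.
posts = ["I love this so much!", "This is terrible news",
         "Feeling okay today", "So angry right now"]
labels = ["joy", "sadness", "neutral", "anger"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),  # tokenize, drop noise words
    ("svd", TruncatedSVD(n_components=2)),             # dimensionality reduction
    ("clf", LogisticRegression(max_iter=1000)),        # classifier stand-in
])
pipeline.fit(posts, labels)
print(pipeline.predict(["what a wonderful day"]))
```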