13,347 research outputs found

    Deep Learning Based Human Emotional State Recognition in a Video

    Get PDF
    Human emotions play significant role in everyday life. There are a lot of applications of automatic emotion recognition in medicine, e-learning, monitoring, marketing etc. In this paper the method and neural network architecture for real-time human emotion recognition by audio-visual data are proposed. To classify one of seven emotions, deep neural networks, namely, convolutional and recurrent neural networks are used. Visual information is represented by a sequence of 16 frames of 96 Ă— 96 pixels, and audio information - by 140 features for each of a sequence of 37 temporal windows. To reduce the number of audio features autoencoder was used. Audio information in conjunction with visual one is shown to increase recognition accuracy up to 12%. The developed system being not demanding to be computing resources is dynamic in terms of selection of parameters, reducing or increasing the number of emotion classes, as well as the ability to easily add, accumulate and use information from other external devices for further improvement of classification accuracy

    Deep learning framework for subject-independent emotion detection using wireless signals.

    Get PDF
    Emotion states recognition using wireless signals is an emerging area of research that has an impact on neuroscientific studies of human behaviour and well-being monitoring. Currently, standoff emotion detection is mostly reliant on the analysis of facial expressions and/or eye movements acquired from optical or video cameras. Meanwhile, although they have been widely accepted for recognizing human emotions from the multimodal data, machine learning approaches have been mostly restricted to subject dependent analyses which lack of generality. In this paper, we report an experimental study which collects heartbeat and breathing signals of 15 participants from radio frequency (RF) reflections off the body followed by novel noise filtering techniques. We propose a novel deep neural network (DNN) architecture based on the fusion of raw RF data and the processed RF signal for classifying and visualising various emotion states. The proposed model achieves high classification accuracy of 71.67% for independent subjects with 0.71, 0.72 and 0.71 precision, recall and F1-score values respectively. We have compared our results with those obtained from five different classical ML algorithms and it is established that deep learning offers a superior performance even with limited amount of raw RF and post processed time-sequence data. The deep learning model has also been validated by comparing our results with those from ECG signals. Our results indicate that using wireless signals for stand-by emotion state detection is a better alternative to other technologies with high accuracy and have much wider applications in future studies of behavioural sciences

    FusionSense: Emotion Classification using Feature Fusion of Multimodal Data and Deep learning in a Brain-inspired Spiking Neural Network

    Get PDF
    Using multimodal signals to solve the problem of emotion recognition is one of the emerging trends in affective computing. Several studies have utilized state of the art deep learning methods and combined physiological signals, such as the electrocardiogram (EEG), electroencephalogram (ECG), skin temperature, along with facial expressions, voice, posture to name a few, in order to classify emotions. Spiking neural networks (SNNs) represent the third generation of neural networks and employ biologically plausible models of neurons. SNNs have been shown to handle Spatio-temporal data, which is essentially the nature of the data encountered in emotion recognition problem, in an efficient manner. In this work, for the first time, we propose the application of SNNs in order to solve the emotion recognition problem with the multimodal dataset. Specifically, we use the NeuCube framework, which employs an evolving SNN architecture to classify emotional valence and evaluate the performance of our approach on the MAHNOB-HCI dataset. The multimodal data used in our work consists of facial expressions along with physiological signals such as ECG, skin temperature, skin conductance, respiration signal, mouth length, and pupil size. We perform classification under the Leave-One-Subject-Out (LOSO) cross-validation mode. Our results show that the proposed approach achieves an accuracy of 73.15% for classifying binary valence when applying feature-level fusion, which is comparable to other deep learning methods. We achieve this accuracy even without using EEG, which other deep learning methods have relied on to achieve this level of accuracy. In conclusion, we have demonstrated that the SNN can be successfully used for solving the emotion recognition problem with multimodal data and also provide directions for future research utilizing SNN for Affective computing. In addition to the good accuracy, the SNN recognition system is requires incrementally trainable on new data in an adaptive way. It only one pass training, which makes it suitable for practical and on-line applications. These features are not manifested in other methods for this problem.Peer reviewe

    Logging Stress and Anxiety Using a Gamified Mobile-based EMA Application, and Emotion Recognition Using a Personalized Machine Learning Approach

    Get PDF
    According to American Psychological Association (APA) more than 9 in 10 (94 percent) adults believe that stress can contribute to the development of major health problems, such as heart disease, depression, and obesity. Due to the subjective nature of stress, and anxiety, it has been demanding to measure these psychological issues accurately by only relying on objective means. In recent years, researchers have increasingly utilized computer vision techniques and machine learning algorithms to develop scalable and accessible solutions for remote mental health monitoring via web and mobile applications. To further enhance accuracy in the field of digital health and precision diagnostics, there is a need for personalized machine-learning approaches that focus on recognizing mental states based on individual characteristics, rather than relying solely on general-purpose solutions. This thesis focuses on conducting experiments aimed at recognizing and assessing levels of stress and anxiety in participants. In the initial phase of the study, a mobile application with broad applicability (compatible with both Android and iPhone platforms) is introduced (we called it STAND). This application serves the purpose of Ecological Momentary Assessment (EMA). Participants receive daily notifications through this smartphone-based app, which redirects them to a screen consisting of three components. These components include a question that prompts participants to indicate their current levels of stress and anxiety, a rating scale ranging from 1 to 10 for quantifying their response, and the ability to capture a selfie. The responses to the stress and anxiety questions, along with the corresponding selfie photographs, are then analyzed on an individual basis. This analysis focuses on exploring the relationships between self-reported stress and anxiety levels and potential facial expressions indicative of stress and anxiety, eye features such as pupil size variation and eye closure, and specific action units (AUs) observed in the frames over time. In addition to its primary functions, the mobile app also gathers sensor data, including accelerometer and gyroscope readings, on a daily basis. This data holds potential for further analysis related to stress and anxiety. Furthermore, apart from capturing selfie photographs, participants have the option to upload video recordings of themselves while engaging in two neuropsychological games. These recorded videos are then subjected to analysis in order to extract pertinent features that can be utilized for binary classification of stress and anxiety (i.e., stress and anxiety recognition). The participants that will be selected for this phase are students aged between 18 and 38, who have received recent clinical diagnoses indicating specific stress and anxiety levels. In order to enhance user engagement in the intervention, gamified elements - an emerging trend to influence user behavior and lifestyle - has been utilized. Incorporating gamified elements into non-game contexts (e.g., health-related) has gained overwhelming popularity during the last few years which has made the interventions more delightful, engaging, and motivating. In the subsequent phase of this research, we conducted an AI experiment employing a personalized machine learning approach to perform emotion recognition on an established dataset called Emognition. This experiment served as a simulation of the future analysis that will be conducted as part of a more comprehensive study focusing on stress and anxiety recognition. The outcomes of the emotion recognition experiment in this study highlight the effectiveness of personalized machine learning techniques and bear significance for the development of future diagnostic endeavors. For training purposes, we selected three models, namely KNN, Random Forest, and MLP. The preliminary performance accuracy results for the experiment were 93%, 95%, and 87% respectively for these models

    An original framework for understanding human actions and body language by using deep neural networks

    Get PDF
    The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition, and video surveillance. In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided. The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements. All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods

    A Survey on Emotion Recognition for Human Robot Interaction

    Get PDF
    With the recent developments of technology and the advances in artificial intelligent and machine learning techniques, it becomes possible for the robot to acquire and show the emotions as a part of Human-Robot Interaction (HRI). An emotional robot can recognize the emotional states of humans so that it will be able to interact more naturally with its human counterpart in different environments. In this article, a survey on emotion recognition for HRI systems has been presented. The survey aims to achieve two objectives. Firstly, it aims to discuss the main challenges that face researchers when building emotional HRI systems. Secondly, it seeks to identify sensing channels that can be used to detect emotions and provides a literature review about recent researches published within each channel, along with the used methodologies and achieved results. Finally, some of the existing emotion recognition issues and recommendations for future works have been outlined

    2-D Attention Based Convolutional Recurrent Neural Network for Speech Emotion Recognition

    Get PDF
    Recognizing speech emotions  is a formidable challenge due to the complexity of emotions. The function of Speech Emotion Recognition(SER) is significantly impacted by the effects of emotional signals retrieved from speech. The majority of emotional traits, on the other hand, are sensitive to emotionally neutral elements like the speaker, speaking manner, and gender. In this work, we postulate that computing deltas  for individual features maintain useful information which is mainly relevant to emotional traits while it minimizes the loss of emotionally irrelevant components, thus leading to fewer misclassifications. Additionally, Speech Emotion Recognition(SER) commonly experiences silent and emotionally unrelated frames. The proposed technique is quite good at picking up important feature representations for emotion relevant features. So here is a two  dimensional convolutional recurrent neural network that is attention-based to learn distinguishing characteristics and predict the emotions. The Mel-spectrogram is used for feature extraction. The suggested technique is conducted on IEMOCAP dataset and it has better performance, with 68% accuracy value
    • …
    corecore