20,790 research outputs found

    FusionSense: Emotion Classification using Feature Fusion of Multimodal Data and Deep learning in a Brain-inspired Spiking Neural Network

    Get PDF
    Using multimodal signals to solve the problem of emotion recognition is one of the emerging trends in affective computing. Several studies have utilized state of the art deep learning methods and combined physiological signals, such as the electrocardiogram (EEG), electroencephalogram (ECG), skin temperature, along with facial expressions, voice, posture to name a few, in order to classify emotions. Spiking neural networks (SNNs) represent the third generation of neural networks and employ biologically plausible models of neurons. SNNs have been shown to handle Spatio-temporal data, which is essentially the nature of the data encountered in emotion recognition problem, in an efficient manner. In this work, for the first time, we propose the application of SNNs in order to solve the emotion recognition problem with the multimodal dataset. Specifically, we use the NeuCube framework, which employs an evolving SNN architecture to classify emotional valence and evaluate the performance of our approach on the MAHNOB-HCI dataset. The multimodal data used in our work consists of facial expressions along with physiological signals such as ECG, skin temperature, skin conductance, respiration signal, mouth length, and pupil size. We perform classification under the Leave-One-Subject-Out (LOSO) cross-validation mode. Our results show that the proposed approach achieves an accuracy of 73.15% for classifying binary valence when applying feature-level fusion, which is comparable to other deep learning methods. We achieve this accuracy even without using EEG, which other deep learning methods have relied on to achieve this level of accuracy. In conclusion, we have demonstrated that the SNN can be successfully used for solving the emotion recognition problem with multimodal data and also provide directions for future research utilizing SNN for Affective computing. In addition to the good accuracy, the SNN recognition system is requires incrementally trainable on new data in an adaptive way. It only one pass training, which makes it suitable for practical and on-line applications. These features are not manifested in other methods for this problem.Peer reviewe

    Multimodal Sensing and Data Processing for Speaker and Emotion Recognition using Deep Learning Models with Audio, Video and Biomedical Sensors

    Full text link
    The focus of the thesis is on Deep Learning methods and their applications on multimodal data, with a potential to explore the associations between modalities and replace missing and corrupt ones if necessary. We have chosen two important real-world applications that need to deal with multimodal data: 1) Speaker recognition and identification; 2) Facial expression recognition and emotion detection. The first part of our work assesses the effectiveness of speech-related sensory data modalities and their combinations in speaker recognition using deep learning models. First, the role of electromyography (EMG) is highlighted as a unique biometric sensor in improving audio-visual speaker recognition or as a substitute in noisy or poorly-lit environments. Secondly, the effectiveness of deep learning is empirically confirmed through its higher robustness to all types of features in comparison to a number of commonly used baseline classifiers. Not only do deep models outperform the baseline methods, their power increases when they integrate multiple modalities, as different modalities contain information on different aspects of the data, especially between EMG and audio. Interestingly, our deep learning approach is word-independent. Plus, the EMG, audio, and visual parts of the samples from each speaker do not need to match. This increases the flexibility of our method in using multimodal data, particularly if one or more modalities are missing. With a dataset of 23 individuals speaking 22 words five times, we show that EMG can replace the audio/visual modalities, and when combined, significantly improve the accuracy of speaker recognition. The second part describes a study on automated emotion recognition using four different modalities – audio, video, electromyography (EMG), and electroencephalography (EEG). We collected a dataset by recording the 4 modalities as 12 human subjects expressed six different emotions or maintained a neutral expression. Three different aspects of emotion recognition were investigated: model selection, feature selection, and data selection. Both generative models (DBNs) and discriminative models (LSTMs) were applied to the four modalities, and from these analyses we conclude that LSTM is better for audio and video together with their corresponding sophisticated feature extractors (MFCC and CNN), whereas DBN is better for both EMG and EEG. By examining these signals at different stages (pre-speech, during-speech, and post-speech) of the current and following trials, we have found that the most effective stages for emotion recognition from EEG occur after the emotion has been expressed, suggesting that the neural signals conveying an emotion are long-lasting

    Exploring the dynamic interplay of cognitive load and emotional arousal by using multimodal measurements: Correlation of pupil diameter and emotional arousal in emotionally engaging tasks

    Full text link
    Multimodal data analysis and validation based on streams from state-of-the-art sensor technology such as eye-tracking or emotion recognition using the Facial Action Coding System (FACTs) with deep learning allows educational researchers to study multifaceted learning and problem-solving processes and to improve educational experiences. This study aims to investigate the correlation between two continuous sensor streams, pupil diameter as an indicator of cognitive workload and FACTs with deep learning as an indicator of emotional arousal (RQ 1a), specifically for epochs of high, medium, and low arousal (RQ 1b). Furthermore, the time lag between emotional arousal and pupil diameter data will be analyzed (RQ 2). 28 participants worked on three cognitively demanding and emotionally engaging everyday moral dilemmas while eye-tracking and emotion recognition data were collected. The data were pre-processed in Phyton (synchronization, blink control, downsampling) and analyzed using correlation analysis and Granger causality tests. The results show negative and statistically significant correlations between the data streams for emotional arousal and pupil diameter. However, the correlation is negative and significant only for epochs of high arousal, while positive but non-significant relationships were found for epochs of medium or low arousal. The average time lag for the relationship between arousal and pupil diameter was 2.8 ms. In contrast to previous findings without a multimodal approach suggesting a positive correlation between the constructs, the results contribute to the state of research by highlighting the importance of multimodal data validation and research on convergent vagility. Future research should consider emotional regulation strategies and emotional valence.Comment: The first two authors contributed equally to the manuscrip

    Emotion and Stress Recognition Related Sensors and Machine Learning Technologies

    Get PDF
    This book includes impactful chapters which present scientific concepts, frameworks, architectures and ideas on sensing technologies and machine learning techniques. These are relevant in tackling the following challenges: (i) the field readiness and use of intrusive sensor systems and devices for capturing biosignals, including EEG sensor systems, ECG sensor systems and electrodermal activity sensor systems; (ii) the quality assessment and management of sensor data; (iii) data preprocessing, noise filtering and calibration concepts for biosignals; (iv) the field readiness and use of nonintrusive sensor technologies, including visual sensors, acoustic sensors, vibration sensors and piezoelectric sensors; (v) emotion recognition using mobile phones and smartwatches; (vi) body area sensor networks for emotion and stress studies; (vii) the use of experimental datasets in emotion recognition, including dataset generation principles and concepts, quality insurance and emotion elicitation material and concepts; (viii) machine learning techniques for robust emotion recognition, including graphical models, neural network methods, deep learning methods, statistical learning and multivariate empirical mode decomposition; (ix) subject-independent emotion and stress recognition concepts and systems, including facial expression-based systems, speech-based systems, EEG-based systems, ECG-based systems, electrodermal activity-based systems, multimodal recognition systems and sensor fusion concepts and (x) emotion and stress estimation and forecasting from a nonlinear dynamical system perspective
    • …
    corecore