86 research outputs found
Emotion Recognition from EEG Signal Focusing on Deep Learning and Shallow Learning Techniques
Recently, electroencephalogram (EEG)-based emotion recognition has become crucial in making Human-Computer Interaction (HCI) systems more intelligent. Owing to its wide-ranging applications, e.g., person-based decision making, mind-machine interfacing, cognitive interaction, affect detection, and feeling detection, emotion recognition has attracted considerable attention in recent AI-empowered research. Numerous studies driven by a range of approaches have therefore been conducted, which calls for a systematic review of the methodologies, feature sets, and techniques used for this task. Such a review can guide beginners towards composing an effective emotion recognition system. In this article, we conduct a rigorous review of state-of-the-art emotion recognition systems published in the recent literature and summarize common emotion recognition steps with relevant definitions, theories, and analyses to provide the key knowledge needed to develop a proper framework. The reviewed studies are divided into two categories: i) deep learning-based and ii) shallow machine learning-based emotion recognition systems, and are compared in terms of methods, classifiers, the number of classified emotions, accuracy, and datasets used. An informative comparison, recent research trends, and recommendations for future research directions are also provided.
Deep learning framework for subject-independent emotion detection using wireless signals.
Emotion state recognition using wireless signals is an emerging area of research with impact on neuroscientific studies of human behaviour and well-being monitoring. Currently, standoff emotion detection relies mostly on the analysis of facial expressions and/or eye movements acquired from optical or video cameras. Meanwhile, although machine learning approaches have been widely accepted for recognizing human emotions from multimodal data, they have mostly been restricted to subject-dependent analyses, which lack generality. In this paper, we report an experimental study which collects heartbeat and breathing signals of 15 participants from radio frequency (RF) reflections off the body, followed by novel noise filtering techniques. We propose a novel deep neural network (DNN) architecture based on the fusion of raw RF data and the processed RF signal for classifying and visualising various emotion states. The proposed model achieves a high classification accuracy of 71.67% for independent subjects, with precision, recall, and F1-score values of 0.71, 0.72, and 0.71, respectively. We have compared our results with those obtained from five different classical ML algorithms, establishing that deep learning offers superior performance even with limited amounts of raw RF and post-processed time-sequence data. The deep learning model has also been validated by comparing our results with those from ECG signals. Our results indicate that using wireless signals for standoff emotion state detection is a highly accurate alternative to other technologies, with much wider applications in future studies of the behavioural sciences.
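The subject-independent evaluation described in this abstract can be approximated with leave-one-subject-out cross-validation, where each participant's trials are held out in turn. Below is a minimal sketch with scikit-learn on synthetic data, using a logistic-regression stand-in for the paper's DNN; the participant count, trial count, and feature dimension are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)
n_subjects, trials, n_feat = 15, 20, 8          # assumed sizes, not the paper's
X = rng.normal(size=(n_subjects * trials, n_feat))
y = rng.integers(0, 2, size=n_subjects * trials)
groups = np.repeat(np.arange(n_subjects), trials)  # which subject each trial came from

# Hold out every subject once: train on the other 14, test on the held-out one
y_true, y_pred = [], []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    y_true.extend(y[test_idx])
    y_pred.extend(clf.predict(X[test_idx]))

prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
print(f"precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

Because no subject appears in both the training and test folds, the reported scores estimate performance on unseen people, which is the "independent subjects" setting the abstract reports.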
An ongoing review of speech emotion recognition
User emotional status recognition is becoming a key feature in advanced Human Computer Interfaces (HCI). A key source of emotional information is the spoken expression, which may be part of the interaction between the human and the machine. Speech emotion recognition (SER) is a very active area of research that involves the application of current machine learning and neural network tools. This ongoing review covers recent and classical approaches to SER reported in the literature. This work has been carried out with the support of project PID2020-116346GB-I00 funded by the Spanish MICIN.
Multimodal Sensing and Data Processing for Speaker and Emotion Recognition using Deep Learning Models with Audio, Video and Biomedical Sensors
The focus of the thesis is on deep learning methods and their applications to multimodal data, with the potential to explore the associations between modalities and to replace missing or corrupt ones if necessary. We have chosen two important real-world applications that need to deal with multimodal data: 1) speaker recognition and identification; 2) facial expression recognition and emotion detection.
The first part of our work assesses the effectiveness of speech-related sensory data modalities and their combinations in speaker recognition using deep learning models. First, the role of electromyography (EMG) is highlighted as a unique biometric sensor for improving audio-visual speaker recognition or as a substitute in noisy or poorly-lit environments. Secondly, the effectiveness of deep learning is empirically confirmed through its higher robustness to all types of features in comparison to a number of commonly used baseline classifiers. Not only do deep models outperform the baseline methods, but their power also increases when they integrate multiple modalities, as different modalities contain information on different aspects of the data, especially between EMG and audio. Interestingly, our deep learning approach is word-independent. Moreover, the EMG, audio, and visual parts of the samples from each speaker do not need to match, which increases the flexibility of our method in using multimodal data, particularly if one or more modalities are missing. With a dataset of 23 individuals speaking 22 words five times, we show that EMG can replace the audio/visual modalities and, when combined with them, significantly improve the accuracy of speaker recognition.
The second part describes a study on automated emotion recognition using four different modalities: audio, video, electromyography (EMG), and electroencephalography (EEG). We collected a dataset by recording the four modalities as 12 human subjects expressed six different emotions or maintained a neutral expression. Three different aspects of emotion recognition were investigated: model selection, feature selection, and data selection. Both generative models (DBNs) and discriminative models (LSTMs) were applied to the four modalities, and from these analyses we conclude that LSTM is better for audio and video together with their corresponding sophisticated feature extractors (MFCC and CNN), whereas DBN is better for both EMG and EEG. By examining these signals at different stages (pre-speech, during-speech, and post-speech) of the current and following trials, we found that the most effective stages for emotion recognition from EEG occur after the emotion has been expressed, suggesting that the neural signals conveying an emotion are long-lasting.
Cross-Subject Emotion Recognition with Sparsely-Labeled Peripheral Physiological Data Using SHAP-Explained Tree Ensembles
Many challenges remain in emotion recognition using physiological data despite the substantial progress made recently. In this paper, we attempt to address two major challenges. First, to deal with sparsely-labeled physiological data, we decompose the raw physiological data using signal spectrum analysis and extract both complexity and energy features, a procedure that helps reduce noise and improve feature extraction effectiveness. Second, to improve the explainability of machine learning models in emotion recognition with physiological data, we propose the Light Gradient Boosting Machine (LightGBM) and SHapley Additive exPlanations (SHAP) for emotion prediction and model explanation, respectively. The LightGBM model outperformed the eXtreme Gradient Boosting (XGBoost) model on the public Database for Emotion Analysis using Physiological signals (DEAP), with F1-scores of 0.814, 0.823, and 0.860 for binary classification of valence, arousal, and liking, respectively, under cross-subject validation using eight peripheral physiological signals. Furthermore, the SHAP model identified the most important features in emotion recognition and revealed the relationships between the predictor variables and the response variables in terms of their main effects and interaction effects. The proposed model therefore not only performed well using peripheral physiological data, but also gave more insight into the underlying mechanisms of emotion recognition.
Physiological-based Driver Monitoring Systems: A Scoping Review
A physiological-based driver monitoring system (DMS) has attracted research interest and has great potential for providing more accurate and reliable monitoring of the driver’s state during a driving experience. Many driver monitoring systems are driver behavior-based or vehicle-based. When these non-physiological based DMS are coupled with physiological-based data analysis from electroencephalography (EEG), electrooculography (EOG), electrocardiography (ECG), and electromyography (EMG), the physical and emotional state of the driver may also be assessed. Drivers’ wellness can also be monitored, and hence, traffic collisions can be avoided. This paper highlights work that has been published in the past five years related to physiological-based DMS. Specifically, we focused on the physiological indicators applied in DMS design and development. Work utilizing key physiological indicators related to driver identification, driver alertness, driver drowsiness, driver fatigue, and drunk driving is identified and described based on the PRISMA Extension for Scoping Reviews (PRISMA-Sc) Framework. The relationship between selected papers is visualized using keyword co-occurrence. Findings were presented using a narrative review approach based on classifications of DMS. Finally, the challenges of physiological-based DMS are highlighted in the conclusion. Doi: 10.28991/CEJ-2022-08-12-020
Examining the Size of the Latent Space of Convolutional Variational Autoencoders Trained With Spectral Topographic Maps of EEG Frequency Bands
Electroencephalography (EEG) is a technique for recording brain electrical potentials using electrodes placed on the scalp [1]. It is well known that EEG signals contain essential information in the frequency, temporal, and spatial domains. For example, some studies have converted EEG signals into topographic power head maps to preserve spatial information [2]. Others have produced spectral topographic head maps of different EEG bands to both preserve information in the spatial domain and take advantage of the information in the frequency domain [3]. However, topographic maps contain highly interpolated data in between electrode locations and are often redundant. For this reason, convolutional neural networks are often used to reduce their dimensionality and learn relevant features automatically [4].
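The band-specific spectral values that such topographic maps encode are typically obtained from each channel's power spectral density. Below is a minimal single-channel sketch using SciPy's Welch estimator on a synthetic signal; the sampling rate and band edges are common conventions, not values taken from this work.

```python
import numpy as np
from scipy.signal import welch

fs = 128  # assumed sampling rate in Hz
t = np.arange(0, 4, 1 / fs)
# Synthetic single-channel "EEG": a 10 Hz alpha rhythm plus noise
sig = np.sin(2 * np.pi * 10 * t) + 0.3 * np.random.default_rng(0).normal(size=t.size)

# Welch PSD with 1 s segments gives ~1 Hz frequency resolution
freqs, psd = welch(sig, fs=fs, nperseg=fs)

# Sum PSD bins falling inside each conventional EEG band
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
power = {name: psd[(freqs >= lo) & (freqs < hi)].sum()
         for name, (lo, hi) in bands.items()}
print(power)
```

Repeating this per electrode yields one scalar per band per scalp location, which is exactly the scattered data that gets interpolated into a spectral topographic head map.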
A Review of Emotion Recognition Methods from Keystroke, Mouse, and Touchscreen Dynamics
Emotion can be defined as a subject’s organismic response to an external or internal stimulus event. The responses may be reflected in pattern changes of the subject’s facial expression, gesture, gait, eye movement, physiological signals, speech and voice, keystroke, and mouse dynamics, etc. This suggests that, on the one hand, emotions can be measured/recognized from the responses, and, on the other hand, they can be facilitated/regulated by external stimulus events, situation changes, or internal motivation changes. It is well known that emotion is closely related to both physical and mental health and usually affects an individual’s and a team’s work performance; thus, emotion recognition is an important prerequisite for emotion regulation towards better emotional states and work performance. The primary problem in emotion recognition is how to recognize a subject’s emotional states easily and accurately. Currently, there is a body of good research on emotion recognition from facial expression, gesture, gait, eye-tracking, and other physiological signals such as speech and voice, but these methods are all intrusive and obtrusive to some extent. In contrast, keystroke, mouse, and touchscreen (KMT) dynamics data can be collected non-intrusively and unobtrusively as secondary data responding to primary physical actions. This paper therefore aims to review the state-of-the-art research on emotion recognition from KMT dynamics and to identify key research challenges, opportunities, and a future research roadmap for reference. In addition, this paper answers the following six research questions (RQs): (1) what are the commonly used emotion elicitation methods and databases for emotion recognition? (2) which emotions can be recognized from KMT dynamics? (3) what key features are most appropriate for recognizing different specific emotions? (4) which classification methods are most effective for specific emotions? (5) what are the application trends of emotion recognition from KMT dynamics? (6) which application contexts are of greatest concern?
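The keystroke-dynamics features such reviews discuss are derived from key press/release timestamps; the two most basic are dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). A sketch with hypothetical timings, all values made up for illustration:

```python
# Each event: (key, press_time, release_time) in seconds (hypothetical capture)
events = [("h", 0.00, 0.09), ("i", 0.21, 0.28), ("!", 0.45, 0.55)]

# Dwell time: how long each key is held down
dwell = [release - press for _, press, release in events]

# Flight time: gap between releasing one key and pressing the next
flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]

print("dwell:", [round(d, 2) for d in dwell])    # per-key hold durations
print("flight:", [round(f, 2) for f in flight])  # inter-key gaps
```

Statistics over these sequences (means, variances, digraph timings) form the non-intrusive feature vectors that the reviewed classifiers consume.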
Brainwave-Based Human Emotion Estimation using Deep Neural Network Models for Biofeedback
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Emotion is a state that comprehensively represents human feeling, thought, and behavior, and thus plays an important role in interpersonal human communication. Emotion estimation aims to automatically discriminate different emotional states using physiological and non-physiological signals acquired from humans, to achieve effective communication and interaction between humans and machines. Brainwave-based emotion estimation is one of the most commonly used and efficient methods in emotion estimation research. The technology plays a great role in the treatment of human emotional disorders, brain-computer interfaces for disabilities, entertainment, and many other research areas. In this thesis, various methods, schemes, and frameworks are presented for electroencephalogram (EEG)-based human emotion estimation. Firstly, a hybrid dimension feature reduction scheme is presented using a total of 14 different features extracted from EEG recordings. The scheme combines these distinct features in the feature space using both supervised and unsupervised feature selection processes. Maximum Relevance Minimum Redundancy (mRMR) is applied to re-order the combined features for maximum relevance with the emotion labels and minimum redundancy among features. The generated features are further reduced with Principal Component Analysis (PCA) to extract the principal components. Experimental results show that the proposed work outperforms state-of-the-art methods under the same settings on the publicly available Database for Emotional Analysis using Physiological Signals (DEAP). Secondly, a disentangled adaptive-noise-learning β-Variational Autoencoder (β-VAE) combined with a long short-term memory (LSTM) model is proposed for emotion recognition based on EEG recordings, with experiments also conducted on the public DEAP dataset.
First, the EEG time-series data are transformed into video-like EEG image data by applying an Azimuthal Equidistant Projection (AEP) to the original 3-D EEG-sensor coordinates to obtain 2-D projected electrode locations. The Clough-Tocher scheme is then applied to interpolate the scattered power measurements over the scalp and to estimate the values in between the electrodes over a 32x32 mesh. After that, the β-VAE-LSTM algorithm is used to estimate the accuracy of quadrant-based (arousal-valence) classification. A comparison between the β-VAE-LSTM model and other classic methods under the same experimental settings shows that the proposed model is effective. Finally, a novel real-time emotion detection system based on EEG signals from a portable headband is presented and integrated into the interactive film ‘RIOT’. First, the requirements of the interactive film were collected and the protocol for data collection using a portable EEG sensor (Emotiv Epoc) was designed. Then, a portable EEG emotion database (PEED) was built from 10 participants, with emotion labels obtained using both self-reporting and video annotation tools. After that, various feature extraction, feature selection, validation, and classification methods were explored to build a practical system for real-time detection. In the end, the emotion detection system was trained, integrated into the interactive film for real-time implementation, and fully evaluated. The experimental results demonstrate that the system achieves satisfactory emotion detection accuracy and real-time performance.
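The Clough-Tocher interpolation step described above can be sketched with SciPy's `CloughTocher2DInterpolator`, which maps scattered per-electrode values onto a regular 32x32 mesh. The random electrode coordinates and power values below are stand-ins for the AEP-projected locations and measured band powers.

```python
import numpy as np
from scipy.interpolate import CloughTocher2DInterpolator

rng = np.random.default_rng(0)
# Stand-in 2-D projected electrode locations (e.g. after an azimuthal projection)
electrodes = rng.uniform(-1, 1, size=(32, 2))
power = rng.uniform(size=32)  # one band-power value per electrode

# Piecewise-cubic Clough-Tocher interpolation of the scattered powers;
# points outside the electrodes' convex hull are filled with 0
interp = CloughTocher2DInterpolator(electrodes, power, fill_value=0.0)
gx, gy = np.meshgrid(np.linspace(-1, 1, 32), np.linspace(-1, 1, 32))
image = interp(gx, gy)
print(image.shape)  # one 32x32 "EEG image" frame
```

Stacking one such frame per time window (and per frequency band) yields the video-like tensor the β-VAE-LSTM consumes.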
Vision-based Driver State Monitoring Using Deep Learning
Road accidents cause thousands of injuries and deaths every year, ranking among the leading causes of death. More than 90% of traffic accidents are caused by human errors [1], including sight obstruction, failure to spot danger through inattention, speeding, expectation errors, and other reasons. In recent years, driver monitoring systems (DMS) have been rapidly studied and developed for use in commercial vehicles to prevent car crashes caused by human error. A DMS is a vehicle safety system that monitors the driver’s attention and warns if necessary. Such a system may contain multiple modules that detect the human factors most associated with accidents, such as drowsiness and distraction. Typical DMS approaches seek driver distraction cues either from vehicle acceleration and steering (vehicle-based approach), driver physiological signals (physiological approach), or driver behaviours (behavioural-based approach). Behavioural-based driver state monitoring has numerous advantages over its vehicle-based and physiological-based counterparts, including fast responsiveness and non-intrusiveness. In addition, the recent breakthrough in deep learning enables high-level action and face recognition, expanding driver monitoring coverage and improving model performance. This thesis presents CareDMS, a behavioural-approach-based driver monitoring system using deep learning methods. CareDMS consists of driver anomaly detection and classification, gaze estimation, and emotion recognition. Each approach is developed with state-of-the-art deep learning solutions to address the shortcomings of current DMS functionalities. Combined with a classic drowsiness detection method, CareDMS thoroughly covers three major types of distraction: physical (hands off the steering wheel), visual (eyes off the road ahead), and cognitive (mind off driving).
There are numerous challenges in behavioural-based driver state monitoring. Current driver distraction detection methods either lack detailed distraction classification or fail to generalize to unknown driver anomalies. This thesis introduces a novel two-phase proposal-and-classification network architecture that can flag all forms of distracted driving and recognize driver actions simultaneously, providing the downstream DMS with important information for customizing warning levels. Next, gaze estimation for driver monitoring is difficult because drivers tend to make severe head movements while driving. This thesis proposes a video-based neural network that jointly learns head pose and gaze dynamics; the design significantly reduces per-head-pose gaze estimation performance variance compared to benchmarks. Furthermore, emotional driving such as road rage and sadness can seriously impair driving performance. However, individuals vary in their emotional expressions, which makes vision-based emotion recognition a challenging task. This work proposes an efficient and versatile multimodal fusion module that effectively fuses facial expression and human voice for emotion recognition, demonstrating visible advantages over using a single modality. Finally, the driver state monitoring system CareDMS converts the output of each functionality into a specific driver-status measurement and integrates the various measurements into the driver’s level of alertness.
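A multimodal fusion module can take many forms; the simplest baseline, and a common point of comparison for learned fusion modules like the one in this thesis, is late fusion, which averages the per-modality class probabilities. A sketch with hypothetical logits (not the thesis's actual module or values):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical per-modality logits over 4 emotion classes
face_logits = np.array([2.0, 0.1, -1.0, 0.3])   # from a face-expression model
voice_logits = np.array([0.5, 1.8, -0.5, 0.0])  # from a speech-emotion model

# Late fusion: average the per-modality class probabilities
fused = 0.5 * (softmax(face_logits) + softmax(voice_logits))
pred = int(np.argmax(fused))
print("fused probabilities:", np.round(fused, 3))
print("predicted class:", pred)
```

Learned fusion modules improve on this baseline by weighting or cross-attending between modalities instead of averaging them uniformly, which is what gives the thesis's module its advantage over single-modality input.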