15,279 research outputs found
Facial expression recognition with emotion-based feature fusion
© 2015 Asia-Pacific Signal and Information Processing Association. In this paper, we propose an emotion-based feature fusion method using the Discriminant-Analysis of Canonical Correlations (DCC) for facial expression recognition. There have been many image features or descriptors proposed for facial expression recognition. For the different features, they may be more accurate for the recognition of different expressions. In our proposed method, four effective descriptors for facial expression representation, namely Local Binary Pattern (LBP), Local Phase Quantization (LPQ), Weber Local Descriptor (WLD), and Pyramid of Histogram of Oriented Gradients (PHOG), are considered. Supervised Locality Preserving Projection (SLPP) is applied to the respective features for dimensionality reduction and manifold learning. Experiments show that descriptors are also sensitive to the conditions of images, such as race, lighting, pose, etc. Thus, an adaptive descriptor selection algorithm is proposed, which determines the best two features for each expression class on a given training set. These two features are fused, so as to achieve a higher recognition rate for each expression. In our experiments, the JAFFE and BAUM-2 databases are used, and experiment results show that the descriptor selection step increases the recognition rate up to 2%
Recommended from our members
Automatic affective dimension recognition from naturalistic facial expressions based on wavelet filtering and PLS regression
Automatic affective dimension recognition from facial expression continuously in naturalistic contexts is a very challenging research topic but very important in human-computer interaction. In this paper, an automatic recognition system was proposed to predict the affective dimensions such as Arousal, Valence and Dominance continuously in naturalistic facial expression videos. Firstly, visual and vocal features are extracted from image frames and audio segments in facial expression videos. Secondly, a wavelet transform based digital filtering method is applied to remove the irrelevant noise information in the feature space. Thirdly, Partial Least Squares regression is used to predict the affective dimensions from both video and audio modalities. Finally, two modalities are combined to boost overall performance in the decision fusion process. The proposed method is tested in the fourth international Audio/Visual Emotion Recognition Challenge (AVEC2014) dataset and compared to other state-of-the-art methods in the affect recognition sub-challenge with a good performance
Multi-Network Feature Fusion Facial Emotion Recognition using Nonparametric Method with Augmentation
Facial expression emotion identification and prediction is one of the most difficult problems in computer science. Pre-processing and feature extraction are crucial components of the more conventional methods. For the purpose of emotion identification and prediction using 2D facial expressions, this study targets the Face Expression Recognition dataset and shows the real implementation or assessment of learning algorithms such as various CNNs. Due to its vast potential in areas like artificial intelligence, emotion detection from facial expressions has become an essential requirement. Many efforts have been done on the subject since it is both a challenging and fascinating challenge in computer vision. The focus of this study is on using a convolutional neural network supplemented with data to build a facial emotion recognition system. This method may use face images to identify seven fundamental emotions, including anger, contempt, fear, happiness, neutrality, sadness, and surprise. As well as improving upon the validation accuracy of current models, a convolutional neural network that takes use of data augmentation, feature fusion, and the NCA feature selection approach may assist solve some of their drawbacks. Researchers in this area are focused on improving computer predictions by creating methods to read and codify facial expressions. With deep learning's striking success, many architectures within the framework are being used to further the method's efficacy. We highlight the contributions dealt with, the architecture and databases used, and demonstrate the development by contrasting the offered approaches and the outcomes produced. The purpose of this study is to aid and direct future researchers in the subject by reviewing relevant recent studies and offering suggestions on how to further the field. An innovative feature-based transfer learning technique is created using the pre-trained networks MobileNetV2 and DenseNet-201. The suggested system's recognition rate is 75.31%, which is a significant improvement over the results of the prior feature fusion study
Facial expression recognition via a jointly-learned dual-branch network
Human emotion recognition depends on facial expressions, and essentially on the extraction of relevant features. Accurate feature extraction is generally difficult due to the influence of external interference factors and the mislabelling of some datasets, such as the Fer2013 dataset. Deep learning approaches permit an automatic and intelligent feature extraction based on the input database. But, in the case of poor database distribution or insufficient diversity of database samples, extracted features will be negatively affected. Furthermore, one of the main challenges for efficient facial feature extraction and accurate facial expression recognition is the facial expression datasets, which are usually considerably small compared to other image datasets. To solve these problems, this paper proposes a new approach based on a dual-branch convolutional neural network for facial expression recognition, which is formed by three modules: The two first ones ensure features engineering stage by two branches, and features fusion and classification are performed by the third one. In the first branch, an improved convolutional part of the VGG network is used to benefit from its known robustness, the transfer learning technique with the EfficientNet network is applied in the second branch, to improve the quality of limited training samples in datasets. Finally, and in order to improve the recognition performance, a classification decision will be made based on the fusion of both branches’ feature maps. Based on the experimental results obtained on the Fer2013 and CK+ datasets, the proposed approach shows its superiority compared to several state-of-the-art results as well as using one model at a time. Those results are very competitive, especially for the CK+ dataset, for which the proposed dual branch model reaches an accuracy of 99.32, while for the FER-2013 dataset, the VGG-inspired CNN obtains an accuracy of 67.70, which is considered an acceptable accuracy, given the difficulty of the images of this dataset
Multimodal Affective State Recognition in Serious Games Applications
A challenging research issue, which has recently attracted a lot of attention, is the incorporation of emotion recognition technology in serious games applications, in order to improve the quality of interaction and enhance the gaming experience. To this end, in this paper, we present an emotion recognition methodology that utilizes information extracted from multimodal fusion analysis to identify the affective state of players during gameplay scenarios. More specifically, two monomodal classifiers have been designed for extracting affective state information based on facial expression and body motion analysis. For the combination of different modalities a deep model is proposed that is able to make a decision about player’s affective state, while also being robust in the absence of one information cue. In order to evaluate the performance of our methodology, a bimodal database was created using Microsoft’s Kinect sensor, containing feature vectors extracted from users' facial expressions and body gestures. The proposed method achieved higher recognition rate in comparison with mono-modal, as well as early-fusion algorithms. Our methodology outperforms all other classifiers, achieving an overall recognition rate of 98.3%
Recommended from our members
MGEED: A Multimodal Genuine Emotion and Expression Detection Database
Multimodal emotion recognition has attracted increasing interest from academia and industry in recent years, since it enables emotion detection using various modalities, such as facial expression images, speech and physiological signals. Although research in this field has grown rapidly, it is still challenging to create a multimodal database containing facial electrical information due to the difficulty in capturing natural and subtle facial expression signals, such as optomyography (OMG) signals. To this end, we present a newly developed Multimodal Genuine Emotion and Expression Detection (MGEED) database in this paper, which is the first publicly available database containing the facial OMG signals. MGEED consists of 17 subjects with over 150K facial images, 140K depth maps and different modalities of physiological signals including OMG, electroencephalography (EEG) and electrocardiography (ECG) signals. The emotions of the participants are evoked by video stimuli and the data are collected by a multimodal sensing system. With the collected data, an emotion recognition method is developed based on multimodal signal synchronisation, feature extraction, fusion and emotion prediction. The results show that superior performance can be achieved by fusing the visual, EEG and OMG features. The database can be obtained from https://github.com/YMPort/MGEED.Engineering and Physical Sciences Research Council (EPSRC) through the Project 4D Facial Sensing and Modelling under Grant EP/N025849/1
A Dual-Modality Emotion Recognition System of EEG and Facial Images and its Application in Educational Scene
With the development of computer science, people's interactions with computers or through computers have become more frequent. Some human-computer interactions or human-to-human interactions that are often seen in daily life: online chat, online banking services, facial recognition functions, etc. Only through text messaging, however, can the effect of information transfer be reduced to around 30% of the original. Communication becomes truly efficient when we can see one other's reactions and feel each other's emotions.
This issue is especially noticeable in the educational field. Offline teaching is a classic teaching style in which teachers may determine a student's present emotional state based on their expressions and alter teaching methods accordingly. With the advancement of computers and the impact of Covid-19, an increasing number of schools and educational institutions are exploring employing online or video-based instruction. In such circumstances, it is difficult for teachers to get feedback from students. Therefore, an emotion recognition method is proposed in this thesis that can be used for educational scenarios, which can help teachers quantify the emotional state of students in class and be used to guide teachers in exploring or adjusting teaching methods.
Text, physiological signals, gestures, facial photographs, and other data types are commonly used for emotion recognition. Data collection for facial images emotion recognition is particularly convenient and fast among them, although there is a problem that people may subjectively conceal true emotions, resulting in inaccurate recognition results. Emotion recognition based on EEG waves can compensate for this drawback. Taking into account the aforementioned issues, this thesis first employs the SVM-PCA to classify emotions in EEG data, then employs the deep-CNN to classify the emotions of the subject's facial images. Finally, the D-S evidence theory is used for fusing and analyzing the two classification results and obtains the final emotion recognition accuracy of 92%. The specific research content of this thesis is as follows:
1) The background of emotion recognition systems used in teaching scenarios is discussed, as well as the use of various single modality systems for emotion recognition.
2) Detailed analysis of EEG emotion recognition based on SVM. The theory of EEG signal generation, frequency band characteristics, and emotional dimensions is introduced. The EEG signal is first filtered and processed with artifact removal. The processed EEG signal is then used for feature extraction using wavelet transforms. It is finally fed into the proposed SVM-PCA for emotion recognition and the accuracy is 64%.
3) Using the proposed deep-CNN to recognize emotions in facial images. Firstly, the Adaboost algorithm is used to detect and intercept the face area in the image, and the gray level balance is performed on the captured image. Then the preprocessed images are trained and tested using the deep-CNN, and the average accuracy is 88%.
4) Fusion method based on decision-making layer. The data fusion at the decision level is carried out with the results of EEG emotion recognition and facial expression emotion recognition. The final dual-modality emotion recognition results and system accuracy of 92% are obtained using D-S evidence theory.
5) The dual-modality emotion recognition system's data collection approach is designed. Based on the process, the actual data in the educational scene is collected and analyzed. The final accuracy of the dual-modality system is 82%. Teachers can use the emotion recognition results as a guide and reference to improve their teaching efficacy
A method for facial expression recognition of image sequence based on spatial features
Facial expression recognition is the foundation of human emotion recognition, which has become a hot topic in the field of artificial intelligence in recent years. To some extent, facial expressions can be regarded as the process of facial muscle changes. Image sequence contains richer expression contents compared with a single image. So the expression recognition based on image sequence can yield more accurate results. A new method for facial expression recognition is proposed in this paper, which is based on spatial feature for image sequence. The method consists of four steps. Firstly, Siamese neural network is used to construct an evaluation model for changes of expression intensity, which extracts an appropriate image sequence from the video. Secondly, a convolutional neural network with attention mechanism is designed and trained, which is used to extract spatial features from each image in the sequence. Then, the spatial features of multiple images are fused. Finally, the fusion results are put into a convolutional neural network to recognize the facial expressions. This method is validated on CK+ dataset and the experimental results show that it’s more accurate than several other methods
- …