369 research outputs found

    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for analysing people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information non-verbally. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the fields of action recognition and affective computing: the former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements. Both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) is proposed for the recognition of sign language and semaphoric hand gestures; the second presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, the last module provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs). The performance of LSTM-RNNs is explored in depth, owing to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements. All the modules were tested on challenging datasets, well known in the literature, showing remarkable results compared to current state-of-the-art methods.
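    As a minimal illustration of the second module's design, the sketch below shows a two-branch stacked LSTM over 2D skeleton sequences, with one branch reading raw joint coordinates and the other frame-to-frame joint displacements. PyTorch is assumed; the class name, joint count, and layer sizes are illustrative assumptions, not the thesis's actual configuration.

        # Hypothetical sketch of a two-branch stacked LSTM for 2D-skeleton
        # action recognition (PyTorch assumed; all sizes are illustrative).
        import torch
        import torch.nn as nn

        class TwoBranchStackedLSTM(nn.Module):
            def __init__(self, n_joints=18, hidden=128, n_classes=20):
                super().__init__()
                # Branch 1: raw 2D joint coordinates (x, y) per frame.
                self.pose_branch = nn.LSTM(n_joints * 2, hidden,
                                           num_layers=2, batch_first=True)
                # Branch 2: frame-to-frame joint displacements (motion cues).
                self.motion_branch = nn.LSTM(n_joints * 2, hidden,
                                             num_layers=2, batch_first=True)
                self.classifier = nn.Linear(hidden * 2, n_classes)

            def forward(self, skel):               # (batch, time, joints*2)
                motion = skel[:, 1:] - skel[:, :-1]
                _, (h_pose, _) = self.pose_branch(skel)
                _, (h_motion, _) = self.motion_branch(motion)
                # Fuse the last hidden state of each branch, then classify.
                fused = torch.cat([h_pose[-1], h_motion[-1]], dim=1)
                return self.classifier(fused)

        logits = TwoBranchStackedLSTM()(torch.randn(4, 30, 36))  # 4 clips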

    Facial Expression Recognition Based on Deep Learning Convolution Neural Network: A Review

    Facial emotion processing is one of the most important activities in affective computing, interaction between people and computers, machine vision, video game testing, and consumer research. Facial expressions are a form of nonverbal communication, as they reveal a person's inner feelings and emotions. Facial Expression Recognition (FER) has recently received extensive attention, as facial expressions are considered the fastest medium for communicating any kind of information. Facial expression recognition gives a better understanding of a person's thoughts or views, and the currently trending deep learning methods analyze them with sharply higher accuracy than traditional state-of-the-art systems. This article provides a brief overview of the different FER fields of application and the publicly accessible databases used in FER, and surveys the latest reviews of FER using Convolutional Neural Network (CNN) algorithms. Finally, it is observed that the reviewed methods all reached good results, especially in terms of accuracy, albeit with different rates and on different datasets, which impacts the comparability of the results.

    Facial Expression Recognition in the Wild Using Convolutional Neural Networks

    Facial Expression Recognition (FER) is the task of predicting a specific facial expression given a facial image. FER has demonstrated remarkable progress due to the advancement of deep learning. Generally, a FER system as a prediction model is built using two sub-modules: 1) a facial image representation model that learns a mapping from the input 2D facial image to a compact feature representation in the embedding space, and 2) a classifier module that maps the learned features to the label space comprising the seven labels neutral, happy, sad, surprise, anger, fear, and disgust. Ultimately, the prediction model aims to predict one of the seven aforementioned labels for the given input image. This process is carried out using a supervised learning algorithm where the model minimizes an objective function that measures the error between the prediction and the true label by searching for the best mapping function. Our work is inspired by Deep Metric Learning (DML) approaches to learn an efficient embedding space for the classifier module. DML fundamentally aims to achieve maximal separation in the embedding space by creating compact and well-separated clusters with the capability of feature discrimination. However, conventional DML methods ignore the underlying challenges associated with wild FER datasets, where images exhibit large intra-class variation and inter-class similarity. First, we tackle the extreme class imbalance that leads to a separation bias toward facial expression classes populated with more data (e.g., happy and neutral) against minority classes (e.g., disgust and fear). To eliminate this bias, we propose a discriminant objective function that optimizes the embedding space to enforce inter-class separation of features for both majority and minority classes. Second, we design an adaptive mechanism to selectively discriminate features in the embedding space to promote generalization, yielding a prediction model that classifies unseen images more accurately. We are inspired by the human visual attention model described as the perception of the most salient visual cues in the observed scene. Accordingly, our attentive mechanism adaptively selects important features to discriminate in the DML's objective function. We conduct experiments on two popular large-scale wild FER datasets (RAF-DB and AffectNet) to show the enhanced discriminative power of our proposed methods compared with several state-of-the-art FER methods.
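    The flavour of such a discriminant objective can be sketched as follows: pull each embedding toward its class center while pushing all class centers pairwise apart, so that minority classes receive the same separation pressure as majority ones. This is a minimal sketch assuming PyTorch, not the paper's exact loss; the function name and margin value are illustrative.

        # Hypothetical discriminant metric-learning loss: intra-class
        # compactness plus a hinge on pairwise class-center distances.
        import torch
        import torch.nn.functional as F

        def discriminant_loss(embeddings, labels, centers, margin=1.0):
            # `centers` would be a learnable (num_classes, dim) parameter,
            # e.g. nn.Parameter(torch.randn(7, 128)) for seven expressions.
            # Compactness: squared distance to the assigned class center.
            intra = (embeddings - centers[labels]).pow(2).sum(dim=1).mean()
            # Separation: penalize center pairs closer than the margin,
            # weighting every class pair equally regardless of class size.
            dists = torch.cdist(centers, centers)
            mask = ~torch.eye(len(centers), dtype=torch.bool)
            inter = F.relu(margin - dists[mask]).mean()
            return intra + inter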

    Customer’s spontaneous facial expression recognition

    In the field of consumer science, customer facial expression is often categorized as either negative or positive. A customer who displays negative emotion toward a specific product is most likely rejecting it, while a customer with positive emotion is more likely to purchase it. To observe customer emotion, many researchers have studied different perspectives and methodologies to obtain highly accurate results. A convolutional neural network (CNN) is used to recognize customers' spontaneous facial expressions. This paper aims to recognize spontaneous customer expressions while the customer observes certain products. We have developed a customer service system using a CNN trained to detect three types of facial expression: happy, sad, and neutral. Facial features are extracted together with their histogram of oriented gradients (HOG) using a sliding window. The results are compared with existing work and show an average success rate of 82.9%.
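    A rough sketch of the kind of pipeline the abstract describes (slide a window over the face, compute HOG descriptors per window, and classify the stacked features as happy, sad, or neutral) could look as follows. It assumes scikit-image and PyTorch; the window size, step, and classifier shape are illustrative guesses, not the paper's settings.

        # Hypothetical HOG + sliding-window feature extractor feeding a
        # small classifier head (scikit-image and PyTorch assumed).
        import numpy as np
        import torch
        import torch.nn as nn
        from skimage.feature import hog

        def sliding_hog(gray_face, win=64, step=32):
            feats = []
            for y in range(0, gray_face.shape[0] - win + 1, step):
                for x in range(0, gray_face.shape[1] - win + 1, step):
                    patch = gray_face[y:y + win, x:x + win]
                    feats.append(hog(patch, pixels_per_cell=(8, 8),
                                     cells_per_block=(2, 2)))
            return np.concatenate(feats)

        face = np.random.rand(128, 128)       # stand-in for a cropped face
        x = torch.tensor(sliding_hog(face), dtype=torch.float32)
        clf = nn.Sequential(nn.Linear(x.numel(), 128), nn.ReLU(),
                            nn.Linear(128, 3))    # happy / sad / neutral
        scores = clf(x)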

    A Compact CNN-Based Speech Enhancement With Adaptive Filter Design Using Gabor Function And Region-Aware Convolution

    Speech enhancement (SE) is used in many applications, such as hearing devices, to improve speech intelligibility and quality. Convolutional neural network-based (CNN-based) SE algorithms in the literature often employ generic convolutional filters that are not optimized for SE applications. This paper presents a CNN-based SE algorithm with an adaptive filter design (named ‘CNN-AFD’) using a Gabor function and region-aware convolution. The proposed algorithm incorporates fixed Gabor functions into convolutional filters to model human auditory processing for improved denoising performance. The feature maps obtained from the Gabor-incorporated convolutional layers serve as learnable guided masks (tuned during backpropagation) for generating adaptive custom region-aware filters. The custom filters extract features from speech regions (hence ‘region-aware’) while maintaining translation invariance. To reduce the CNN's high inference cost, skip convolution and activation analysis-wise pruning are explored. Employing skip convolution reduced the training time per epoch by close to 40%. Pruning neurons with high numbers of zero activations complements skip convolution and significantly reduces model parameters by more than 30%. The proposed CNN-AFD outperformed all four CNN-based SE baseline algorithms (a CNN-based SE employing generic filters, one without region-aware convolution, one trained with complex spectrograms, and one processing in the time domain) with averages of 0.95, 1.82 and 0.82 in short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ) and logarithmic spectral distance (LSD) scores, respectively, when tasked with denoising speech contaminated with NOISEX-92 noises at −5, 0 and 5 dB signal-to-noise ratios (SNRs).
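    The fixed-Gabor-filter idea can be illustrated with a short sketch: build 2D Gabor kernels at a few orientations and freeze them as the weights of a convolutional layer so they are not updated during training. PyTorch is assumed, and every parameter value (kernel size, wavelength, orientations) is illustrative rather than taken from the paper.

        # Hypothetical fixed-Gabor convolutional layer (PyTorch assumed).
        import math
        import torch
        import torch.nn as nn

        def gabor_kernel(size=7, sigma=2.0, theta=0.0, lam=4.0, psi=0.0):
            half = size // 2
            coords = torch.arange(-half, half + 1, dtype=torch.float32)
            ys, xs = torch.meshgrid(coords, coords, indexing="ij")
            # Rotate coordinates by theta, then modulate a Gaussian
            # envelope with a cosine carrier of wavelength lam.
            xr = xs * math.cos(theta) + ys * math.sin(theta)
            yr = -xs * math.sin(theta) + ys * math.cos(theta)
            gauss = torch.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            return gauss * torch.cos(2 * math.pi * xr / lam + psi)

        conv = nn.Conv2d(1, 4, kernel_size=7, padding=3, bias=False)
        with torch.no_grad():
            for i, th in enumerate([0, math.pi/4, math.pi/2, 3*math.pi/4]):
                conv.weight[i, 0] = gabor_kernel(theta=th)
        conv.weight.requires_grad_(False)   # fixed, not tuned in training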