Machine Understanding of Human Behavior
A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next-generation computing, which we call human computing, should be about anticipatory user interfaces that are human-centered and built for humans based on human models. These interfaces should transcend the traditional keyboard and mouse to include natural, human-like interactive functions, including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, from enabling computers to understand human behavior.
Deep Adaptation of Adult-Child Facial Expressions by Fusing Landmark Features
Imaging of facial affect may be used to measure psychophysiological
attributes of children through adulthood, especially for monitoring
lifelong conditions like Autism Spectrum Disorder. Deep convolutional neural
networks have shown promising results in classifying facial expressions of
adults. However, classifier models trained with adult benchmark data are
unsuitable for learning child expressions due to discrepancies in
psychophysical development. Similarly, models trained with child data perform
poorly in adult expression classification. We propose domain adaptation to
concurrently align distributions of adult and child expressions in a shared
latent space to ensure robust classification of either domain. Furthermore, age
variations in facial images are studied in age-invariant face recognition yet
remain unleveraged in adult-child expression classification. We take
inspiration from multiple fields and propose deep adaptive FACial Expressions
fusing BEtaMix SElected Landmark Features (FACE-BE-SELF) for adult-child facial
expression classification. For the first time in the literature, a mixture of
Beta distributions is used to decompose and select facial features based on
correlations with expression, domain, and identity factors. We evaluate
FACE-BE-SELF on two pairs of adult-child data sets. Our proposed FACE-BE-SELF
approach outperforms adult-child transfer learning and other baseline domain
adaptation methods in aligning latent representations of adult and child
expressions.
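A minimal sketch of the feature-selection idea above: fit a two-component Beta mixture to absolute feature-expression correlations and keep the features assigned to the high-correlation component. The EM update, component initialisation, and 0.5 responsibility threshold are illustrative assumptions, not the exact FACE-BE-SELF procedure.

```python
import numpy as np
from scipy.stats import beta

def fit_beta_mixture(x, n_iter=200, eps=1e-4):
    """Fit a 2-component Beta mixture to values in (0, 1) with a simple EM
    loop (method-of-moments M-step). x: absolute feature/expression
    correlations, one value per facial feature."""
    x = np.clip(x, eps, 1 - eps)
    params = [(2.0, 8.0), (8.0, 2.0)]        # "low" and "high" components
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each feature
        pdfs = np.stack([w * beta.pdf(x, a, b)
                         for w, (a, b) in zip(weights, params)])
        resp = pdfs / pdfs.sum(axis=0, keepdims=True)
        # M-step: weighted moment matching for each component's (a, b)
        new_params = []
        for r in resp:
            m = np.average(x, weights=r)
            v = np.average((x - m) ** 2, weights=r) + eps
            c = m * (1 - m) / v - 1
            new_params.append((max(m * c, eps), max((1 - m) * c, eps)))
        params, weights = new_params, resp.mean(axis=1)
    return params, resp

# Toy usage: keep features assigned to the high-correlation component.
rng = np.random.default_rng(0)
corr = np.clip(np.abs(rng.normal(0.2, 0.15, size=500)), 1e-4, 1 - 1e-4)
params, resp = fit_beta_mixture(corr)
high = int(np.argmax([a / (a + b) for a, b in params]))  # larger-mean component
selected = np.where(resp[high] > 0.5)[0]
print(f"selected {selected.size} of {corr.size} features")
```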
FusionSense: Emotion Classification using Feature Fusion of Multimodal Data and Deep learning in a Brain-inspired Spiking Neural Network
Using multimodal signals to solve the problem of emotion recognition is one of the emerging trends in affective computing. Several studies have utilized state-of-the-art deep learning methods and combined physiological signals, such as the electroencephalogram (EEG), electrocardiogram (ECG), and skin temperature, along with facial expressions, voice, and posture, to name a few, in order to classify emotions. Spiking neural networks (SNNs) represent the third generation of neural networks and employ biologically plausible models of neurons. SNNs have been shown to handle spatio-temporal data, which is essentially the nature of the data encountered in the emotion recognition problem, in an efficient manner. In this work, for the first time, we propose the application of SNNs to solve the emotion recognition problem with multimodal data. Specifically, we use the NeuCube framework, which employs an evolving SNN architecture, to classify emotional valence, and we evaluate the performance of our approach on the MAHNOB-HCI dataset. The multimodal data used in our work consist of facial expressions along with physiological signals such as ECG, skin temperature, skin conductance, respiration signal, mouth length, and pupil size. We perform classification under the Leave-One-Subject-Out (LOSO) cross-validation mode. Our results show that the proposed approach achieves an accuracy of 73.15% for classifying binary valence when applying feature-level fusion, which is comparable to other deep learning methods. We achieve this accuracy even without using EEG, which other deep learning methods have relied on to reach this level of accuracy. In conclusion, we have demonstrated that SNNs can be successfully used for solving the emotion recognition problem with multimodal data, and we provide directions for future research utilizing SNNs for affective computing. In addition to its good accuracy, the SNN recognition system is incrementally trainable on new data in an adaptive way and requires only one pass of training, which makes it suitable for practical and on-line applications. These features are not manifested in other methods for this problem.
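A minimal sketch of the evaluation protocol described above: Leave-One-Subject-Out cross-validation with feature-level (concatenation) fusion. Since NeuCube is a dedicated SNN framework, an SVM is used here only as a generic stand-in classifier, and the feature dimensions, subject count, and labels are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder features per sample: facial-expression descriptors and
# physiological descriptors (ECG, skin temperature/conductance, respiration,
# mouth length, pupil size). Dimensions are illustrative only.
rng = np.random.default_rng(0)
n_samples, n_subjects = 240, 24
face_feat = rng.normal(size=(n_samples, 32))
physio_feat = rng.normal(size=(n_samples, 16))
y = rng.integers(0, 2, size=n_samples)            # binary valence labels
subjects = np.repeat(np.arange(n_subjects), n_samples // n_subjects)

# Feature-level fusion: concatenate modalities before classification.
X = np.hstack([face_feat, physio_feat])

logo = LeaveOneGroupOut()
accs = []
for train, test in logo.split(X, y, groups=subjects):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X[train], y[train])
    accs.append(clf.score(X[test], y[test]))

print(f"LOSO mean accuracy: {np.mean(accs):.3f}")
```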
Automatic Replication of Teleoperator Head Movements and Facial Expressions on a Humanoid Robot
Robotic telepresence aims to create a physical presence for a remotely located human (teleoperator) by reproducing their verbal and nonverbal behaviours (e.g. speech, gestures, facial expressions) on a robotic platform. In this work, we propose a novel teleoperation system that combines the replication of facial expressions of emotions (neutral, disgust, happiness, and surprise) and head movements on the fly on the humanoid robot Nao. Robots' expression of emotions is constrained by their physical and behavioural capabilities. As the Nao robot has a static face, we use the LEDs located around its eyes to reproduce the teleoperator's expressions of emotions. Using a web camera, we computationally detect the facial action units and measure the head pose of the operator. The emotion to be replicated is inferred from the detected action units by a neural network. Simultaneously, the measured head motion is smoothed and bounded to the robot's physical limits by applying a constrained-state Kalman filter. In order to evaluate the proposed system, we conducted a user study by asking 28 participants to use the replication system, displaying facial expressions and head movements while being recorded by a web camera. Subsequently, 18 external observers viewed the recorded clips via an online survey and assessed the quality of the robot's replication of the participants' behaviours. Our results show that the proposed teleoperation system can successfully communicate emotions and head movements, resulting in a high agreement among the external observers (ICC_E = 0.91, ICC_HP = 0.72). This work was funded by the EPSRC under its IDEAS Factory Sandpits call on Digital Personhood (Grant Ref. EP/L00416X/1).
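A minimal sketch of the head-movement stage described above: a constant-velocity Kalman filter smooths a measured head-pose angle and the estimate is clipped to the robot's joint range. The noise variances and the roughly ±2.09 rad yaw limit are assumptions for illustration, not the authors' exact constrained-state formulation.

```python
import numpy as np

def smooth_head_yaw(measurements, dt=1/30, limit=2.09,
                    process_var=1e-3, meas_var=1e-2):
    """Constant-velocity Kalman filter over a single head-pose angle,
    with the angle state clipped to the robot's joint limit."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (angle, rate)
    H = np.array([[1.0, 0.0]])              # we only measure the angle
    Q = process_var * np.eye(2)
    R = np.array([[meas_var]])
    x, P, out = np.zeros(2), np.eye(2), []
    for z in measurements:
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the camera measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        # Constrain: clip the angle to the joint's physical range
        x[0] = np.clip(x[0], -limit, limit)
        out.append(x[0])
    return np.array(out)

# Toy usage with a noisy yaw trajectory measured from the web camera.
t = np.linspace(0, 4, 120)
noisy_yaw = 1.5 * np.sin(t) + np.random.default_rng(1).normal(0, 0.1, t.size)
smoothed = smooth_head_yaw(noisy_yaw)
```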
Timing is everything: A spatio-temporal approach to the analysis of facial actions
This thesis presents a fully automatic facial expression analysis system based on the Facial Action
Coding System (FACS). FACS is the best known and the most commonly used system to describe
facial activity in terms of facial muscle actions (i.e., action units, AUs). We will present our research
on the analysis of the morphological, spatio-temporal and behavioural aspects of facial expressions.
In contrast with most other researchers in the field, who use appearance-based techniques, we use a
geometric-feature-based approach. We will argue that this approach is more suitable for analysing
facial expression temporal dynamics. Our system is capable of explicitly exploring the temporal
aspects of facial expressions from an input colour video in terms of their onset (start), apex (peak)
and offset (end).
The fully automatic system presented here detects 20 facial points in the first frame and tracks them
throughout the video. From the tracked points we compute geometry-based features which serve as
the input to the remainder of our systems. The AU activation detection system uses GentleBoost
feature selection and a Support Vector Machine (SVM) classifier to find which AUs were present in an
expression. Temporal dynamics of active AUs are recognised by a hybrid GentleBoost-SVM-Hidden
Markov model classifier. The system is capable of analysing 23 out of 27 existing AUs with high
accuracy.
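As a rough illustration of the pipeline above (geometric features, boosting-based feature selection, per-AU SVM), the sketch below substitutes AdaBoost with decision stumps for GentleBoost, which is not available in common Python libraries; feature dimensions, labels, and hyperparameters are placeholders rather than the thesis' configuration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

def detect_au(X, y, n_selected=20):
    """Per-AU activation detector: rank geometry-based features with a boosted
    stump ensemble (a stand-in for GentleBoost), then train an SVM on the
    selected features. X: (n_frames, n_features), y: binary AU labels."""
    booster = AdaBoostClassifier(n_estimators=100)   # default base learner is a stump
    booster.fit(X, y)
    selected = np.argsort(booster.feature_importances_)[::-1][:n_selected]
    svm = SVC(kernel="rbf").fit(X[:, selected], y)
    return svm, selected

# Toy usage: synthetic geometry-based features (e.g. distances and angles
# derived from the 20 tracked points) for one AU.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 60))
y = (X[:, 3] + 0.5 * X[:, 17] > 0).astype(int)
svm, selected = detect_au(X, y)
print("selected feature indices:", selected[:5])
```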
The main contributions of the work presented in this thesis are the following: we have created a
method for fully automatic AU analysis with state-of-the-art recognition results. We have proposed
for the first time a method for the recognition of the four temporal phases of an AU. We have built the
largest and most comprehensive database of facial expressions to date. We also present, for the first time
in the literature, two studies on the automatic distinction between posed and spontaneous expressions.
A Multimodal Approach for Monitoring Driving Behavior and Emotions
Studies have indicated that emotions can be significantly influenced by environmental factors; these factors can also significantly influence drivers’ emotional state and, accordingly, their driving behavior. Furthermore, as the demand for autonomous vehicles is expected to increase significantly within the next decade, a proper understanding of drivers’/passengers’ emotions, behavior, and preferences will be needed in order to create an acceptable level of trust with humans. This paper proposes a novel semi-automated approach for understanding the effect of environmental factors on drivers’ emotions and behavioral changes through a naturalistic driving study. The setup includes a frontal road camera and a facial camera, a smart watch for tracking physiological measurements, and a Controller Area Network (CAN) serial data logger. The results suggest that the driver’s affect is highly influenced by the type of road and the weather conditions, which have the potential to change driving behaviors. For instance, when emotional state is quantified in terms of valence and engagement, the results reveal significant differences in drivers’ emotions across weather conditions and road types. Participants’ engagement was higher in rainy and clear weather compared to cloudy weather. Moreover, engagement was higher on city streets and highways compared to one-lane roads and two-lane highways.
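The paper does not state which statistical test underlies the reported significance; the sketch below runs a one-way ANOVA on synthetic per-trip engagement scores purely to illustrate how such a comparison across weather conditions could be made.

```python
import numpy as np
from scipy.stats import f_oneway

# Synthetic per-trip mean engagement scores grouped by weather condition
# (values are placeholders, not the study's data).
rng = np.random.default_rng(0)
clear  = rng.normal(0.62, 0.08, 40)
rainy  = rng.normal(0.60, 0.08, 40)
cloudy = rng.normal(0.52, 0.08, 40)

stat, p = f_oneway(clear, rainy, cloudy)
print(f"one-way ANOVA across weather conditions: F={stat:.2f}, p={p:.4f}")
```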
Bio-inspired multisensory integration of social signals
Understanding emotions is a core aspect of human communication. Our social behaviours
are closely linked to expressing our own emotions and understanding others’ emotional and mental
states through social signals. Emotions are expressed in a multisensory manner, in which humans
use social signals from different sensory modalities such as facial expression, vocal changes, or
body language. The human brain integrates all relevant information to create a new multisensory
percept and derives emotional meaning.
There is great interest in emotion recognition in various fields such as HCI, gaming,
marketing, and assistive technologies. This demand is driving an increase in research on multisensory
emotion recognition. The majority of existing work proceeds by extracting meaningful
features from each modality and applying fusion techniques either at a feature level or decision
level. However, these techniques are ineffective at capturing the constant talk and feedback
between different modalities. Such constant talk is particularly crucial in continuous emotion
recognition, where one modality can predict, enhance and complete the other.
This thesis proposes novel architectures for multisensory emotion recognition inspired by
multisensory integration in the brain. First, we explore the use of bio-inspired unsupervised
learning for unisensory emotion recognition for audio and visual modalities. Then we propose
three multisensory integration models, based on different pathways for multisensory integration
in the brain; that is, integration by convergence, early cross-modal enhancement, and integration
through neural synchrony. The proposed models are designed and implemented using third-generation
neural networks, namely Spiking Neural Networks (SNNs), with unsupervised learning. The
models are evaluated using widely adopted, third-party datasets and compared to state-of-the-art
multimodal fusion techniques, such as early, late and deep learning fusion. Evaluation results
show that the three proposed models achieve comparable results to state-of-the-art supervised
learning techniques. More importantly, this thesis presents models that capture the constant
talk between modalities during the training phase. Each modality can predict, complement and
enhance the other using constant feedback. This cross-talk between modalities adds insight
into emotions compared to traditional fusion techniques.
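A toy illustration of the "integration by convergence" pathway mentioned above: spike trains from an auditory and a visual channel converge on a single leaky integrate-and-fire neuron. All constants (weights, time constant, threshold) are arbitrary, and the sketch does not reproduce the thesis' SNN architectures or their unsupervised learning.

```python
import numpy as np

def lif_convergence(spikes_audio, spikes_visual, w_a=0.6, w_v=0.6,
                    tau=20.0, v_th=1.0, dt=1.0):
    """Leaky integrate-and-fire neuron driven by two converging spike trains.
    spikes_*: binary arrays of the same length (1 = spike in that time step)."""
    v, out = 0.0, []
    for sa, sv in zip(spikes_audio, spikes_visual):
        v += dt * (-v / tau) + w_a * sa + w_v * sv   # leak plus weighted input
        if v >= v_th:                                # threshold crossing -> spike
            out.append(1)
            v = 0.0                                  # reset after spiking
        else:
            out.append(0)
    return np.array(out)

# Toy usage: the neuron fires more when both modalities are active together.
rng = np.random.default_rng(0)
audio  = (rng.random(200) < 0.10).astype(int)
visual = (rng.random(200) < 0.10).astype(int)
print("output spikes:", lif_convergence(audio, visual).sum())
```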
Multi-Modality Human Action Recognition
Human action recognition is useful in many applications across various areas, e.g. video surveillance, human-computer interaction (HCI), video retrieval, gaming, and security. Recently, human action recognition has become an active research topic in computer vision and pattern recognition, and a number of action recognition approaches have been proposed. However, most of these approaches are designed for RGB image sequences, where the action data are collected by an RGB/intensity camera. Thus the recognition performance is usually affected by the occlusion, background, and lighting conditions of the image sequences. If more information can be provided along with the image sequences, so that data sources other than RGB video can be utilized, human actions could be better represented and recognized by the designed computer vision system. In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible, e.g. infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed; cross-spectral action recognition is then also investigated and novel approaches are proposed in our work. On the other hand, since depth imaging technology has recently made significant progress, depth information can now be captured simultaneously with RGB videos, and depth-based human action recognition is therefore also investigated. I first propose a method combining different types of depth data to recognize human actions. Then a thorough evaluation is conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, I advocate the study of fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamics model.
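One simple way to combine modalities, as discussed above, is decision-level fusion; the sketch below averages class probabilities from an RGB-based and a depth-based classifier. The features, classifiers, and fusion rule are placeholders rather than the dissertation's methods.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder per-clip descriptors for two modalities (e.g. STIP-style
# features from RGB and from depth) and action labels.
rng = np.random.default_rng(0)
rgb_feat   = rng.normal(size=(300, 64))
depth_feat = rng.normal(size=(300, 48))
y = rng.integers(0, 5, size=300)

idx_tr, idx_te = train_test_split(np.arange(300), test_size=0.3, random_state=0)

clf_rgb   = LogisticRegression(max_iter=1000).fit(rgb_feat[idx_tr], y[idx_tr])
clf_depth = LogisticRegression(max_iter=1000).fit(depth_feat[idx_tr], y[idx_tr])

# Decision-level fusion: average the per-class probabilities of both models.
proba = (clf_rgb.predict_proba(rgb_feat[idx_te])
         + clf_depth.predict_proba(depth_feat[idx_te])) / 2
acc = (proba.argmax(axis=1) == y[idx_te]).mean()
print(f"fused accuracy: {acc:.3f}")
```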
Artificial Intelligence Tools for Facial Expression Analysis.
Inner emotions show visibly upon the human face and are understood as a basic guide to an individual’s inner world. It is, therefore, possible to determine a person’s attitudes and the effects of others’ behaviour on their deeper feelings by examining facial expressions. In real-world applications, machines that interact with people need strong facial expression recognition. This capability holds advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. An AU activation is a set of local, individual facial muscle movements that occur in unison to constitute a natural facial expression event. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. For this research, AU occurrence detection was conducted by extracting static and dynamic features, using both nominal hand-crafted and deep learning representations, from each static image of a video. This confirmed the superior ability of a pretrained model, which leaps ahead in performance. Next, temporal modelling was investigated to detect the underlying temporal variation phases from dynamic sequences using supervised and unsupervised methods. During these processes, the importance of stacking dynamic features on top of static ones was discovered when encoding deep features for learning temporal information and combining the spatial and temporal schemes simultaneously. This study also found that fusing spatial and temporal features gives more long-term temporal pattern information. Moreover, we hypothesised that using an unsupervised method would enable invariant information to be extracted from dynamic textures. Recently, cutting-edge developments have been created by approaches based on Generative Adversarial Networks (GANs). In the second section of this thesis, we propose a model based on the adoption of an unsupervised DCGAN for facial feature extraction and classification to achieve the following: the creation of facial expression images under different arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to resolve the problem of recognising the seven static classes of emotion in the wild. Thorough experimentation in the proposed cross-database setting demonstrates that this approach can improve generalization results. Additionally, we showed that the features learnt by the DCGAN process are poorly suited to encoding facial expressions when observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos rich in variations and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters of a 3D Morphable Model jointly with a back-end classifier.
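A minimal sketch of the final stage described above, assuming a ResNet-18 backbone and a 29-dimensional expression parameter vector (both assumptions): the backbone's classification head is replaced with a regression head trained with an MSE loss on 3DMM expression coefficients. This is not the thesis' exact network, dataset, or joint back-end classifier.

```python
import torch
import torch.nn as nn
from torchvision import models

class ExpressionRegressor(nn.Module):
    """ResNet-18 backbone with a regression head for 3DMM expression
    parameters (the 29-dimensional output size is an assumption)."""
    def __init__(self, n_params: int = 29):
        super().__init__()
        self.backbone = models.resnet18(weights=None)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_params)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)

# Toy training step on random tensors standing in for face crops and
# expression coefficients fitted to them offline.
model = ExpressionRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
images = torch.randn(8, 3, 224, 224)
targets = torch.randn(8, 29)

pred = model(images)
loss = nn.functional.mse_loss(pred, targets)
loss.backward()
opt.step()
print(f"MSE loss: {loss.item():.4f}")
```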