Emergent Leadership Detection Across Datasets
Automatic detection of emergent leaders in small groups from nonverbal
behaviour is a growing research topic in social signal processing but existing
methods were evaluated on single datasets -- an unrealistic assumption for
real-world applications in which systems are required to also work in settings
unseen at training time. It therefore remains unclear whether current methods
for emergent leadership detection generalise to similar but new settings and to
what extent. To overcome this limitation, we are the first to study a
cross-dataset evaluation setting for the emergent leadership detection task. We
provide evaluations for within- and cross-dataset prediction using two current
datasets (PAVIS and MPIIGroupInteraction), as well as an investigation of the
robustness of commonly used feature channels (visual focus of attention, body
pose, facial action units, speaking activity) and online prediction in the
cross-dataset setting. Our evaluations show that, using pose- and
eye-contact-based features, cross-dataset prediction is possible with an
accuracy of 0.68, providing another important piece of the puzzle towards
emergent leadership detection in the real world.
Comment: 5 pages, 3 figures
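The within- versus cross-dataset protocol described above can be sketched as follows. This is a minimal illustration only: the nearest-centroid classifier, the synthetic two-feature data, and names such as `make_toy_dataset` are assumptions made for the example, not the method or the features used in the paper, and the printed numbers say nothing about PAVIS or MPIIGroupInteraction.

```python
import random
from statistics import fmean


def fit_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: mean vector}."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: [fmean(col) for col in zip(*xs)] for y, xs in by_label.items()}


def predict(x, centroids):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def accuracy(train, test):
    """Fit on `train`, then report the fraction of `test` labelled correctly."""
    centroids = fit_centroids(train)
    return sum(predict(x, centroids) == y for x, y in test) / len(test)


def make_toy_dataset(rng, n, shift):
    """Synthetic stand-in for a dataset (NOT real data); label 1 = leader.

    `shift` moves all features to mimic a change of recording setting.
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(2.0 * y + shift, 1.0) for _ in range(2)]
        data.append((x, y))
    return data


rng = random.Random(0)
dataset_a = make_toy_dataset(rng, 200, shift=0.0)
dataset_b = make_toy_dataset(rng, 200, shift=0.5)

# Within-dataset: train and test splits come from the same dataset.
train_a, test_a = dataset_a[:150], dataset_a[150:]
within_acc = accuracy(train_a, test_a)

# Cross-dataset: train on dataset A, test on the unseen dataset B.
cross_acc = accuracy(train_a, dataset_b)

print(f"within-dataset accuracy: {within_acc:.2f}")
print(f"cross-dataset accuracy:  {cross_acc:.2f}")
```

The point of the sketch is the data flow: the cross-dataset score is computed on a dataset that contributed nothing to training.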
ICface: Interpretable and Controllable Face Reenactment Using GANs
This paper presents a generic face animator that is able to control the pose
and expressions of a given face image. The animation is driven by human
interpretable control signals consisting of head pose angles and the Action
Unit (AU) values. The control information can be obtained from multiple sources
including external driving videos and manual controls. Due to the interpretable
nature of the driving signal, one can easily mix the information between
multiple sources (e.g. pose from one image and expression from another) and
apply selective post-production editing. The proposed face animator is
implemented as a two-stage neural network model that is learned in a
self-supervised manner using a large video collection. The proposed
Interpretable and Controllable face reenactment network (ICface) is compared to
the state-of-the-art neural network-based face animation techniques in multiple
tasks. The results indicate that ICface produces better visual quality while
being more versatile than most of the comparison methods. The introduced model
could provide a lightweight and easy to use tool for a multitude of advanced
image and video editing tasks.
Comment: Accepted in WACV-202
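The idea of mixing interpretable control signals (pose from one source, expression from another) can be illustrated with a small data-structure sketch. The `ControlSignal` class, the `mix_controls` helper, the Action Unit names, and all values below are hypothetical and chosen only for illustration; they are not the ICface interface.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ControlSignal:
    """Hypothetical driving signal: head pose angles plus Action Unit values."""
    yaw: float
    pitch: float
    roll: float
    action_units: Dict[str, float] = field(default_factory=dict)


def mix_controls(pose_source: ControlSignal,
                 expression_source: ControlSignal) -> ControlSignal:
    """Take the head pose from one signal and the AU values from another."""
    return ControlSignal(
        yaw=pose_source.yaw,
        pitch=pose_source.pitch,
        roll=pose_source.roll,
        action_units=dict(expression_source.action_units),
    )


# Illustrative values only.
from_video = ControlSignal(yaw=30.0, pitch=-5.0, roll=0.0,
                           action_units={"AU12": 0.1})
from_manual = ControlSignal(yaw=0.0, pitch=0.0, roll=0.0,
                            action_units={"AU12": 0.8})

mixed = mix_controls(pose_source=from_video, expression_source=from_manual)
print(mixed)
```

Because each field has a human-readable meaning, selective editing amounts to replacing individual fields before the signal is passed to the animator.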
Every Smile is Unique: Landmark-Guided Diverse Smile Generation
Each smile is unique: one person surely smiles in different ways (e.g.,
closing/opening the eyes or mouth). Given one input image of a neutral face,
can we generate multiple smile videos with distinctive characteristics? To
tackle this one-to-many video generation problem, we propose a novel deep
learning architecture named Conditional Multi-Mode Network (CMM-Net). To better
encode the dynamics of facial expressions, CMM-Net explicitly exploits facial
landmarks for generating smile sequences. Specifically, a variational
auto-encoder is used to learn a facial landmark embedding. This single
embedding is then exploited by a conditional recurrent network which generates
a landmark embedding sequence conditioned on a specific expression (e.g.,
spontaneous smile). Next, the generated landmark embeddings are fed into a
multi-mode recurrent landmark generator, producing a set of landmark sequences
still associated with the given smile class but clearly distinct from each other.
Finally, these landmark sequences are translated into face videos. Our
experimental results demonstrate the effectiveness of our CMM-Net in generating
realistic videos of multiple smile expressions.
Comment: Accepted as a poster in Conference on Computer Vision and Pattern
Recognition (CVPR), 201
Island Loss for Learning Discriminative Features in Facial Expression Recognition
Over the past few years, Convolutional Neural Networks (CNNs) have shown
promise on facial expression recognition. However, the performance degrades
dramatically under real-world settings due to variations introduced by subtle
facial appearance changes, head pose variations, illumination changes, and
occlusions.
In this paper, a novel island loss is proposed to enhance the discriminative
power of the deeply learned features. Specifically, the island loss (IL) is designed to
reduce the intra-class variations while enlarging the inter-class differences
simultaneously. Experimental results on four benchmark expression databases
have demonstrated that the CNN with the proposed island loss (IL-CNN)
outperforms the baseline CNN models with either traditional softmax loss or the
center loss and achieves comparable or better performance compared with the
state-of-the-art methods for facial expression recognition.
Comment: 8 pages, 3 figures
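A loss that shrinks intra-class variation while enlarging inter-class differences can be sketched as below. The abstract does not give the formula, so the form used here is an assumption: a center-loss term (squared distance of each feature to its class center) plus a penalty of `cosine(c_j, c_k) + 1` summed over all ordered pairs of distinct class centers, weighted by a hypothetical coefficient `lam`. In practice this would be combined with a softmax loss and the centers would be learned; both are omitted here.

```python
import math


def center_loss(features, labels, centers):
    """Half the summed squared distance of each feature to its class center."""
    return 0.5 * sum(
        sum((f - c) ** 2 for f, c in zip(x, centers[y]))
        for x, y in zip(features, labels)
    )


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def island_loss(features, labels, centers, lam=1.0):
    """Assumed form: center loss + lam * sum over center pairs of (cos + 1).

    The pairwise term is 0 when two centers point in opposite directions and
    grows to 2 per ordered pair when they point the same way, so minimising
    it pushes class centers apart.
    """
    pair_penalty = sum(
        cosine(centers[j], centers[k]) + 1.0
        for j in centers
        for k in centers
        if j != k
    )
    return center_loss(features, labels, centers) + lam * pair_penalty


# Toy 2-D features that sit exactly on their class centers.
features = [[1.0, 0.0], [-1.0, 0.0]]
labels = [0, 1]

well_separated = {0: [1.0, 0.0], 1: [-1.0, 0.0]}
print(island_loss(features, labels, well_separated))

overlapping = {0: [1.0, 0.0], 1: [1.0, 0.0]}
print(island_loss([[1.0, 0.0], [1.0, 0.0]], labels, overlapping))
```

The second call yields the larger value because the two class centers coincide, which is exactly the situation the pairwise term penalises.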
Facial expression recognition in the wild: from individual to group
The progress in computing technology has increased the demand for smart systems capable of understanding human affect and emotional manifestations. One of the crucial factors in designing systems equipped with such intelligence is having accurate automatic Facial Expression Recognition (FER) methods. In computer vision, automatic facial expression analysis has been an active field of research for over two decades. However, many questions remain unanswered. The research presented in this thesis attempts to address the following key issues of FER in challenging conditions: 1) creating a facial expressions database representing real-world conditions; 2) devising Head Pose Normalisation (HPN) methods which are independent of facial parts location; 3) creating automatic methods for analysing the mood of a group of people. The central hypothesis of the thesis is that extracting close to real-world data from movies and performing facial expression analysis on movies is a stepping stone towards moving the analysis of faces to real-world, unconstrained conditions.
A temporal facial expressions database, Acted Facial Expressions in the Wild (AFEW), is proposed. The database is constructed and labelled using a semi-automatic process based on keyword search over closed caption subtitles. Currently, AFEW is the largest facial expressions database representing challenging conditions available to the research community. To provide researchers with a common platform for evaluating and extending their state-of-the-art FER methods, the first Emotion Recognition in the Wild (EmotiW) challenge, based on AFEW, is proposed. An image-only facial expressions database, Static Facial Expressions In The Wild (SFEW), extracted from AFEW, is also proposed.
Furthermore, the thesis focuses on HPN for real-world images. Earlier methods were based on fiducial points. However, as fiducial point detection is an open problem for real-world images, such HPN can be error-prone. An HPN method based on response maps generated from part-detectors is proposed. The proposed shape-constrained method does not require fiducial points or head pose information, which makes it suitable for real-world images.
Data from movies and the internet, representing real-world conditions, poses another major challenge to the research community: the presence of multiple subjects. This defines another focus of this thesis, in which a novel approach for modeling the perception of the mood of a group of people in an image is presented. A new database is constructed from Flickr based on keywords related to social events. Three models are proposed: the averaging-based Group Expression Model (GEM), the Weighted Group Expression Model (GEM_w) and the Augmented Group Expression Model (GEM_LDA). GEM_w is based on social contextual attributes, which are used as weights on each person's contribution towards the overall group's mood. Further, GEM_LDA is based on a topic model and feature augmentation. The proposed framework is applied to group candid shot selection and event summarisation. The application of the Structural SIMilarity (SSIM) index metric is explored for finding similar facial expressions. The proposed framework is applied to the problem of creating image albums based on facial expressions, finding corresponding expressions for training facial performance transfer algorithms
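The difference between the averaging-based GEM and the weighted GEM_w can be sketched in a few lines. This is a simplified illustration under stated assumptions: each person is reduced to a single hypothetical expression intensity score, and the weights are made-up numbers standing in for whatever the social contextual attributes would produce; the thesis abstract does not specify either.

```python
def gem(intensities):
    """Averaging-based group score: every person contributes equally."""
    if not intensities:
        raise ValueError("at least one person is required")
    return sum(intensities) / len(intensities)


def gem_weighted(intensities, weights):
    """Weighted group score: each person's contribution is scaled by a weight."""
    if len(intensities) != len(weights):
        raise ValueError("one weight per person is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * i for w, i in zip(weights, intensities)) / total


# Hypothetical per-person expression intensities for a group of three.
intensities = [20.0, 60.0, 100.0]
# Hypothetical weights, e.g. a larger weight for a more prominent face.
weights = [1.0, 1.0, 2.0]

print(gem(intensities))                    # 60.0
print(gem_weighted(intensities, weights))  # 70.0
```

With equal weights the two functions agree, so the weighted model only changes the group estimate when the contextual attributes single out some people over others.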