Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories
In this paper, we propose a new approach for facial expression recognition
using deep covariance descriptors. The solution is based on the idea of
encoding local and global Deep Convolutional Neural Network (DCNN) features
extracted from still images, in compact local and global covariance
descriptors. The space geometry of the covariance matrices is that of Symmetric
Positive Definite (SPD) matrices. By conducting the classification of static
facial expressions using Support Vector Machine (SVM) with a valid Gaussian
kernel on the SPD manifold, we show that deep covariance descriptors are more
effective than the standard classification with fully connected layers and
softmax. In addition, we propose a novel solution to model the temporal
dynamics of facial expressions as deep trajectories on the SPD
manifold. As an extension of the classification pipeline of covariance
descriptors, we apply SVM with valid positive definite kernels derived from
global alignment for deep covariance trajectories classification. By performing
extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that
both the proposed static and dynamic approaches achieve state-of-the-art
performance for facial expression recognition, outperforming many recent
approaches.

A preliminary version of this work appeared as: Otberdout N., Kacem A., Daoudi
M., Ballihi L., Berretti S., "Deep Covariance Descriptors for Facial Expression
Recognition," British Machine Vision Conference (BMVC 2018), Northumbria
University, Newcastle, UK, September 3-6, 2018, p. 159.
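As a concrete illustration of this pipeline, the sketch below builds a covariance descriptor from DCNN features and evaluates a valid Gaussian kernel on the SPD manifold using the log-Euclidean metric; the log-Euclidean choice and all function names are illustrative assumptions, not necessarily the exact kernel used in the paper.

import numpy as np

def covariance_descriptor(features, eps=1e-5):
    # features: (n, d) DCNN features from n spatial locations -> (d, d) SPD matrix
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])  # regularize to be strictly positive definite

def spd_log(A):
    # Matrix logarithm of an SPD matrix via its eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def log_euclidean_gaussian_kernel(X, Y, gamma=0.1):
    # Valid Gaussian kernel on SPD matrices: k(A, B) = exp(-gamma * ||log A - log B||_F^2)
    LX = [spd_log(A) for A in X]
    LY = [spd_log(B) for B in Y]
    K = np.zeros((len(LX), len(LY)))
    for i, La in enumerate(LX):
        for j, Lb in enumerate(LY):
            K[i, j] = np.exp(-gamma * np.linalg.norm(La - Lb) ** 2)
    return K

# Usage with scikit-learn, passing the kernel matrix as precomputed:
# from sklearn.svm import SVC
# K_train = log_euclidean_gaussian_kernel(covs, covs)
# clf = SVC(kernel="precomputed").fit(K_train, labels)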
Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
In this work, we propose a novel approach for generating videos of the six
basic facial expressions given a neutral face image. We propose to exploit the
face geometry by modeling the facial landmarks motion as curves encoded as
points on a hypersphere. By proposing a conditional version of manifold-valued
Wasserstein generative adversarial network (GAN) for motion generation on the
hypersphere, we learn the distribution of facial expression dynamics of
different classes, from which we synthesize new facial expression motions. The
resulting motions can be transformed to sequences of landmarks and then to
image sequences by editing the texture information using another conditional
Generative Adversarial Network. To the best of our knowledge, this is the first
work that explores manifold-valued representations with GAN to address the
problem of dynamic facial expression generation. We evaluate our proposed
approach both quantitatively and qualitatively on two public datasets:
Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the
effectiveness of our approach in generating realistic videos with continuous
motion, realistic appearance and identity preservation. We also show the
efficiency of our framework for dynamic facial expression generation, dynamic
facial expression transfer, and data augmentation for training improved emotion
recognition models.
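As an illustration of this encoding, the sketch below maps a landmark motion curve to a point on a unit hypersphere using the square-root velocity function (SRVF), a standard representation for such curves; the SRVF choice and all names are assumptions consistent with, but not confirmed by, the abstract.

import numpy as np

def srvf(curve, eps=1e-8):
    # curve: (T, d) trajectory of stacked landmark coordinates over T frames.
    # Returns q(t) = dc/dt / sqrt(||dc/dt||), flattened and normalized to unit
    # norm so the whole motion lies on a hypersphere.
    velocity = np.gradient(curve, axis=0)             # (T, d) finite differences
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)
    q = velocity / np.sqrt(speed + eps)               # square-root velocity
    q = q.flatten()
    return q / (np.linalg.norm(q) + eps)              # project onto the unit sphere

# Example: a 30-frame motion of 68 2D landmarks becomes one point on the sphere.
motion = np.random.randn(30, 68 * 2).cumsum(axis=0)
point = srvf(motion)
print(point.shape, np.linalg.norm(point))             # unit-norm vector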
Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation
This paper presents a novel approach for generating 3D talking heads from raw
audio inputs. Our method is grounded in the idea that speech-related movements can
be comprehensively and efficiently described by the motion of a few control
points located on the movable parts of the face, i.e., landmarks. The
underlying musculoskeletal structure then allows us to learn how their motion
influences the geometrical deformations of the whole face. The proposed method
employs two distinct models to this aim: the first one learns to generate the
motion of a sparse set of landmarks from the given audio. The second model
expands such landmarks motion to a dense motion field, which is utilized to
animate a given 3D mesh in a neutral state. Additionally, we introduce a novel
loss function, named Cosine Loss, which minimizes the angle between the
generated motion vectors and the ground truth ones. Using landmarks in 3D
talking head generation offers various advantages such as consistency,
reliability, and obviating the need for manual annotation. Our approach is
designed to be identity-agnostic, enabling high-quality facial animations for
any user without additional data or training.
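A minimal sketch of such a loss in PyTorch: it penalizes the angle between predicted and ground-truth per-landmark motion vectors via 1 minus their cosine similarity. The exact formulation, and its combination with a positional term, are assumptions rather than the paper's verbatim definition.

import torch
import torch.nn.functional as F

def cosine_loss(pred_motion, gt_motion, eps=1e-8):
    # pred_motion, gt_motion: (batch, num_landmarks, 3) displacement vectors.
    # Mean of (1 - cos(angle)) over all landmark motion vectors.
    cos_sim = F.cosine_similarity(pred_motion, gt_motion, dim=-1, eps=eps)
    return (1.0 - cos_sim).mean()

# Usage: pair with a positional term so magnitude is supervised as well.
pred = torch.randn(8, 68, 3, requires_grad=True)
gt = torch.randn(8, 68, 3)
loss = cosine_loss(pred, gt) + F.mse_loss(pred, gt)
loss.backward()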
Monocular 3D Body Shape Reconstruction under Clothing
Estimating the 3D shape of objects from monocular images is a well-established and challenging task in the computer vision field. Further challenges arise when highly deformable objects, such as human faces or bodies, are considered. In this work, we address the problem of estimating the 3D shape of a human body from single images. In particular, we provide a solution to the problem of estimating the shape of the body when the subject is wearing clothes. This is a highly challenging scenario, as loose clothes might hide the underlying body shape to a large extent. To this aim, we make use of a parametric 3D body model, the SMPL, whose parameters describe the pose and shape of the body. Our main intuition is that the shape parameters associated with an individual should not change whether the subject is wearing clothes or not. To improve the shape estimation under clothing, we train a deep convolutional network to regress the shape parameters from a single image of a person. To increase the robustness to clothing, we build our training dataset by associating the shape parameters of a “minimally clothed” person to other samples of the same person wearing looser clothes. Experimental validation shows that our approach estimates body shape parameters more accurately than state-of-the-art approaches, even in the case of loose clothes.
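A minimal sketch of the regression step described above, assuming a ResNet-18 backbone and the standard 10 SMPL shape parameters (betas); the backbone choice, the MSE objective, and all names are illustrative, not the paper's exact setup.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class ShapeRegressor(nn.Module):
    def __init__(self, num_betas=10):
        super().__init__()
        backbone = resnet18(weights=None)
        # Replace the classification head with a regressor for the SMPL betas.
        backbone.fc = nn.Linear(backbone.fc.in_features, num_betas)
        self.net = backbone

    def forward(self, images):            # images: (B, 3, 224, 224)
        return self.net(images)           # (B, 10) predicted SMPL betas

# Training pairs a clothed image with the betas fitted to the same person's
# "minimally clothed" samples, so the network learns clothing-invariant shape.
model = ShapeRegressor()
images = torch.randn(4, 3, 224, 224)
target_betas = torch.randn(4, 10)         # betas from the minimally clothed fit
loss = nn.functional.mse_loss(model(images), target_betas)
loss.backward()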
4DSR-GCN: 4D Video Point Cloud Upsampling using Graph Convolutional Networks
Time varying sequences of 3D point clouds, or 4D point clouds, are now being
acquired at an increasing pace in several applications (e.g., LiDAR in
autonomous or assisted driving). In many cases, such a volume of data is
transmitted, thus requiring that proper compression tools are applied to reduce
either the resolution or the bandwidth. In this paper, we propose a new
solution for upscaling and restoration of time-varying 3D video point clouds
after they have been heavily compressed. Given the growing relevance of 3D
applications, we focus on a model allowing user-side upscaling and artifact
removal for 3D video point clouds. Our model consists of a specifically
designed Graph
Convolutional Network (GCN) that combines Dynamic Edge Convolution and Graph
Attention Networks for feature aggregation in a Generative Adversarial setting.
Taking inspiration from PointNet++, we present a different way to sample dense
point clouds with the intent to make these modules work in synergy to provide
each node with enough features about its neighbourhood in order to later generate
new vertices. Compared to other solutions in the literature that address the
same task, our proposed model is capable of obtaining comparable results in
terms of quality of the reconstruction, while using a substantially lower
number of parameters (about 300KB), making our solution deployable on edge
computing devices such as LiDAR sensors.
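As a rough illustration of the feature-aggregation stage, the sketch below implements a single dynamic edge-convolution layer over a kNN graph in PyTorch, following the general DGCNN recipe this line of work builds on; the layer sizes, the max aggregation, and all names are illustrative assumptions rather than the paper's exact architecture, and the graph-attention branch is omitted for brevity.

import torch
import torch.nn as nn

def knn_graph(points, k=16):
    # points: (B, N, 3) -> (B, N, k) indices of each point's nearest neighbours
    dists = torch.cdist(points, points)                        # pairwise distances
    return dists.topk(k + 1, largest=False).indices[..., 1:]   # drop self-match

class EdgeConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, feats, idx):
        # feats: (B, N, C); idx: (B, N, k) neighbour indices
        B, N, C = feats.shape
        k = idx.shape[-1]
        neighbours = torch.gather(
            feats.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, k, C))              # (B, N, k, C)
        center = feats.unsqueeze(2).expand(B, N, k, C)
        edge = torch.cat([center, neighbours - center], dim=-1)
        return self.mlp(edge).max(dim=2).values                # max over neighbours

# Usage on a compressed cloud before generating new vertices:
pts = torch.randn(2, 1024, 3)
feats = EdgeConv(3, 64)(pts, knn_graph(pts, k=16))             # (2, 1024, 64)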