98,250 research outputs found
LOMo: Latent Ordinal Model for Facial Analysis in Videos
We study the problem of facial analysis in videos. We propose a novel weakly
supervised learning method that models the video event (expression, pain etc.)
as a sequence of automatically mined, discriminative sub-events (eg. onset and
offset phase for smile, brow lower and cheek raise for pain). The proposed
model is inspired by the recent works on Multiple Instance Learning and latent
SVM/HCRF- it extends such frameworks to model the ordinal or temporal aspect in
the videos, approximately. We obtain consistent improvements over relevant
competitive baselines on four challenging and publicly available video based
facial analysis datasets for prediction of expression, clinical pain and intent
in dyadic conversations. In combination with complimentary features, we report
state-of-the-art results on these datasets.Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR
Generating 3D faces using Convolutional Mesh Autoencoders
Learned 3D representations of human faces are useful for computer vision
problems such as 3D face tracking and reconstruction from images, as well as
graphics applications such as character generation and animation. Traditional
models learn a latent representation of a face using linear subspaces or
higher-order tensor generalizations. Due to this linearity, they can not
capture extreme deformations and non-linear expressions. To address this, we
introduce a versatile model that learns a non-linear representation of a face
using spectral convolutions on a mesh surface. We introduce mesh sampling
operations that enable a hierarchical mesh representation that captures
non-linear variations in shape and expression at multiple scales within the
model. In a variational setting, our model samples diverse realistic 3D faces
from a multivariate Gaussian distribution. Our training data consists of 20,466
meshes of extreme expressions captured over 12 different subjects. Despite
limited training data, our trained model outperforms state-of-the-art face
models with 50% lower reconstruction error, while using 75% fewer parameters.
We also show that, replacing the expression space of an existing
state-of-the-art face model with our autoencoder, achieves a lower
reconstruction error. Our data, model and code are available at
http://github.com/anuragranj/com
DeepCoder: Semi-parametric Variational Autoencoders for Automatic Facial Action Coding
Human face exhibits an inherent hierarchy in its representations (i.e.,
holistic facial expressions can be encoded via a set of facial action units
(AUs) and their intensity). Variational (deep) auto-encoders (VAE) have shown
great results in unsupervised extraction of hierarchical latent representations
from large amounts of image data, while being robust to noise and other
undesired artifacts. Potentially, this makes VAEs a suitable approach for
learning facial features for AU intensity estimation. Yet, most existing
VAE-based methods apply classifiers learned separately from the encoded
features. By contrast, the non-parametric (probabilistic) approaches, such as
Gaussian Processes (GPs), typically outperform their parametric counterparts,
but cannot deal easily with large amounts of data. To this end, we propose a
novel VAE semi-parametric modeling framework, named DeepCoder, which combines
the modeling power of parametric (convolutional) and nonparametric (ordinal
GPs) VAEs, for joint learning of (1) latent representations at multiple levels
in a task hierarchy1, and (2) classification of multiple ordinal outputs. We
show on benchmark datasets for AU intensity estimation that the proposed
DeepCoder outperforms the state-of-the-art approaches, and related VAEs and
deep learning models.Comment: ICCV 2017 - accepte
- …