3,780 research outputs found
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (eg. onset and offset phase for "smile", running and jumping for
"highjump"). The proposed model is inspired by the recent works on Multiple
Instance Learning and latent SVM/HCRF -- it extends such frameworks to model
the ordinal aspect in the videos, approximately. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video based facial analysis datasets for prediction of
expression, clinical pain and intent in dyadic conversations and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text
overlap with arXiv:1604.0150
Facial Action Unit Detection Using Attention and Relation Learning
Attention mechanism has recently attracted increasing attentions in the field
of facial action unit (AU) detection. By finding the region of interest of each
AU with the attention mechanism, AU-related local features can be captured.
Most of the existing attention based AU detection works use prior knowledge to
predefine fixed attentions or refine the predefined attentions within a small
range, which limits their capacity to model various AUs. In this paper, we
propose an end-to-end deep learning based attention and relation learning
framework for AU detection with only AU labels, which has not been explored
before. In particular, multi-scale features shared by each AU are learned
firstly, and then both channel-wise and spatial attentions are adaptively
learned to select and extract AU-related local features. Moreover, pixel-level
relations for AUs are further captured to refine spatial attentions so as to
extract more relevant local features. Without changing the network
architecture, our framework can be easily extended for AU intensity estimation.
Extensive experiments show that our framework (i) soundly outperforms the
state-of-the-art methods for both AU detection and AU intensity estimation on
the challenging BP4D, DISFA, FERA 2015 and BP4D+ benchmarks, (ii) can
adaptively capture the correlated regions of each AU, and (iii) also works well
under severe occlusions and large poses.Comment: This paper is accepted by IEEE Transactions on Affective Computin
Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz
The reconstruction of dense 3D models of face geometry and appearance from a
single image is highly challenging and ill-posed. To constrain the problem,
many approaches rely on strong priors, such as parametric face models learned
from limited 3D scan data. However, prior models restrict generalization of the
true diversity in facial geometry, skin reflectance and illumination. To
alleviate this problem, we present the first approach that jointly learns 1) a
regressor for face shape, expression, reflectance and illumination on the basis
of 2) a concurrently learned parametric face model. Our multi-level face model
combines the advantage of 3D Morphable Models for regularization with the
out-of-space generalization of a learned corrective space. We train end-to-end
on in-the-wild images without dense annotations by fusing a convolutional
encoder with a differentiable expert-designed renderer and a self-supervised
training loss, both defined at multiple detail levels. Our approach compares
favorably to the state-of-the-art in terms of reconstruction quality, better
generalizes to real world faces, and runs at over 250 Hz.Comment: CVPR 2018 (Oral). Project webpage:
https://gvv.mpi-inf.mpg.de/projects/FML
Unsupervised Learning Facial Parameter Regressor for Action Unit Intensity Estimation via Differentiable Renderer
Facial action unit (AU) intensity is an index to describe all visually
discernible facial movements. Most existing methods learn intensity estimator
with limited AU data, while they lack generalization ability out of the
dataset. In this paper, we present a framework to predict the facial parameters
(including identity parameters and AU parameters) based on a bone-driven face
model (BDFM) under different views. The proposed framework consists of a
feature extractor, a generator, and a facial parameter regressor. The regressor
can fit the physical meaning parameters of the BDFM from a single face image
with the help of the generator, which maps the facial parameters to the
game-face images as a differentiable renderer. Besides, identity loss, loopback
loss, and adversarial loss can improve the regressive results. Quantitative
evaluations are performed on two public databases BP4D and DISFA, which
demonstrates that the proposed method can achieve comparable or better
performance than the state-of-the-art methods. What's more, the qualitative
results also demonstrate the validity of our method in the wild
- …