HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at
Siggraph'1
A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively by visually assessing the result of a deformable face tracking
technology on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals future
avenues for further research on the topic.
Comment: E. Antonakos and P. Snape contributed equally and have joint second
authorshi
FML: Face Model Learning from Videos
Monocular image-based 3D reconstruction of faces is a long-standing problem
in computer vision. Since image data is a 2D projection of a 3D face, the
resulting depth ambiguity makes the problem ill-posed. Most existing methods
rely on data-driven priors that are built from limited 3D face scans. In
contrast, we propose multi-frame video-based self-supervised training of a deep
network that (i) learns a face identity model both in shape and appearance
while (ii) jointly learning to reconstruct 3D faces. Our face model is learned
using only corpora of in-the-wild video clips collected from the Internet. This
virtually endless source of training data enables learning of a highly general
3D face model. In order to achieve this, we propose a novel multi-frame
consistency loss that ensures consistent shape and appearance across multiple
frames of a subject's face, thus minimizing depth ambiguity. At test time we
can use an arbitrary number of frames, so that we can perform both monocular as
well as multi-frame reconstruction.
Comment: CVPR 2019 (Oral). Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ,
Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19
MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work we propose a novel model-based deep convolutional autoencoder
that addresses the highly challenging problem of reconstructing a 3D human face
from a single in-the-wild color image. To this end, we combine a convolutional
encoder network with an expert-designed generative model that serves as
decoder. The core innovation is our new differentiable parametric decoder that
encapsulates image formation analytically based on a generative model. Our
decoder takes as input a code vector with exactly defined semantic meaning that
encodes detailed face pose, shape, expression, skin reflectance and scene
illumination. Due to this new way of combining CNN-based with model-based face
reconstruction, the CNN-based encoder learns to extract semantically meaningful
parameters from a single monocular input image. For the first time, a CNN
encoder and an expert-designed generative model can be trained end-to-end in an
unsupervised manner, which renders training on very large (unlabeled) real
world data feasible. The obtained reconstructions compare favorably to current
state-of-the-art approaches in terms of quality and richness of representation.
Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13
page
MobiFace: A Novel Dataset for Mobile Face Tracking in the Wild
Face tracking serves as the crucial initial step in mobile applications
trying to analyse target faces over time in mobile settings. However, this
problem has received little attention, mainly due to the scarcity of dedicated
face tracking benchmarks. In this work, we introduce MobiFace, the first
dataset for single face tracking in mobile situations. It consists of 80
unedited live-streaming mobile videos captured by 70 different smartphone users
in fully unconstrained environments. Over bounding boxes are manually
labelled. The videos are carefully selected to cover typical smartphone usage.
The videos are also annotated with 14 attributes, including 6 newly proposed
attributes and 8 commonly seen in object tracking. 36 state-of-the-art
trackers, including facial landmark trackers, generic object trackers and
trackers that we have fine-tuned or improved, are evaluated. The results
suggest that mobile face tracking cannot be solved through existing approaches.
In addition, we show that fine-tuning on the MobiFace training data
significantly boosts the performance of deep learning-based trackers,
suggesting that MobiFace captures the unique characteristics of mobile face
tracking. Our goal is to offer the community a diverse dataset to enable the
design and evaluation of mobile face trackers. The dataset, annotations and the
evaluation server will be on https://mobiface.github.io/.
Comment: To appear on The 14th IEEE International Conference on Automatic Face
and Gesture Recognition (FG 2019