ICface: Interpretable and Controllable Face Reenactment Using GANs
This paper presents a generic face animator that is able to control the pose
and expressions of a given face image. The animation is driven by human
interpretable control signals consisting of head pose angles and the Action
Unit (AU) values. The control information can be obtained from multiple sources
including external driving videos and manual controls. Due to the interpretable
nature of the driving signal, one can easily mix the information between
multiple sources (e.g. pose from one image and expression from another) and
apply selective post-production editing. The proposed face animator is
implemented as a two-stage neural network model that is learned in a
self-supervised manner using a large video collection. The proposed
Interpretable and Controllable face reenactment network (ICface) is compared to
the state-of-the-art neural network-based face animation techniques in multiple
tasks. The results indicate that ICface produces better visual quality while
being more versatile than most of the comparison methods. The introduced model
could provide a lightweight and easy to use tool for a multitude of advanced
image and video editing tasks.
Comment: Accepted in WACV-202
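The mixing of interpretable control signals described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `DrivingSignal` container, the function names, and the AU labels are hypothetical, and the generator network that would consume the signal is omitted entirely.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DrivingSignal:
    """Hypothetical container for an interpretable driving signal."""
    yaw: float
    pitch: float
    roll: float
    action_units: Dict[str, float]  # AU label -> intensity (labels are illustrative)


def mix_signals(pose_source: DrivingSignal,
                expression_source: DrivingSignal) -> DrivingSignal:
    """Take the head pose from one source and the AU values from another."""
    return DrivingSignal(
        yaw=pose_source.yaw,
        pitch=pose_source.pitch,
        roll=pose_source.roll,
        action_units=dict(expression_source.action_units),
    )


def edit_action_unit(signal: DrivingSignal, name: str, value: float) -> DrivingSignal:
    """Return a copy of the signal with one AU overridden (selective editing)."""
    edited = dict(signal.action_units)
    edited[name] = value
    return DrivingSignal(signal.yaw, signal.pitch, signal.roll, edited)
```

Because each component of the signal is a named, human-readable value, combining two sources or overriding a single AU is a plain field-level operation.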
Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking
Public speaking is an important aspect of human communication and
interaction. The majority of computational work on public speaking concentrates
on analyzing the spoken content and the verbal behavior of the speakers. While
the success of public speaking largely depends on the content of the talk and
the verbal behavior, non-verbal (visual) cues, such as gestures and physical
appearance, also play a significant role. This paper investigates the importance
of visual cues by estimating their contribution towards predicting the
popularity of a public lecture. For this purpose, we constructed a large
database of more than TED talk videos. As a measure of popularity of the
TED talks, we leverage the corresponding (online) viewers' ratings from
YouTube. Visual cues related to facial and physical appearance, facial
expressions, and pose variations are extracted from the video frames using
convolutional neural network (CNN) models. Thereafter, an attention-based long
short-term memory (LSTM) network is proposed to predict the video popularity
from the sequence of visual features. The proposed network achieves
state-of-the-art prediction accuracy, indicating that visual cues alone contain
highly predictive information about the popularity of a talk. Furthermore, our
network learns a human-like attention mechanism, which is particularly useful
for interpretability: it shows how attention varies with time and across
different visual cues, indicating their relative importance.
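The temporal attention idea in this abstract can be sketched in a few lines. This is only the attention-pooling step, under the assumption that some upstream model (for example an LSTM, omitted here) has already produced one feature vector and one scalar score per time step; the function names are hypothetical and this is not the authors' network.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Combine per-time-step features into one vector using attention weights.

    Returns the pooled vector and the weights, so the weights can be
    inspected to see which time steps contributed most.
    """
    if not features or len(features) != len(scores):
        raise ValueError("need one score per feature vector")
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights
```

The returned weights sum to one, which is what makes them readable as the relative importance of each time step.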
Some like it hot - visual guidance for preference prediction
For people, first impressions of someone are decisively important. They
are hard to alter through further information. This raises the question of
whether a computer can reach the same judgement. Earlier research has already pointed out
that age, gender, and average attractiveness can be estimated with reasonable
precision. We improve on the state of the art, and also predict - based on
someone's known preferences - how much that particular person is attracted to a
novel face. Our computational pipeline comprises a face detector, convolutional
neural networks for the extraction of deep features, standard support vector
regression for gender, age and facial beauty, and - as the main novelties -
visual regularized collaborative filtering to infer inter-person preferences as
well as a novel regression technique for handling visual queries without rating
history. We validate the method using a very large dataset from a dating site
as well as images from celebrities. Our experiments yield convincing results,
i.e. we predict 76% of the ratings correctly solely based on an image, and
reveal some sociologically relevant conclusions. We also validate our
collaborative filtering solution on the standard MovieLens rating dataset,
augmented with movie posters, to predict an individual's movie rating. We
demonstrate our algorithms on howhot.io, which went viral around the Internet
with more than 50 million pictures evaluated in the first month.
Comment: accepted for publication at CVPR 201
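To make the "visual query without rating history" setting concrete, here is a minimal sketch of one simple baseline: a similarity-weighted average over items a person has already rated. This is a swapped-in technique for illustration, not the regression method or the visual regularized collaborative filtering proposed in the paper, and the function names and feature vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(query_features: Sequence[float],
                   rated_items: List[Tuple[Sequence[float], float]]) -> float:
    """Predict one person's rating of an unseen item from its visual features.

    rated_items holds (feature_vector, rating) pairs already rated by that
    person. Negative similarities are clipped to zero; if no rated item is
    similar at all, the person's mean rating is returned.
    """
    if not rated_items:
        raise ValueError("at least one rated item is required")
    weights = [max(0.0, cosine_similarity(query_features, f)) for f, _ in rated_items]
    total = sum(weights)
    if total == 0.0:
        return sum(r for _, r in rated_items) / len(rated_items)
    return sum(w * r for w, (_, r) in zip(weights, rated_items)) / total
```

In practice the feature vectors would come from a CNN as the abstract describes; here they are just lists of floats so the sketch stays self-contained.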