HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at SIGGRAPH'18
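The view- and pose-dependent texturing step is not spelled out in the abstract. A minimal sketch of the general idea, blending captured texture frames by pose similarity, assuming a hypothetical Gaussian weighting with bandwidth `sigma` and a flat pose-vector distance rather than the authors' actual scheme, might look like:

```python
import numpy as np

def pose_dependent_texture(target_pose, captured_poses, captured_frames, sigma=0.1):
    """Blend captured frames, weighting each one by how closely its
    head/torso pose matches the requested target pose (illustrative only)."""
    # Distance from each captured pose vector to the target pose vector.
    d = np.linalg.norm(captured_poses - target_pose, axis=1)
    # Gaussian weights favor the nearest poses; sigma is a hypothetical bandwidth.
    w = np.exp(-d ** 2 / (2 * sigma ** 2))
    w /= w.sum()
    # Weighted average over the N candidate texture frames (N x H x W x 3).
    return np.tensordot(w, captured_frames, axes=1)
```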
GazeDirector: Fully Articulated Eye Gaze Redirection in Video
We present GazeDirector, a new approach for eye gaze redirection that uses model-fitting. Our method first tracks the eyes by fitting a multi-part eye region model to video frames using analysis-by-synthesis, thereby recovering eye region shape, texture, pose, and gaze simultaneously. It then redirects gaze by 1) warping the eyelids from the original image using a model-derived flow field, and 2) rendering and compositing synthesized 3D eyeballs onto the output image in a photorealistic manner. GazeDirector allows us to change where people are looking without person-specific training data, and with full articulation, i.e., we can precisely specify new gaze directions in 3D. Quantitatively, we evaluate both model-fitting and gaze synthesis, with experiments for gaze estimation and redirection on the Columbia gaze dataset. Qualitatively, we compare GazeDirector against recent work on gaze redirection, showing better results, especially for large redirection angles. Finally, we demonstrate gaze redirection on YouTube videos by introducing new 3D gaze targets and by manipulating visual behavior.
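As a rough illustration of the two-step redirection described above, eyelid warping followed by eyeball compositing, here is a hedged sketch; the precomputed `flow` field, the RGBA `eyeball_rgba` render, and the use of OpenCV's `remap` are illustrative assumptions, not GazeDirector's actual implementation:

```python
import numpy as np
import cv2

def redirect_gaze(frame, flow, eyeball_rgba):
    """Sketch: 1) warp eyelids with a model-derived flow field,
    2) alpha-composite a synthesized 3D eyeball render on top."""
    h, w = frame.shape[:2]
    # Turn the relative flow field into absolute sampling coordinates.
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)
    # Composite the rendered eyeball (RGBA, uint8) over the warped frame.
    alpha = eyeball_rgba[..., 3:4].astype(np.float32) / 255.0
    out = alpha * eyeball_rgba[..., :3] + (1.0 - alpha) * warped
    return out.astype(np.uint8)
```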
CUDA-GR: Controllable Unsupervised Domain Adaptation for Gaze Redirection
The aim of gaze redirection is to manipulate the gaze in an image toward a desired direction. However, existing methods are inadequate at generating perceptually plausible images. Advances in generative adversarial networks have shown excellent results in generating photo-realistic images, yet they still lack the ability to provide finer control over individual image attributes. Enabling such fine-grained control requires ground-truth annotations for the training data, which can be very expensive to obtain. In this paper, we propose an unsupervised domain adaptation framework, called CUDA-GR, that learns to disentangle gaze representations from the labeled source domain and transfers them to an unlabeled target domain. Our method enables fine-grained control over gaze directions while preserving the appearance information of the person. We show that the generated image-label pairs in the target domain are effective for knowledge transfer and can boost the performance of downstream tasks. Extensive experiments on benchmark datasets show that the proposed method outperforms state-of-the-art techniques in both quantitative and qualitative evaluation.
High-Fidelity Eye Animatable Neural Radiance Fields for Human Face
Face rendering using neural radiance fields (NeRF) is a rapidly developing
research area in computer vision. While recent methods primarily focus on
controlling facial attributes such as identity and expression, they often
overlook the crucial aspect of modeling eyeball rotation, which is important for various downstream tasks. In this paper, we aim to learn a face
NeRF model that is sensitive to eye movements from multi-view images. We
address two key challenges in eye-aware face NeRF learning: how to effectively
capture eyeball rotation for training and how to construct a manifold for
representing eyeball rotation. To accomplish this, we first fit FLAME, a
well-established parametric face model, to the multi-view images considering
multi-view consistency. Subsequently, we introduce a new Dynamic Eye-aware NeRF
(DeNeRF). DeNeRF transforms 3D points from different views into a canonical
space to learn a unified face NeRF model. We design an eye deformation field
for the transformation, including rigid transformation, e.g., eyeball rotation,
and non-rigid transformation. Through experiments conducted on the ETH-XGaze
dataset, we demonstrate that our model is capable of generating high-fidelity
images with accurate eyeball rotation and non-rigid periocular deformation,
even under novel viewing angles. Furthermore, we show that utilizing the
rendered images can effectively enhance gaze estimation performance.
Comment: Under review
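A minimal sketch of an eye deformation field in the spirit described above, mapping sampled 3D points into a canonical space with a rigid eyeball rotation plus a learned non-rigid residual; the mask-based split, the inverse rigid transform, and the `mlp_nonrigid` module are illustrative assumptions rather than DeNeRF's exact design:

```python
import torch

def to_canonical(x, R_eye, t_eye, eye_mask, mlp_nonrigid):
    """Sketch: points in the eye region undergo the inverse of the
    eyeball's rigid motion; a small MLP adds a non-rigid offset
    (e.g., periocular deformation) for all points."""
    # Inverse rigid transform for eye-region points: (x - t) @ R applies
    # R^T row-wise, i.e., it undoes a rotation applied as R @ x_canonical + t.
    x_rigid = torch.where(eye_mask.unsqueeze(-1), (x - t_eye) @ R_eye, x)
    # Learned non-rigid residual toward the canonical space.
    return x_rigid + mlp_nonrigid(x_rigid)
```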
A Differential Approach for Gaze Estimation
Non-invasive gaze estimation methods usually regress gaze directions directly from a single face or eye image. However, due to important variability in eye shapes and inner eye structures among individuals, universal models achieve limited accuracy, and their outputs usually exhibit high variance as well as subject-dependent biases. Accuracy is therefore usually increased through calibration, allowing gaze predictions for a subject to be mapped to his/her actual gaze. In this paper, we introduce a novel image-differential method for gaze estimation. We propose to directly train a differential convolutional neural network to predict the gaze difference between two eye input images of the same subject. Then, given a set of subject-specific calibration images, we can use the inferred differences to predict the gaze direction of a novel eye sample. The assumption is that by allowing the comparison between two eye images, nuisance factors (alignment, eyelid closure, illumination perturbations) which usually plague single-image prediction methods can be greatly reduced, allowing better predictions altogether. Experiments on three public datasets validate our approach, which consistently outperforms state-of-the-art methods even when using only one calibration sample or when the latter methods are followed by subject-specific gaze adaptation.
Comment: Extension of our paper "A differential approach for gaze estimation with calibration" (BMVC 2018). Submitted to PAMI on Aug. 7th, 2018; accepted as a PAMI short on Dec. 2019, in IEEE Transactions on Pattern Analysis and Machine Intelligence.
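The inference step follows directly from the abstract: the network predicts the gaze difference between a calibration image and the query, so the query's gaze is a calibration gaze plus the predicted difference, averaged over the calibration set. A minimal sketch, with `diff_net` as a placeholder for the trained differential CNN:

```python
import torch

@torch.no_grad()
def predict_gaze(diff_net, calib_images, calib_gazes, query_image):
    """Sketch: average (calibration gaze + predicted difference)
    over all subject-specific calibration samples."""
    preds = []
    for img, gaze in zip(calib_images, calib_gazes):
        # Predicted gaze difference from the calibration image to the query.
        delta = diff_net(img.unsqueeze(0), query_image.unsqueeze(0))
        preds.append(gaze + delta.squeeze(0))
    return torch.stack(preds).mean(dim=0)  # average over the calibration set
```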
Effects of Character Guide in Immersive Virtual Reality Stories
Bringing cinematic experiences from traditional film screens into Virtual Reality (VR) has become an increasingly popular form of entertainment in recent years. VR provides viewers an unprecedented film experience that allows them to freely explore the environment and even interact with virtual props and characters. For the audience, this kind of experience raises their sense of presence in a different world and may even stimulate full immersion in story scenarios. However, unlike traditional film-making, where the audience passively follows the director's storytelling decisions, the greater freedom of VR can cause viewers to get lost halfway through watching the series of events that build up a story. Striking a balance between user interaction and narrative progression is therefore a major challenge for filmmakers.

To organize the research space, we presented a media review and a resulting framework that characterizes the primary differences among variations of film, media, games, and VR storytelling. This evaluation provided knowledge closely associated with story-progression strategies and gaze redirection methods for interactive content in the commercial domain. Following the existing VR storytelling framework, we then approached the problem of guiding the audience through the major events of a story by introducing a virtual character as a travel companion who helps direct the viewer's focus to the target scenes. The presented research explored a new technique that overlays a separate virtual character on top of an existing 360-degree video so that the added character reacts to head-tracking data and indicates to the viewer the core focal content of the story. The motivation behind this research is to help directors use a virtual guiding character to increase the effectiveness of VR storytelling, ensuring that viewers fully understand the story by following a complete sequence of events, and possibly realize a rich literary experience.

To assess the effectiveness of this technique, we performed a controlled experiment applying the method in three immersive narrative experiences, each with a control condition that was free from guidance. The experiment compared three variations of the character guide: 1) no guide; 2) a guide with an art style similar to the style of the video design; and 3) a character guide with a dissimilar style. All participants viewed the narrative experiences to test whether a similar art style led to gaze behaviors with a higher likelihood of falling on the intended focus regions within the 360-degree range of the Virtual Environment (VE). By the end of the experiment, we concluded that adding a virtual character independent of the narrative had limited effects on users' gaze performance when watching an interactive story in VR. Furthermore, the implemented character's art style made very little difference to users' gaze performance or their level of viewing satisfaction. The primary reason could be limitations of the implementation design. In addition, the guiding body language designed for an animal character caused some confusion for a number of participants viewing the stories.

In the end, the character guide approaches still provide insights for future directors and designers into how to draw viewers' attention to a target point within a narrative VE, including what can work well and what should be avoided.
ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection
Learning-based gaze estimation methods require large amounts of training data with accurate gaze annotations. Facing such demanding requirements of gaze data collection and annotation, several image synthesis methods were proposed that can precisely redirect gaze directions given assigned conditions. However, these methods focused on changing the gaze directions of images that only include eyes, or of restricted face ranges at low resolution (less than 128×128), to largely reduce interference from other attributes such as hair, which limits their application scenarios. To cope with this limitation, we proposed a portable network, called ReDirTrans, achieving latent-to-latent translation for redirecting gaze directions and head orientations in an interpretable manner. ReDirTrans projects input latent vectors into aimed-attribute embeddings only and redirects these embeddings with assigned pitch and yaw values. Both the initial and the edited embeddings are then projected back (deprojected) to the initial latent space as residuals that modify the input latent vectors by subtraction and addition, representing old-status removal and new-status addition. Projecting only the aimed attributes, together with the subtraction-addition operations for status replacement, essentially mitigates impacts on other attributes and on the distribution of the latent vectors. Thus, by combining ReDirTrans with a pretrained, fixed e4e-StyleGAN pair, we created ReDirTrans-GAN, which enables accurately redirecting gaze in full-face images at 1024×1024 resolution while preserving other attributes such as identity, expression, and hairstyle. Furthermore, we presented improvements on the downstream learning-based gaze estimation task, using redirected samples for dataset augmentation.
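The subtraction-addition editing described above reduces to a short latent-space update. A hedged sketch, where `project`, `deproject`, and `rotate` are placeholders for ReDirTrans's learned projector, deprojector, and pitch/yaw redirection:

```python
import torch

@torch.no_grad()
def redirect_latent(z, project, deproject, rotate, pitch, yaw):
    """Sketch: swap the old attribute status for the new one in latent
    space via deprojected residuals, leaving other attributes untouched."""
    e_old = project(z)                 # aimed-attribute embedding only
    e_new = rotate(e_old, pitch, yaw)  # redirect to the assigned pitch/yaw
    # Old-status removal (subtraction) and new-status addition.
    return z - deproject(e_old) + deproject(e_new)
```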