394 research outputs found
ICface: Interpretable and Controllable Face Reenactment Using GANs
This paper presents a generic face animator that is able to control the pose
and expressions of a given face image. The animation is driven by human
interpretable control signals consisting of head pose angles and the Action
Unit (AU) values. The control information can be obtained from multiple sources
including external driving videos and manual controls. Due to the interpretable
nature of the driving signal, one can easily mix the information between
multiple sources (e.g. pose from one image and expression from another) and
apply selective post-production editing. The proposed face animator is
implemented as a two-stage neural network model that is learned in a
self-supervised manner using a large video collection. The proposed
Interpretable and Controllable face reenactment network (ICface) is compared to
the state-of-the-art neural network-based face animation techniques in multiple
tasks. The results indicate that ICface produces better visual quality while
being more versatile than most of the comparison methods. The introduced model
could provide a lightweight and easy to use tool for a multitude of advanced
image and video editing tasks.Comment: Accepted in WACV-202
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at
Siggraph'1
AFFECT-PRESERVING VISUAL PRIVACY PROTECTION
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications other than security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasing used for observations in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concern, but they also obliterate important visual cues of affect and social behaviors that are crucial for the target applications. In this dissertation, we propose to balance the privacy protection and the utility of the data by preserving the privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding.
The Intellectual Merits of the dissertation include a novel framework for visual privacy protection by manipulating facial image and body shape of individuals, which: (1) is able to conceal the identity of individuals; (2) provide a way to preserve the utility of the data, such as expression and pose information; (3) balance the utility of the data and capacity of the privacy protection.
The Broader Impacts of the dissertation focus on the significance of privacy protection on visual data, and the inadequacy of current privacy enhancing technologies in preserving affect and behavioral attributes of the visual content, which are highly useful for behavior observation in educational and medical settings. This work in this dissertation represents one of the first attempts in achieving both goals simultaneously
EgoFace: Egocentric Face Performance Capture and Videorealistic Reenactment
Face performance capture and reenactment techniques use multiple cameras and sensors, positioned at a distance from the face or mounted on heavy wearable devices. This limits their applications in mobile and outdoor environments. We present EgoFace, a radically new lightweight setup for face performance capture and front-view videorealistic reenactment using a single egocentric RGB camera. Our lightweight setup allows operations in uncontrolled environments, and lends itself to telepresence applications such as video-conferencing from dynamic environments. The input image is projected into a low dimensional latent space of the facial expression parameters. Through careful adversarial training of the parameter-space synthetic rendering, a videorealistic animation is produced. Our problem is challenging as the human visual system is sensitive to the smallest face irregularities that could occur in the final results. This sensitivity is even stronger for video results. Our solution is trained in a pre-processing stage, through a supervised manner without manual annotations. EgoFace captures a wide variety of facial expressions, including mouth movements and asymmetrical expressions. It works under varying illuminations, background, movements, handles people from different ethnicities and can operate in real time
HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet producing reenacted faces that are prone to significant visual artifacts, especially under the challenging condition of extreme head pose changes, or requiring expensive few-shot fine-tuning to better preserve the source identity characteristics. We propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator, by first inverting the real images into its latent space and then using a hypernetwork to perform:(i) refinement of the source identity characteristics and (ii) facial pose re-targeting, eliminating this way the dependence on external editing methods that typically produce artifacts. Our method operates under the one-shot setting (ie, using a single source frame) and allows for cross-subject reenactment, without requiring any subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating the superiority of our approach in producing artifact-free images, exhibiting remarkable robustness even under extreme head pose changes
HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
In this paper, we present our method for neural face reenactment, called
HyperReenact, that aims to generate realistic talking head images of a source
identity, driven by a target facial pose. Existing state-of-the-art face
reenactment methods train controllable generative models that learn to
synthesize realistic facial images, yet producing reenacted faces that are
prone to significant visual artifacts, especially under the challenging
condition of extreme head pose changes, or requiring expensive few-shot
fine-tuning to better preserve the source identity characteristics. We propose
to address these limitations by leveraging the photorealistic generation
ability and the disentangled properties of a pretrained StyleGAN2 generator, by
first inverting the real images into its latent space and then using a
hypernetwork to perform: (i) refinement of the source identity characteristics
and (ii) facial pose re-targeting, eliminating this way the dependence on
external editing methods that typically produce artifacts. Our method operates
under the one-shot setting (i.e., using a single source frame) and allows for
cross-subject reenactment, without requiring any subject-specific fine-tuning.
We compare our method both quantitatively and qualitatively against several
state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and
VoxCeleb2, demonstrating the superiority of our approach in producing
artifact-free images, exhibiting remarkable robustness even under extreme head
pose changes. We make the code and the pretrained models publicly available at:
https://github.com/StelaBou/HyperReenact .Comment: Accepted for publication in ICCV 2023. Project page:
https://stelabou.github.io/hyperreenact.github.io/ Code:
https://github.com/StelaBou/HyperReenac
Aligning Figurative Paintings With Their Sources for Semantic Interpretation
This paper reports steps in probing the artistic methods of figurative painters through computational algorithms. We explore a comparative method that investigates the relation between the source of a painting, typically a photograph or an earlier painting, and the painting itself. A first crucial step in this process is to find the source and to crop, standardize and align it to the painting so that a comparison becomes possible. The next step is to apply different low-level algorithms to construct difference maps for color, edges, texture, brightness, etc. From this basis, various subsequent operations become possible to detect and compare features of the image, such as facial action units and the emotions they signify. This paper demonstrates a pipeline we have built and tested using paintings by a renowned contemporary painter Luc Tuymans. We focus in this paper particularly on the alignment process, on edge difference maps, and on the utility of the comparative method for bringing out the semantic significance of a painting
Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis
- …