ICface: Interpretable and Controllable Face Reenactment Using GANs
This paper presents a generic face animator that is able to control the pose
and expressions of a given face image. The animation is driven by
human-interpretable control signals consisting of head pose angles and Action
Unit (AU) values. The control information can be obtained from multiple sources
including external driving videos and manual controls. Due to the interpretable
nature of the driving signal, one can easily mix the information between
multiple sources (e.g. pose from one image and expression from another) and
apply selective post-production editing. The proposed face animator is
implemented as a two-stage neural network model that is learned in a
self-supervised manner using a large video collection. The proposed
Interpretable and Controllable face reenactment network (ICface) is compared
with state-of-the-art neural-network-based face animation techniques in multiple
tasks. The results indicate that ICface produces better visual quality while
being more versatile than most of the comparison methods. The introduced model
could provide a lightweight and easy-to-use tool for a multitude of advanced
image and video editing tasks.
Comment: Accepted in WACV-2020
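To make the interpretable driving signal concrete, here is a minimal Python sketch of how pose and expression controls taken from different sources could be mixed before being fed to such an animator. The dimensions and the mix_driving_signal helper are illustrative assumptions, not ICface's actual interface.

import torch

# Hypothetical control layout in the spirit of ICface: head pose as
# (yaw, pitch, roll) angles plus a vector of Action Unit intensities.
POSE_DIMS = 3
N_AUS = 17

def mix_driving_signal(pose: torch.Tensor, aus: torch.Tensor) -> torch.Tensor:
    # Because the signal is human-interpretable, pose and expression can
    # come from different sources (or be edited manually) and simply be
    # concatenated into one driving vector.
    return torch.cat([pose, aus], dim=-1)

pose = torch.tensor([[0.1, -0.05, 0.0]])   # pose from one driving frame
aus = torch.rand(1, N_AUS)                 # AU intensities from another
control = mix_driving_signal(pose, aus)    # shape: (1, POSE_DIMS + N_AUS)
print(control.shape)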
Dynamic Neural Portraits
We present Dynamic Neural Portraits, a novel approach to the problem of
full-head reenactment. Our method generates photo-realistic video portraits by
explicitly controlling head pose, facial expressions and eye gaze. Our proposed
architecture is different from existing methods that rely on GAN-based
image-to-image translation networks for transforming renderings of 3D faces
into photo-realistic images. Instead, we build our system upon a 2D
coordinate-based MLP with controllable dynamics. Our intuition to adopt a
2D-based representation, as opposed to recent 3D NeRF-like systems, stems from
the fact that video portraits are captured by monocular, stationary cameras;
therefore, only a single viewpoint of the scene is available. Primarily, we
condition our generative model on expression blendshapes; nonetheless, we show
that our system can also be successfully driven by audio features. Our
experiments demonstrate that the proposed method is 270 times faster than
recent NeRF-based reenactment methods, with our networks achieving speeds of 24
fps for resolutions up to 1024 x 1024, while outperforming prior works in terms
of visual quality.
Comment: In IEEE/CVF Winter Conference on Applications of Computer Vision
(WACV) 2023
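As a rough illustration of the 2D coordinate-based MLP idea, the following PyTorch sketch maps a pixel coordinate plus an expression-blendshape vector to an RGB value. The layer sizes, blendshape dimension, and the CoordPortraitMLP name are assumptions; the paper's actual architecture, with its controllable dynamics, is more elaborate.

import torch
import torch.nn as nn

class CoordPortraitMLP(nn.Module):
    # Toy coordinate-based generator: (x, y) in [-1, 1] plus a
    # blendshape vector -> RGB in [0, 1].
    def __init__(self, n_blendshapes: int = 51, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + n_blendshapes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, coords, blendshapes):
        cond = blendshapes.expand(coords.shape[0], -1)
        return self.net(torch.cat([coords, cond], dim=-1))

# Render one 64 x 64 frame for a random expression.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 64),
                        torch.linspace(-1, 1, 64), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
rgb = CoordPortraitMLP()(coords, torch.rand(1, 51)).reshape(64, 64, 3)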
Deepfakes Generation using LSTM based Generative Adversarial Networks
Deep learning has achieved promising results across a wide range of complex task domains. However, recent advances in deep learning have also been used to create software that threatens personal privacy and national security. One example is deepfakes, which produces fake images and videos that humans cannot detect as forgeries. Fabricated speeches of world leaders can even threaten world stability and peace. Beyond malicious usage, deepfakes can also serve positive purposes, such as post-dubbing or language translation in films. The latter was recently applied in Indian elections, where politicians' speeches were converted into many Indian dialects across the country. Such work was traditionally done with computer graphics technology and 3D models, but with advances in deep learning and computer vision, in particular GANs, these earlier methods are being replaced. This research focuses on using deep neural networks to generate manipulated faces in images and videos.
This master’s thesis develops a novel architecture that generates a full sequence of video frames given a source image and a target video. We were inspired by NVIDIA's vid2vid and few-shot vid2vid, which learn to map source video domains to target domains. In our work, we propose a unified model using LSTM-based GANs together with a motion module that uses a keypoint detector to generate a dense motion field. The generator network employs warping to combine the appearance extracted from the source image with the motion from the target video, producing realistic videos while accounting for occlusions. Training is end-to-end, and the keypoints are learned in a self-supervised way. Evaluation is demonstrated on the recently introduced FaceForensics++ and VoxCeleb datasets.
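The appearance/motion split described above can be illustrated with a small PyTorch sketch of the warping step: source features are resampled along a dense motion field. The warp_source helper and the shapes are hypothetical; in the thesis the flow would come from the keypoint-based motion module, and the LSTM-based generator operates on the warped features.

import torch
import torch.nn.functional as F

def warp_source(source_feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # Resample source-image features at the sampling locations given by
    # `flow` (absolute coordinates in [-1, 1], shaped (B, H, W, 2)), so
    # appearance comes from the source and motion from the driving video.
    return F.grid_sample(source_feat, flow, align_corners=True)

# Sanity check: an identity flow leaves the features unchanged.
B, C, H, W = 1, 64, 32, 32
feat = torch.randn(B, C, H, W)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
identity = torch.stack([xs, ys], dim=-1).unsqueeze(0)  # (1, H, W, 2)
assert torch.allclose(warp_source(feat, identity), feat, atol=1e-5)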
High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors
High-fidelity facial avatar reconstruction from a monocular video is a
significant research problem in computer graphics and computer vision.
Recently, Neural Radiance Field (NeRF) has shown impressive novel view
rendering results and has been considered for facial avatar reconstruction.
However, the complex facial dynamics and missing 3D information in monocular
videos raise significant challenges for faithful facial reconstruction. In this
work, we propose a new method for NeRF-based facial avatar reconstruction that
utilizes a 3D-aware generative prior. Unlike existing works that depend
on a conditional deformation field for dynamic modeling, we propose to learn a
personalized generative prior, formulated as a local, low-dimensional subspace
in the latent space of a 3D-GAN. We propose an efficient
method to construct the personalized generative prior based on a small set of
facial images of a given individual. Once learned, the prior enables
photo-realistic rendering from novel views, and face reenactment can be
realized by navigating the latent space. Our proposed method is applicable to
different driving signals, including RGB images, 3DMM coefficients, and audio.
Compared with existing works, we obtain superior novel view synthesis results
and more faithful face reenactment performance.
Comment: 8 pages, 7 figures
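A minimal sketch of the personalized prior as a local, low-dimensional latent subspace, assuming a StyleGAN-like 512-dimensional latent: latents are restricted to w = w_anchor + B c for an anchor latent, a basis B, and a small coefficient vector c. The names, shapes, and basis construction here are hypothetical stand-ins for the paper's learned prior.

import torch

LATENT_DIM, SUBSPACE_DIM = 512, 16

w_anchor = torch.randn(LATENT_DIM)   # personalized anchor latent
# Orthonormal basis spanning the local subspace around the anchor.
basis = torch.linalg.qr(torch.randn(LATENT_DIM, SUBSPACE_DIM)).Q

def personalized_latent(coeffs: torch.Tensor) -> torch.Tensor:
    # Reenactment then amounts to navigating `coeffs` (regressed, e.g.,
    # from RGB frames, 3DMM coefficients, or audio) inside the subspace.
    return w_anchor + basis @ coeffs

w = personalized_latent(torch.zeros(SUBSPACE_DIM))  # recovers the anchor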