GIF: Generative Interpretable Faces
Photo-realistic visualization and animation of expressive human faces have
been a long-standing challenge. 3D face modeling methods provide parametric
control but generate unrealistic images. On the other hand, generative 2D
models like GANs (Generative Adversarial Networks) output photo-realistic face
images, but lack explicit control. Recent methods gain partial control, either
by attempting to disentangle different factors in an unsupervised manner, or by
adding control post hoc to a pre-trained model. Unconditional GANs, however,
may entangle factors that are hard to undo later. We condition our generative
model on pre-defined control parameters to encourage disentanglement in the
generation process. Specifically, we condition StyleGAN2 on FLAME, a generative
3D face model. While conditioning on FLAME parameters yields unsatisfactory
results, we find that conditioning on rendered FLAME geometry and photometric
details works well. This gives us a generative 2D face model named GIF
(Generative Interpretable Faces) that offers FLAME's parametric control. Here,
interpretable refers to the semantic meaning of different parameters. Given
FLAME parameters for shape, pose, expressions, parameters for appearance,
lighting, and an additional style vector, GIF outputs photo-realistic face
images. We perform an AMT-based perceptual study to quantitatively and
qualitatively evaluate how well GIF follows its conditioning. The code, data,
and trained model are publicly available for research purposes at
http://gif.is.tue.mpg.de.
Comment: International Conference on 3D Vision (3DV) 202
A Visual Computing Unified Application Using Deep Learning and Computer Vision Techniques
Vision Studio applies a range of modern deep learning and computer vision techniques to provide a broad set of image and video processing functions. Deep learning is a class of machine learning algorithms that use multiple layers to progressively extract higher-level features from raw input. This is useful when the input is a matrix of pixel values from a photo or from the frames of a video. Computer vision is a field of artificial intelligence concerned with enabling computers to interpret and understand visual data. The main functions implemented are deepfake creation, digital aging and de-aging, image animation, and deepfake detection. Deepfake creation uses deep learning methods, particularly autoencoders, to overlay source images onto a target video, producing a video in which the source person appears to do or say what the target person does. Digital aging uses generative adversarial networks (GANs) to simulate the aging process of an individual. Image animation uses first-order motion models to create highly realistic animations from a source image and a driving video. Deepfake detection is performed with efficient convolutional neural networks (CNNs), primarily the EfficientNet family of models
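As a rough illustration of the statement that stacked layers extract progressively higher-level features from a matrix of pixels, the following is a minimal pure-Python sketch, not code from Vision Studio. The helper names `conv2d` and `relu`, the toy 5x5 image, and both kernels are assumptions made for this example; in a real CNN the kernels would be learned from data rather than written by hand.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a pixel matrix with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy grayscale image (assumed): dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Layer 1 (hand-picked kernel): responds to a dark-to-bright step
# between horizontally adjacent pixels.
edge_kernel = [[-1, 1]]
layer1 = relu(conv2d(image, edge_kernel))

# Layer 2 (hand-picked kernel): sums layer-1 responses over three
# vertically adjacent positions, so it responds to a vertical edge segment.
segment_kernel = [[1], [1], [1]]
layer2 = relu(conv2d(layer1, segment_kernel))

print(layer1[0])  # per-pixel edge response in the first row
print(layer2[0])  # response to a three-pixel-tall vertical edge
```

The first layer only sees pairs of neighboring pixels, while the second layer combines those local responses into a larger structure, which is the sense in which deeper layers capture more advanced features.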
- …