ICface: Interpretable and Controllable Face Reenactment Using GANs
This paper presents a generic face animator that is able to control the pose
and expressions of a given face image. The animation is driven by human
interpretable control signals consisting of head pose angles and the Action
Unit (AU) values. The control information can be obtained from multiple sources
including external driving videos and manual controls. Due to the interpretable
nature of the driving signal, one can easily mix the information between
multiple sources (e.g. pose from one image and expression from another) and
apply selective post-production editing. The proposed face animator is
implemented as a two-stage neural network model that is learned in a
self-supervised manner using a large video collection. The proposed
Interpretable and Controllable face reenactment network (ICface) is compared to
the state-of-the-art neural network-based face animation techniques in multiple
tasks. The results indicate that ICface produces better visual quality while
being more versatile than most of the comparison methods. The introduced model
could provide a lightweight and easy-to-use tool for a multitude of advanced
image and video editing tasks.
Comment: Accepted in WACV-202
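The driving signal is only head-pose angles plus AU values, so mixing sources comes down to recombining vectors. The sketch below illustrates this under stated assumptions and is not ICface's code. The `ControlSignal` container, the `mix` and `edit_au` helpers, the (yaw, pitch, roll) ordering, and the 17-value AU vector with intensities in [0, 1] are all hypothetical choices.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class ControlSignal:
    """Hypothetical ICface-style driving signal (layout is an assumption)."""
    pose: np.ndarray  # assumed head-pose angles: (yaw, pitch, roll)
    aus: np.ndarray   # assumed Action Unit intensities in [0, 1]

    def as_vector(self) -> np.ndarray:
        # Flatten into the single vector a generator could be conditioned on.
        return np.concatenate([self.pose, self.aus])


def mix(pose_source: ControlSignal, expr_source: ControlSignal) -> ControlSignal:
    """Take head pose from one source and expression (AUs) from another."""
    return ControlSignal(pose=pose_source.pose.copy(), aus=expr_source.aus.copy())


def edit_au(signal: ControlSignal, index: int, value: float) -> ControlSignal:
    """Manual post-production edit: overwrite one AU, clipped to [0, 1]."""
    aus = signal.aus.copy()
    aus[index] = np.clip(value, 0.0, 1.0)
    return ControlSignal(pose=signal.pose.copy(), aus=aus)


driver_a = ControlSignal(pose=np.array([0.3, -0.1, 0.0]), aus=np.zeros(17))
driver_b = ControlSignal(pose=np.zeros(3), aus=np.full(17, 0.5))
# Pose from driver_a, expression from driver_b, then a manual tweak of AU 0.
mixed = edit_au(mix(driver_a, driver_b), index=0, value=1.4)
print(mixed.as_vector()[:5])
```

Keeping pose and AUs as separate fields is what makes the selective editing cheap: each part can be swapped or adjusted without touching the other.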
Recovering Faces from Portraits with Auxiliary Facial Attributes
Recovering a photorealistic face from an artistic portrait is a challenging
task since crucial facial details are often distorted or completely lost in
artistic compositions. To handle this loss, we propose Attribute-guided Face
Recovery from Portraits (AFRP), which uses a Face Recovery Network (FRN) and
a Discriminative Network (DN). FRN consists of an autoencoder with residual
block-embedded skip-connections and incorporates facial attribute vectors into
the feature maps of input portraits at the bottleneck of the autoencoder. DN
has multiple convolutional and fully-connected layers, and its role is to
force FRN to generate authentic face images with the facial attributes
dictated by the input attribute vectors: DN determines whether the recovered
image looks like a real face and checks whether the facial attributes
extracted from the recovered image are consistent with the given attributes.
To preserve identity, we require the recovered and ground-truth faces to share
similar visual features. Our method can recover photorealistic,
identity-preserving faces with desired attributes from unseen stylized
portraits, artistic paintings, and hand-drawn sketches. On large-scale
synthesized and sketch datasets, we demonstrate that our face recovery method
achieves state-of-the-art results.
Comment: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV
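The abstract does not say exactly how the attribute vector enters the bottleneck. One common way to condition convolutional features on a vector, shown here as an assumption rather than AFRP's actual design, is to tile the vector over the spatial grid and concatenate it to the feature channels. `inject_attributes` and the tensor sizes below are hypothetical.

```python
import numpy as np


def inject_attributes(features: np.ndarray, attributes: np.ndarray) -> np.ndarray:
    """Concatenate a spatially tiled attribute vector to bottleneck feature maps.

    features:   (C, H, W) bottleneck activations of the encoder
    attributes: (A,) facial attribute vector (e.g. binary flags)
    returns:    (C + A, H, W) conditioned feature maps for the decoder
    """
    if features.ndim != 3 or attributes.ndim != 1:
        raise ValueError("expected features (C, H, W) and attributes (A,)")
    _, h, w = features.shape
    # Repeat every attribute value at each spatial position.
    tiled = np.broadcast_to(attributes[:, None, None], (attributes.shape[0], h, w))
    return np.concatenate([features, tiled.astype(features.dtype)], axis=0)


bottleneck = np.random.default_rng(0).standard_normal((256, 8, 8)).astype(np.float32)
attrs = np.array([1, 0, 1, 1, 0], dtype=np.float32)  # hypothetical attribute flags
conditioned = inject_attributes(bottleneck, attrs)
print(conditioned.shape)  # (261, 8, 8)
```

Because the attribute channels are constant across space, the decoder sees the same attribute information at every location; feature-wise modulation would be an equally plausible alternative.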
Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
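The segment-selection step can be illustrated with a deliberately simplified stand-in: greedily cover the edited transcript's phoneme sequence with the longest matching runs from the annotated input corpus. This is not the paper's optimization, which the abstract does not specify; `select_segments` and the toy ARPAbet-style phoneme strings are hypothetical.

```python
from typing import List, Tuple


def select_segments(corpus: List[str], target: List[str]) -> List[Tuple[int, int]]:
    """Greedily cover `target` with [start, end) spans copied from `corpus`.

    At each position of the target, pick the longest corpus run that matches
    it (earliest occurrence on ties), so fewer, longer segments need stitching.
    """
    segments: List[Tuple[int, int]] = []
    i = 0
    while i < len(target):
        best_start, best_len = -1, 0
        for s in range(len(corpus)):
            k = 0
            while (i + k < len(target) and s + k < len(corpus)
                   and corpus[s + k] == target[i + k]):
                k += 1
            if k > best_len:
                best_start, best_len = s, k
        if best_len == 0:
            raise ValueError(f"phoneme {target[i]!r} never occurs in the corpus")
        segments.append((best_start, best_start + best_len))
        i += best_len
    return segments


corpus = "HH EH L OW W ER L D".split()   # phonemes annotated in the input video
edited = "W ER L D HH EH L OW".split()   # phonemes of the edited transcript
print(select_segments(corpus, edited))   # [(4, 8), (0, 4)]
```

A real system would also weigh visual smoothness at segment boundaries and could match visemes rather than exact phonemes; the greedy rule only illustrates reusing corpus spans as base material.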
PhotoApp: Photorealistic Appearance Editing of Head Portraits
Photorealistic editing of portraits is a challenging task as humans are very
sensitive to inconsistencies in faces. We present an approach for high-quality
intuitive editing of the camera viewpoint and scene illumination in a portrait
image. This requires our method to capture and control the full reflectance
field of the person in the image. Most editing approaches rely on supervised
learning using training data captured with setups such as light and camera
stages. Such datasets are expensive to acquire, not readily available and do
not capture all the rich variations of in-the-wild portrait images. In
addition, most supervised approaches only focus on relighting, and do not allow
camera viewpoint editing. Thus, they only capture and control a subset of the
reflectance field. Recently, portrait editing has been demonstrated by
operating in the generative model space of StyleGAN. While such approaches do
not require direct supervision, there is a significant loss of quality when
compared to the supervised approaches. In this paper, we present a method which
learns from limited supervised training data. The training images only include
people in a fixed neutral expression with eyes closed, without much hair or
background variations. Each person is captured under 150 one-light-at-a-time
conditions and under 8 camera poses. Instead of training directly in the image
space, we design a supervised problem which learns transformations in the
latent space of StyleGAN. This combines the best of supervised learning and
generative adversarial modeling. We show that the StyleGAN prior allows for
generalisation to different expressions, hairstyles and backgrounds. This
produces high-quality photorealistic results for in-the-wild images and
significantly outperforms existing methods. Our approach can edit the
illumination and pose simultaneously, and runs at interactive rates.
Comment: http://gvv.mpi-inf.mpg.de/projects/PhotoApp
Deep Reflectance Maps
Undoing the image formation process and therefore decomposing appearance into
its intrinsic properties is a challenging task due to the under-constrained
nature of this inverse problem. While significant progress has been made on
inferring shape, materials and illumination from images only, progress in an
unconstrained setting is still limited. We propose a convolutional neural
architecture to estimate reflectance maps of specular materials in natural
lighting conditions. We achieve this in an end-to-end learning formulation that
directly predicts a reflectance map from the image itself. We show how to
improve estimates by exploiting additional supervision in an indirect scheme
that first predicts surface orientation and afterwards predicts the reflectance
map by a learning-based sparse data interpolation.
In order to analyze performance on this difficult task, we propose a new
challenge of Specular MAterials on SHapes with complex IllumiNation (SMASHINg)
using both synthetic and real images. Furthermore, we show the application of
our method to a range of image-based editing tasks on real images.
Comment: project page: http://homes.esat.kuleuven.be/~krematas/DRM
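A reflectance map stores outgoing radiance as a function of surface orientation. Once normals are predicted, each pixel contributes one (normal, color) sample, so the map is only sparsely observed. The sketch below shows that scattering step with plain averaging on an orthographic normal-space grid. It is a simplified assumption, not the paper's learning-based interpolation, and `sparse_reflectance_map`, the grid size, and the projection are hypothetical choices.

```python
import numpy as np


def sparse_reflectance_map(normals: np.ndarray, colors: np.ndarray, size: int = 32) -> np.ndarray:
    """Scatter per-pixel (normal, color) samples into a reflectance-map grid.

    normals: (N, 3) unit surface normals in camera space (z toward the viewer)
    colors:  (N, 3) observed RGB values at the same pixels
    returns: (size, size, 3) average color per bin, NaN where no sample landed
    """
    normals = np.asarray(normals, dtype=float)
    colors = np.asarray(colors, dtype=float)
    visible = normals[:, 2] > 0  # only camera-facing orientations are observable
    n, c = normals[visible], colors[visible]
    # Orthographic projection of the normal sphere: (nx, ny) in [-1, 1] -> grid.
    u = np.clip(((n[:, 0] + 1) / 2 * size).astype(int), 0, size - 1)
    v = np.clip(((1 - n[:, 1]) / 2 * size).astype(int), 0, size - 1)
    sums = np.zeros((size, size, 3))
    counts = np.zeros((size, size))
    np.add.at(sums, (v, u), c)    # unbuffered: repeated bins accumulate
    np.add.at(counts, (v, u), 1)
    rmap = np.full((size, size, 3), np.nan)
    seen = counts > 0
    rmap[seen] = sums[seen] / counts[seen][:, None]
    return rmap


normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
rmap = sparse_reflectance_map(normals, colors)
print(rmap[16, 16])  # mean of the two camera-facing samples
```

The NaN bins are the gaps that a learning-based sparse interpolation would have to fill.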
Transport-Based Neural Style Transfer for Smoke Simulations
Artistically controlling fluids has always been a challenging task.
Optimization techniques rely on approximating simulation states towards target
velocity or density field configurations, which are often handcrafted by
artists to indirectly control smoke dynamics. Patch synthesis techniques
transfer image textures or simulation features to a target flow field. However,
these are either limited to adding structural patterns or augmenting coarse
flows with turbulent structures, and hence cannot capture the full spectrum of
different styles and semantically complex structures. In this paper, we propose
the first Transport-based Neural Style Transfer (TNST) algorithm for volumetric
smoke data. Our method is able to transfer features from natural images to
smoke simulations, enabling general content-aware manipulations ranging from
simple patterns to intricate motifs. The proposed algorithm is physically
inspired, since it computes the density transport from a source input smoke to
a desired target configuration. Our transport-based approach allows direct
control over the divergence of the stylization velocity field by optimizing
incompressible and irrotational potentials that transport smoke towards
stylization. Temporal consistency is ensured by transporting and aligning
subsequent stylized velocities, and 3D reconstructions are computed by
seamlessly merging stylizations from different camera viewpoints.
Comment: ACM Transactions on Graphics (SIGGRAPH ASIA 2019), additional
materials: http://www.byungsoo.me/project/neural-flow-styl
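The split into incompressible and irrotational parts follows the Helmholtz decomposition: the curl of a potential is divergence-free, and the gradient of a scalar potential is curl-free. The minimal 2D sketch below assumes a regular grid with arrays indexed [y, x] and uses a stream function ψ in place of the 3D vector potential. The function names and the weights `w_inc`/`w_irr` are hypothetical, and the paper optimizes such potentials rather than drawing them at random.

```python
import numpy as np


def velocity_from_potentials(psi, phi, w_inc=1.0, w_irr=1.0, dx=1.0):
    """Build a 2D velocity field from two potentials on a grid indexed [y, x].

    psi: stream function; its curl gives the incompressible (divergence-free) part
    phi: scalar potential; its gradient gives the irrotational (curl-free) part
    w_inc, w_irr: hypothetical weights; only w_irr changes the divergence
    """
    dpsi_dy, dpsi_dx = np.gradient(psi, dx)  # axis 0 is y, axis 1 is x
    dphi_dy, dphi_dx = np.gradient(phi, dx)
    u = w_inc * dpsi_dy + w_irr * dphi_dx
    v = -w_inc * dpsi_dx + w_irr * dphi_dy
    return u, v


def divergence(u, v, dx=1.0):
    return np.gradient(u, dx, axis=1) + np.gradient(v, dx, axis=0)


def curl(u, v, dx=1.0):
    return np.gradient(v, dx, axis=1) - np.gradient(u, dx, axis=0)


rng = np.random.default_rng(0)
psi, phi = rng.standard_normal((2, 32, 32))
u, v = velocity_from_potentials(psi, phi, w_irr=0.0)  # purely incompressible
print(np.abs(divergence(u, v)).max())  # round-off level, close to 0
```

In this toy setting, scaling `w_irr` scales the divergence of the field while leaving the incompressible part untouched, which is one concrete reading of "direct control over the divergence".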
Neural Radiance Fields: Past, Present, and Future
Modeling and interpreting 3D environments and surroundings have motivated
extensive research in 3D Computer Vision, Computer Graphics, and Machine
Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led
to a boom in Computer Graphics, Robotics, and Computer Vision, and the prospect
of high-resolution, low-storage 3D models for Augmented and Virtual Reality has
gained traction among researchers, with more than 1000 NeRF-related preprints
published. This paper serves as a bridge for people starting to study
these fields by building on the basics of Mathematics, Geometry, Computer
Vision, and Computer Graphics to the difficulties encountered in Implicit
Representations at the intersection of all these disciplines. This survey
provides the history of rendering, Implicit Learning, and NeRFs, the
progression of research on NeRFs, and the potential applications and
implications of NeRFs in today's world. In doing so, this survey categorizes
all the NeRF-related research in terms of the datasets used, objective
functions, applications solved, and evaluation criteria for these applications.
Comment: 413 pages, 9 figures, 277 citation
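For readers starting out, the core of the Mildenhall et al. method is a volume-rendering quadrature. Along a ray with samples i = 1..N, densities σ_i, colors c_i and spacings δ_i, the pixel color is C = Σ_i T_i (1 - exp(-σ_i δ_i)) c_i, with transmittance T_i = exp(-Σ_{j<i} σ_j δ_j). The sketch below evaluates that sum with NumPy for a single ray. `render_ray` is a hypothetical name, and the network that would predict σ and c is omitted.

```python
import numpy as np


def render_ray(sigmas, colors, deltas):
    """Composite N samples along one ray with the NeRF quadrature rule.

    sigmas: (N,) volume densities, colors: (N, 3) RGB, deltas: (N,) sample spacings
    returns the rendered RGB color and the per-sample weights T_i * alpha_i
    """
    sigmas, colors, deltas = (np.asarray(a, dtype=float) for a in (sigmas, colors, deltas))
    optical = sigmas * deltas
    alphas = 1.0 - np.exp(-optical)  # opacity of each segment
    # T_i = exp(-sum_{j<i} sigma_j delta_j): fraction of light reaching sample i
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(optical)[:-1]]))
    weights = trans * alphas
    return weights @ colors, weights


rgb, w = render_ray(sigmas=[0.0, 50.0, 50.0],
                    colors=[[0, 1, 0], [1, 0, 0], [0, 0, 1]],
                    deltas=[0.1, 0.1, 0.1])
print(rgb)  # almost pure red: the empty first sample contributes nothing
```

The weights telescope to 1 - exp(-Σ_i σ_i δ_i), so they never exceed 1; the remainder is light that passes through the whole ray.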