1,031 research outputs found
SEAN: Image Synthesis with Semantic Region-Adaptive Normalization
We propose semantic region-adaptive normalization (SEAN), a simple but
effective building block for Generative Adversarial Networks conditioned on
segmentation masks that describe the semantic regions in the desired output
image. Using SEAN normalization, we can build a network architecture that can
control the style of each semantic region individually, e.g., we can specify
one style reference image per region. SEAN is better suited to encode,
transfer, and synthesize style than the best previous method in terms of
reconstruction quality, variability, and visual quality. We evaluate SEAN on
multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than
the current state of the art. SEAN also pushes the frontier of interactive
image editing. We can interactively edit images by changing segmentation masks
or the style for any given region. We can also interpolate styles from two
reference images per region.Comment: Accepted as a CVPR 2020 oral paper. The interactive demo is available
at https://youtu.be/0Vbj9xFgoU
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and
multi-view-consistent facial images using only collections of single-view 2D
imagery. Towards fine-grained control over facial attributes, recent efforts
incorporate 3D Morphable Face Model (3DMM) to describe deformation in
generative radiance fields either explicitly or implicitly. Explicit methods
provide fine-grained expression control but cannot handle topological changes
caused by hair and accessories, while implicit ones can model varied topologies
but have limited generalization caused by the unconstrained deformation fields.
We propose a novel 3D GAN framework for unsupervised learning of generative,
high-quality and 3D-consistent facial avatars from unstructured 2D images. To
achieve both deformation accuracy and topological flexibility, we propose a 3D
representation called Generative Texture-Rasterized Tri-planes. The proposed
representation learns Generative Neural Textures on top of parametric mesh
templates and then projects them into three orthogonal-viewed feature planes
through rasterization, forming a tri-plane feature representation for volume
rendering. In this way, we combine both fine-grained expression control of
mesh-guided explicit deformation and the flexibility of implicit volumetric
representation. We further propose specific modules for modeling mouth interior
which is not taken into account by 3DMM. Our method demonstrates
state-of-the-art 3D-aware synthesis quality and animation ability through
extensive experiments. Furthermore, serving as 3D prior, our animatable 3D
representation boosts multiple applications including one-shot facial avatars
and 3D-aware stylization.Comment: Project page: https://mrtornado24.github.io/Next3D
State of the Art on Neural Rendering
Efficient rendering of photo-realistic virtual worlds is a long standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning have given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems
AI-generated Content for Various Data Modalities: A Survey
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to its wide range of
applications and the demonstrated potential of recent works, AIGC developments
have been attracting lots of attention recently, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have also been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include going from various modalities to image, video, 3D
shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar),
and audio modalities. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets throughout the modalities, and present comparative
results for various modalities. Moreover, we also discuss the challenges and
potential future research directions
- …