HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks
Portrait stylization is a long-standing task with extensive applications.
Although 2D-based methods have made great progress in recent years, real-world
applications such as the metaverse and games often demand 3D content. However,
the requirement for 3D data, which is costly to acquire, significantly impedes
the development of 3D portrait stylization methods. In this paper,
inspired by the success of 3D-aware GANs that bridge 2D and 3D domains with 3D
fields as the intermediate representation for rendering 2D images, we propose a
novel method, dubbed HyperStyle3D, based on 3D-aware GANs for 3D portrait
stylization. At the core of our method is a hyper-network learned to manipulate
the parameters of the generator in a single forward pass. It not only offers a
strong capacity to handle multiple styles with a single model, but also enables
flexible fine-grained stylization that affects only the texture, the shape, or
a local part of the portrait. While the use of 3D-aware GANs bypasses the
requirement for 3D data, we further remove the need for style images by using
the CLIP model as the stylization guidance. We conduct extensive experiments
across style, attribute, and shape manipulation, and also measure 3D
consistency. These experiments demonstrate the superior capability of our
HyperStyle3D model in rendering 3D-consistent images in diverse styles,
deforming the face shape, and editing various attributes.
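Since the abstract does not specify the hyper-network's architecture, the
following is only a minimal PyTorch sketch of the general idea; the head
sizes, the additive offset rule, and the alpha blending factor are
illustrative assumptions, not the paper's actual design:

    import torch
    import torch.nn as nn

    class HyperNetwork(nn.Module):
        """Toy hyper-network: maps a (1-D) CLIP text embedding to
        per-layer weight offsets for a generator in one forward pass."""
        def __init__(self, clip_dim, layer_shapes):
            super().__init__()
            self.layer_shapes = layer_shapes
            self.heads = nn.ModuleList(
                nn.Sequential(
                    nn.Linear(clip_dim, 256), nn.ReLU(),
                    nn.Linear(256, int(torch.Size(shape).numel())))
                for shape in layer_shapes)

        def forward(self, clip_emb):
            # One pass yields an offset tensor for every generator layer.
            return [head(clip_emb).view(*shape)
                    for head, shape in zip(self.heads, self.layer_shapes)]

    def stylize(generator_params, offsets, alpha=1.0):
        # alpha scales stylization strength; alpha = 0 keeps the original.
        return [w + alpha * dw for w, dw in zip(generator_params, offsets)]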
Realtime Fewshot Portrait Stylization Based On Geometric Alignment
This paper presents a portrait stylization method designed for real-time
mobile applications with only a limited number of style examples available.
Previous learning-based stylization methods suffer from the geometric and
semantic gaps between the portrait domain and the style domain, which obstruct
the correct transfer of style information to the portrait images and lead to
poor stylization quality. Based on the geometric prior of human facial
attributes, we propose to utilize geometric alignment to tackle this issue.
Firstly, we apply Thin-Plate-Spline (TPS) warping to feature maps in the
generator network and also directly to style images in pixel space, generating
aligned portrait-style image pairs with identical landmarks, which closes the
geometric gap between the two domains. Secondly, adversarial learning maps the
textures and colors of portrait images to the style domain. Finally, a
geometry-aware cycle consistency preserves the content and identity
information, and a deformation-invariant constraint suppresses artifacts and
distortions. Qualitative and quantitative comparisons validate that our method
outperforms existing methods, and experiments prove that our method can be
trained with limited style examples (100 or fewer) and runs in real time (more
than 40 FPS) on mobile devices. An ablation study demonstrates the
effectiveness of each component in the framework.
Comment: 10 pages, 10 figures
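For background on the alignment step, here is a minimal NumPy sketch of
fitting a classical 2D thin-plate spline from landmark correspondences; note
that the paper also applies TPS to feature maps inside the generator, which
this standalone sketch does not cover, and the small epsilon guarding the
logarithm is an illustrative choice:

    import numpy as np

    def fit_tps(src, dst):
        """Fit a 2D thin-plate spline mapping src landmarks onto dst
        landmarks. src, dst: (n, 2) arrays. Returns a warp function."""
        n = len(src)
        d2 = np.sum((src[:, None] - src[None, :]) ** 2, axis=-1)
        K = 0.5 * d2 * np.log(d2 + 1e-12)        # U(r) = r^2 log r kernel
        P = np.hstack([np.ones((n, 1)), src])    # affine terms
        A = np.zeros((n + 3, n + 3))
        A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
        b = np.vstack([dst, np.zeros((3, 2))])
        params = np.linalg.solve(A, b)
        w, a = params[:n], params[n:]

        def warp(pts):                           # pts: (m, 2) points
            d2p = np.sum((pts[:, None] - src[None, :]) ** 2, axis=-1)
            U = 0.5 * d2p * np.log(d2p + 1e-12)
            return U @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a
        return warp

Warping the style image's landmarks onto the portrait's landmarks with such a
mapping yields the aligned portrait-style pairs with identical landmarks
described above.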
DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields
In this paper, we address the challenging problem of 3D toonification, which
involves transferring the style of an artistic domain onto a target 3D face
with stylized geometry and texture. Although fine-tuning a pre-trained 3D GAN
on the artistic domain can produce reasonable performance, this strategy has
limitations in the 3D domain. In particular, fine-tuning can deteriorate the
original GAN latent space, which affects subsequent semantic editing, and
requires independent optimization and storage for each new style, limiting
flexibility and efficient deployment. To overcome these challenges, we propose
DeformToon3D, an effective toonification framework tailored for hierarchical
3D GANs. Our approach decomposes 3D toonification into the subproblems of
geometry and texture stylization to better preserve the original latent space.
Specifically,
we devise a novel StyleField that predicts conditional 3D deformation to align
a real-space NeRF to the style space for geometry stylization. Thanks to the
StyleField formulation, which already handles geometry stylization well,
texture stylization can be achieved conveniently via adaptive style mixing that
injects information of the artistic domain into the decoder of the pre-trained
3D GAN. Due to this unique design, our method enables flexible control over
the degree of stylization and shape-texture-specific style swapping.
Furthermore, we achieve efficient training without any real-world 2D-3D
training pairs, using only proxy samples synthesized from off-the-shelf 2D
toonification models.
Comment: ICCV 2023. Code: https://github.com/junzhezhang/DeformToon3D Project
page: https://www.mmlab-ntu.com/project/deformtoon3d
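The abstract leaves the StyleField's internals unspecified; below is a minimal
sketch under the assumption that it is an MLP taking a 3D point and a style
code and predicting a displacement (the layer widths and the degree scaling
are illustrative):

    import torch
    import torch.nn as nn

    class StyleField(nn.Module):
        """Toy conditional deformation field: maps a 3D point plus a
        style code to a displacement, so that a frozen real-space
        radiance field can be queried in the style space."""
        def __init__(self, style_dim=64, hidden=128):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(3 + style_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3))

        def forward(self, xyz, style_code, degree=1.0):
            # degree scales the deformation for flexible style control.
            delta = self.mlp(torch.cat([xyz, style_code], dim=-1))
            return xyz + degree * delta

    # Usage with a frozen real-space NeRF (pseudocode):
    #   density, color = frozen_nerf(style_field(xyz, style_code))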
VToonify: Controllable High-Resolution Portrait Video Style Transfer
Generating high-quality artistic portrait videos is an important and
desirable task in computer graphics and vision. Although a series of successful
portrait image toonification models built upon the powerful StyleGAN have been
proposed, these image-oriented methods have obvious limitations when applied to
videos, such as the fixed frame size, the requirement of face alignment,
missing non-facial details, and temporal inconsistency. In this work, we
investigate the challenging task of controllable high-resolution portrait
video style transfer by introducing a novel VToonify framework. Specifically,
VToonify
leverages the mid- and high-resolution layers of StyleGAN to render
high-quality artistic portraits based on the multi-scale content features
extracted by an encoder to better preserve the frame details. The resulting
fully convolutional architecture accepts non-aligned faces in videos of
variable size as input, contributing to complete face regions with natural
motions in the output. Our framework is compatible with existing StyleGAN-based
image toonification models to extend them to video toonification, and inherits
appealing features of these models for flexible style control on color and
intensity. This work presents two instantiations of VToonify built upon Toonify
and DualStyleGAN for collection-based and exemplar-based portrait video style
transfer, respectively. Extensive experimental results demonstrate the
effectiveness of our proposed VToonify framework over existing methods in
generating high-quality and temporally-coherent artistic portrait videos with
flexible style controls.
Comment: ACM Transactions on Graphics (SIGGRAPH Asia 2022). Code:
https://github.com/williamyang1991/VToonify Project page:
https://www.mmlab-ntu.com/project/vtoonify
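To illustrate why a fully convolutional encoder-generator design lifts the
fixed-size and face-alignment constraints, here is a toy fusion block;
VToonify's actual fusion scheme may differ, and every name here is an
assumption:

    import torch
    import torch.nn as nn

    class FusionBlock(nn.Module):
        """Toy fusion of encoder content features into one generator
        stage. Being purely convolutional, it accepts any spatial size,
        so frames need not be cropped, resized, or face-aligned."""
        def __init__(self, gen_ch, enc_ch):
            super().__init__()
            self.proj = nn.Conv2d(enc_ch, gen_ch, kernel_size=3, padding=1)

        def forward(self, gen_feat, enc_feat):
            # enc_feat comes from the content encoder at the matching
            # resolution; it conditions the StyleGAN mid/high layers.
            return gen_feat + self.proj(enc_feat)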
CartoonGAN: generative adversarial networks for photo cartoonization
In this paper, we propose a solution to transforming photos of real-world
scenes into cartoon-style images, which is valuable and challenging in
computer vision and computer graphics. Our solution belongs to the family of
learning-based methods, which have recently become popular for stylizing
images in artistic forms such as painting. However, existing methods do not
produce satisfactory results for cartoonization, due to the fact that (1)
cartoon styles have unique characteristics with a high level of simplification
and abstraction, and (2) cartoon images tend to have clear edges, smooth color
shading, and relatively simple textures, which pose significant challenges for
the texture-descriptor-based loss functions used in existing methods. In this
paper, we propose CartoonGAN, a generative adversarial network (GAN) framework
for cartoon stylization. Our method takes unpaired photos and cartoon images
for training, which makes it easy to use. Two novel losses suitable for
cartoonization are proposed: (1) a semantic content loss, which is formulated
as a sparse regularization on the high-level feature maps of the VGG network
to cope with the substantial style variation between photos and cartoons, and
(2) an edge-promoting adversarial loss for preserving clear edges. We further
introduce an initialization phase to improve the convergence of the network to
the target manifold. Our method is also much more efficient to train than
existing methods. Experimental results show that our method is able to
generate high-quality cartoon images from real-world photos (i.e., following
specific artists' styles and with clear edges and smooth shading) and
outperforms state-of-the-art methods …
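A sketch of the two proposed losses, assuming the sparse regularization is an
L1 distance on high-level VGG feature maps (the layer cut is a guess) and that
the edge-promoting negatives are built by blurring dilated edge regions of
cartoon images; the Canny thresholds and kernel size are illustrative choices:

    import cv2
    import numpy as np
    import torch.nn.functional as F
    from torchvision.models import vgg19, VGG19_Weights

    # Semantic content loss: L1 (sparse) distance between high-level VGG
    # feature maps of the input photo and the generated cartoon image.
    vgg = vgg19(weights=VGG19_Weights.DEFAULT).features[:26].eval()
    for p in vgg.parameters():
        p.requires_grad_(False)

    def content_loss(photo, generated):
        return F.l1_loss(vgg(generated), vgg(photo))

    def edge_smoothed(cartoon_bgr, ksize=5):
        """Extra 'fake' sample for the edge-promoting adversarial loss:
        blur only the dilated edge regions of a cartoon image, so the
        discriminator learns to demand crisp edges."""
        gray = cv2.cvtColor(cartoon_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 100, 200)
        mask = cv2.dilate(edges, np.ones((ksize, ksize), np.uint8)) > 0
        out = cartoon_bgr.copy()
        out[mask] = cv2.GaussianBlur(cartoon_bgr, (ksize, ksize), 0)[mask]
        return out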