Real-time cartoon-like stylization of AR video streams on the GPU
The ultimate goal of many applications of augmented reality is to immerse the user into the augmented scene, which is enriched with virtual models. In order to achieve this immersion, it is necessary to create the visual impression that the graphical objects are a natural part of the user’s environment. Producing this effect with conventional computer graphics algorithms is a complex task. Various rendering artifacts in the three-dimensional graphics create a noticeable visual discrepancy between the real background image and virtual objects.
We have recently proposed a novel approach to generating an augmented video stream. With this new method, the output images are a non-photorealistic reproduction of the augmented environment. Special stylization methods are applied to both the background camera image and the virtual objects. This way the visual realism of both the graphical foreground and the real background image is reduced, so that they are less distinguishable from each other.
Here, we present a new method for the cartoon-like stylization of augmented reality images, which uses a novel post-processing filter for cartoon-like color segmentation and high-contrast silhouettes. To make fast post-processing of rendered images possible, the programmability of modern graphics hardware is exploited. We describe an implementation of the algorithm using the OpenGL Shading Language. The system is capable of generating a stylized augmented video stream of high visual quality at real-time frame rates. As an example application, we demonstrate the visualization of dinosaur bone datasets in stylized augmented reality.
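To illustrate the kind of post-processing described above, here is a minimal CPU-side sketch in Python/OpenCV that combines edge-preserving smoothing, coarse color quantization, and dark silhouettes. It is not the authors' GLSL shader implementation, and all filter parameters are arbitrary assumptions.

```python
# Minimal CPU approximation of cartoon-like stylization: bilateral smoothing
# + coarse color quantization + dark silhouettes. Illustrative sketch only;
# not the GPU post-processing filter from the paper.
import cv2
import numpy as np

def cartoonize(frame_bgr: np.ndarray, levels: int = 8) -> np.ndarray:
    # Edge-preserving smoothing stands in for the shader's color segmentation.
    smooth = cv2.bilateralFilter(frame_bgr, 9, 75, 75)

    # Quantize colors into a small number of flat bands.
    step = max(1, 256 // levels)
    flat = (smooth.astype(np.int32) // step) * step + step // 2
    flat = np.clip(flat, 0, 255).astype(np.uint8)

    # High-contrast silhouettes from slightly thickened Canny edges.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 80, 160)
    edges = cv2.dilate(edges, np.ones((2, 2), np.uint8))

    # Draw black outlines on top of the flat-shaded image.
    out = flat.copy()
    out[edges > 0] = 0
    return out
```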
Learning to Incorporate Texture Saliency Adaptive Attention to Image Cartoonization
Image cartoonization has recently been dominated by generative adversarial networks
(GANs) from the perspective of unsupervised image-to-image translation, in
which an inherent challenge is to precisely capture and sufficiently transfer
characteristic cartoon styles (e.g., clear edges, smooth color shading,
abstract fine structures, etc.). Existing advanced models try to enhance the
cartoonization effect by learning to promote edges adversarially, introducing a
style transfer loss, or learning to align styles across multiple representation
spaces. This paper demonstrates that a more distinct and vivid cartoonization
effect can easily be achieved with only the basic adversarial loss. Observing
that cartoon style is more evident in cartoon-texture-salient local image
regions, we build a region-level adversarial learning branch in parallel with
the normal image-level one, which constrains adversarial learning on
cartoon-texture-salient local patches for better perceiving and transferring
cartoon texture features. To this end, a novel cartoon-texture-saliency-sampler
(CTSS) module is proposed to dynamically sample cartoon-texture-salient patches
from training data. With extensive experiments, we demonstrate that texture
saliency adaptive attention in adversarial learning, as a missing ingredient of
related methods in image cartoonization, is of significant importance in
facilitating and enhancing image cartoon stylization, especially for
high-resolution input pictures.
Comment: Proceedings of the 39th International Conference on Machine Learning,
PMLR 162:7183-7207, 2022
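As a rough illustration of region-level adversarial learning on texture-salient patches, the PyTorch sketch below ranks candidate crops by a simple Sobel-gradient energy and keeps the most salient ones for a patch-level discriminator. The actual CTSS module may differ; the saliency score, patch size, and patch counts are assumptions.

```python
# Illustrative texture-saliency-guided patch sampling, loosely in the spirit
# of the CTSS module: score candidate crops by Sobel-gradient energy and keep
# the most salient ones. All sizes and the saliency measure are assumptions.
import torch
import torch.nn.functional as F

def texture_salient_patches(images: torch.Tensor, patch: int = 96,
                            stride: int = 48, keep: int = 4) -> torch.Tensor:
    """images: (B, 3, H, W) in [0, 1]. Returns (B * keep, 3, patch, patch)."""
    gray = images.mean(dim=1, keepdim=True)                      # (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      dtype=images.dtype, device=images.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    grad = F.conv2d(gray, kx, padding=1).abs() + F.conv2d(gray, ky, padding=1).abs()

    # Unfold the saliency map and the image into aligned candidate patches.
    sal = F.unfold(grad, patch, stride=stride)                   # (B, p*p, L)
    score = sal.mean(dim=1)                                      # (B, L)
    cols = F.unfold(images, patch, stride=stride)                # (B, 3*p*p, L)

    top = score.topk(keep, dim=1).indices                        # (B, keep)
    idx = top.unsqueeze(1).expand(-1, cols.size(1), -1)          # (B, 3*p*p, keep)
    picked = cols.gather(2, idx)                                 # (B, 3*p*p, keep)
    b = images.size(0)
    return picked.permute(0, 2, 1).reshape(b * keep, 3, patch, patch)
```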
Deep Video Color Propagation
Traditional approaches for color propagation in videos rely on some form of
matching between consecutive video frames. Using appearance descriptors, colors
are then propagated both spatially and temporally. These methods, however, are
computationally expensive and do not take advantage of semantic information of
the scene. In this work we propose a deep learning framework for color
propagation that combines a local strategy, to propagate colors frame-by-frame
ensuring temporal stability, and a global strategy, using semantics for color
propagation within a longer range. Our evaluation shows the superiority of our
strategy over existing video and image color propagation methods as well as
neural photo-realistic style transfer approaches.
Comment: BMVC 201
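The local/global combination can be pictured with a toy sketch like the one below: chroma is warped from the previous colorized frame along a precomputed optical flow (local, temporally stable) and blended with chroma copied from a distant reference frame by luminance matching (global, longer range). This is not the paper's network; the flow input, the luminance-matching stand-in for semantics, and the blend weight are assumptions.

```python
# Toy sketch of local + global color propagation (not the paper's method).
import numpy as np
from scipy.ndimage import map_coordinates

def propagate_chroma(gray: np.ndarray, prev_chroma: np.ndarray,
                     flow: np.ndarray, ref_gray: np.ndarray,
                     ref_chroma: np.ndarray, w_local: float = 0.7) -> np.ndarray:
    """gray: (H, W) current luminance; prev_chroma, ref_chroma: (H, W, 2) ab
    channels; flow: (H, W, 2) backward flow (dx, dy) into the previous frame.
    Returns the estimated (H, W, 2) chroma for the current frame."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)

    # Local strategy: warp the previous frame's chroma along the flow.
    local = np.stack([
        map_coordinates(prev_chroma[..., c],
                        [ys + flow[..., 1], xs + flow[..., 0]],
                        order=1, mode='nearest')
        for c in range(2)], axis=-1)

    # Global strategy: copy chroma from the reference pixel with the closest
    # luminance (a crude stand-in for semantic matching over a longer range).
    ref_flat = ref_gray.reshape(-1)
    order = np.argsort(ref_flat)
    pos = np.clip(np.searchsorted(ref_flat[order], gray.reshape(-1)),
                  0, ref_flat.size - 1)
    global_ = ref_chroma.reshape(-1, 2)[order[pos]].reshape(h, w, 2)

    # Blend weight (0.7 local / 0.3 global) is an arbitrary assumption.
    return w_local * local + (1.0 - w_local) * global_
```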
Realtime Fewshot Portrait Stylization Based On Geometric Alignment
This paper presents a portrait stylization method designed for real-time
mobile applications with limited style examples available. Previous learning-based
stylization methods suffer from the geometric and semantic gaps between the
portrait domain and the style domain, which prevent style information from being
correctly transferred to the portrait images, leading to poor stylization
quality. Based on the geometric prior of human facial attributes, we propose
to use geometric alignment to tackle this issue. Firstly, we apply
Thin-Plate-Spline (TPS) on feature maps in the generator network and also
directly to style images in pixel space, generating aligned portrait-style
image pairs with identical landmarks, which closes the geometric gaps between
two domains. Secondly, adversarial learning maps the textures and colors of
portrait images to the style domain. Finally, geometry-aware cycle consistency
keeps the content and identity information unchanged, and a deformation-invariant
constraint suppresses artifacts and distortions. Qualitative and quantitative
comparisons validate that our method outperforms existing methods, and experiments
prove that our method can be trained with limited style examples (100 or fewer)
and runs in real time (more than 40 FPS) on mobile devices. An ablation study
demonstrates the effectiveness of each component in the framework.
Comment: 10 pages, 10 figures
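As an illustration of the pixel-space alignment step, the following NumPy sketch fits a thin-plate spline that maps portrait landmarks to style landmarks and backward-warps the style image so both share identical landmarks. The in-network TPS applied to generator feature maps is not shown, and the implementation details here are assumptions.

```python
# Thin-plate-spline (TPS) warping sketch for landmark alignment in pixel space.
import numpy as np
from scipy.ndimage import map_coordinates

def _tps_kernel(r2: np.ndarray) -> np.ndarray:
    # U(r) = r^2 * log(r^2), with U(0) = 0.
    return np.where(r2 == 0, 0.0, r2 * np.log(r2 + 1e-12))

def tps_warp(style_img: np.ndarray, portrait_pts: np.ndarray,
             style_pts: np.ndarray) -> np.ndarray:
    """Backward warp: fit a TPS mapping portrait landmarks to style landmarks,
    then sample the style image at the mapped grid positions.
    style_img: (H, W, C) float array; *_pts: (N, 2) arrays of (x, y) landmarks."""
    n = portrait_pts.shape[0]
    d2 = ((portrait_pts[:, None, :] - portrait_pts[None, :, :]) ** 2).sum(-1)
    K = _tps_kernel(d2)
    P = np.hstack([np.ones((n, 1)), portrait_pts])               # (N, 3)
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    Y = np.zeros((n + 3, 2))
    Y[:n] = style_pts
    params = np.linalg.solve(L, Y)                               # (N+3, 2)

    h, w = style_img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)   # (H*W, 2)
    g2 = ((grid[:, None, :] - portrait_pts[None, :, :]) ** 2).sum(-1)
    src = _tps_kernel(g2) @ params[:n] + np.hstack(
        [np.ones((grid.shape[0], 1)), grid]) @ params[n:]        # (H*W, 2) as (x, y)

    # Sample each channel at the mapped (y, x) source coordinates.
    return np.stack([
        map_coordinates(style_img[..., c], [src[:, 1], src[:, 0]],
                        order=1, mode='nearest').reshape(h, w)
        for c in range(style_img.shape[2])], axis=-1)
```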
Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation
Automatic high-quality rendering of anime scenes from complex real-world
images is of significant practical value. The challenges of this task lie in
the complexity of the scenes, the unique features of anime style, and the lack
of high-quality datasets to bridge the domain gap. Despite promising attempts,
previous efforts still fall short of achieving satisfactory results with
consistent semantic preservation, evident stylization, and fine details. In
this study, we propose Scenimefy, a novel semi-supervised image-to-image
translation framework that addresses these challenges. Our approach guides the
learning with structure-consistent pseudo paired data, simplifying the pure
unsupervised setting. The pseudo data are derived uniquely from a
semantic-constrained StyleGAN leveraging rich model priors like CLIP. We
further apply segmentation-guided data selection to obtain high-quality pseudo
supervision. A patch-wise contrastive style loss is introduced to improve
stylization and fine details. Besides, we contribute a high-resolution anime
scene dataset to facilitate future research. Our extensive experiments
demonstrate the superiority of our method over state-of-the-art baselines in
terms of both perceptual quality and quantitative performance.
Comment: ICCV 2023. The first two authors contributed equally. Code:
https://github.com/Yuxinn-J/Scenimefy Project page:
https://yuxinn-j.github.io/projects/Scenimefy.htm
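One plausible reading of the patch-wise contrastive style loss is an InfoNCE-style objective over spatial feature patches, sketched below in PyTorch: for each sampled location, the generated-image feature is pulled toward the reference feature at the same location and pushed away from features at other locations. The exact formulation in Scenimefy may differ; the temperature and the number of sampled locations are assumptions.

```python
# Illustrative patch-wise contrastive loss over spatial features (assumed
# InfoNCE-style form; not necessarily Scenimefy's exact loss).
import torch
import torch.nn.functional as F

def patch_contrastive_loss(feat_gen: torch.Tensor, feat_ref: torch.Tensor,
                           num_patches: int = 256, tau: float = 0.07) -> torch.Tensor:
    """feat_gen, feat_ref: (B, C, H, W) features of the generated image and its
    pseudo-paired reference. Assumes H * W >= num_patches."""
    b, c, h, w = feat_gen.shape
    q = feat_gen.flatten(2).permute(0, 2, 1)        # (B, H*W, C) queries
    k = feat_ref.flatten(2).permute(0, 2, 1)        # (B, H*W, C) keys

    # Sample the same spatial locations from both feature maps.
    idx = torch.randperm(h * w, device=feat_gen.device)[:num_patches]
    q = F.normalize(q[:, idx], dim=-1)              # (B, N, C)
    k = F.normalize(k[:, idx], dim=-1)              # (B, N, C)

    # Positives sit on the diagonal; all other locations act as negatives.
    logits = torch.bmm(q, k.transpose(1, 2)) / tau  # (B, N, N)
    target = torch.arange(num_patches, device=feat_gen.device)
    target = target.unsqueeze(0).expand(b, -1)
    return F.cross_entropy(logits.reshape(-1, num_patches), target.reshape(-1))
```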
Face-PAST: Facial Pose Awareness and Style Transfer Networks
Facial style transfer has been quite popular among researchers due to the
rise of emerging technologies such as eXtended Reality (XR), Metaverse, and
Non-Fungible Tokens (NFTs). Furthermore, StyleGAN methods along with
transfer-learning strategies have reduced the problem of limited data to some
extent. However, most of the StyleGAN methods overfit the styles while adding
artifacts to facial images. In this paper, we propose a facial pose awareness
and style transfer (Face-PAST) network that preserves facial details and
structures while generating high-quality stylized images. Our work is inspired
by DualStyleGAN but, in contrast, uses a pre-trained style generation network in
an external style pass with a residual modulation block instead of a transform
coding block. Furthermore, we use a gated mapping unit
and facial structure, identity, and segmentation losses to preserve the facial
structure and details. This enables us to train the network with a very limited
amount of data while generating high-quality stylized images. Our training
process adopts a curriculum learning strategy to perform efficient and flexible
style mixing in the generative space. We perform extensive experiments to show
the superiority of Face-PAST in comparison to existing state-of-the-art methods.
Comment: 20 pages, 8 figures, 2 tables
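The abstract names a gated mapping unit without defining it; one plausible minimal form, sketched below, is a learned sigmoid gate that blends an intrinsic latent code with the external style code before it reaches the generator. The structure and latent dimension are assumptions for illustration only.

```python
# Hypothetical minimal "gated mapping unit": a sigmoid gate blends the content
# latent with the external style latent. Assumed structure, not Face-PAST's.
import torch
import torch.nn as nn

class GatedMappingUnit(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, w_content: torch.Tensor, w_style: torch.Tensor) -> torch.Tensor:
        """w_content, w_style: (B, dim) latent codes; returns a blended code."""
        g = self.gate(torch.cat([w_content, w_style], dim=-1))
        return g * w_style + (1.0 - g) * w_content
```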
VToonify: Controllable High-Resolution Portrait Video Style Transfer
Generating high-quality artistic portrait videos is an important and
desirable task in computer graphics and vision. Although a series of successful
portrait image toonification models built upon the powerful StyleGAN have been
proposed, these image-oriented methods have obvious limitations when applied to
videos, such as the fixed frame size, the requirement of face alignment,
missing non-facial details and temporal inconsistency. In this work, we
investigate the challenging controllable high-resolution portrait video style
transfer by introducing a novel VToonify framework. Specifically, VToonify
leverages the mid- and high-resolution layers of StyleGAN to render
high-quality artistic portraits based on the multi-scale content features
extracted by an encoder to better preserve the frame details. The resulting
fully convolutional architecture accepts non-aligned faces in videos of
variable size as input, contributing to complete face regions with natural
motions in the output. Our framework is compatible with existing StyleGAN-based
image toonification models to extend them to video toonification, and inherits
appealing features of these models for flexible style control on color and
intensity. This work presents two instantiations of VToonify built upon Toonify
and DualStyleGAN for collection-based and exemplar-based portrait video style
transfer, respectively. Extensive experimental results demonstrate the
effectiveness of our proposed VToonify framework over existing methods in
generating high-quality and temporally-coherent artistic portrait videos with
flexible style controls.
Comment: ACM Transactions on Graphics (SIGGRAPH Asia 2022). Code:
https://github.com/williamyang1991/VToonify Project page:
https://www.mmlab-ntu.com/project/vtoonify
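The data flow described above can be pictured with a simplified skeleton: a fully convolutional encoder extracts multi-scale content features that are fused into upsampling blocks standing in for StyleGAN's mid- and high-resolution layers. The real VToonify layers differ (see the linked repository); the block definitions below are placeholder assumptions.

```python
# Simplified, fully convolutional encoder-decoder skeleton illustrating the
# multi-scale fusion idea; plain conv blocks stand in for StyleGAN layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVToonify(nn.Module):
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        # Encoder: strided convs yield content features at 1/2, 1/4, 1/8 scale.
        enc, in_ch = [], 3
        for ch in channels:
            enc.append(nn.Sequential(nn.Conv2d(in_ch, ch, 3, stride=2, padding=1),
                                     nn.LeakyReLU(0.2)))
            in_ch = ch
        self.encoder = nn.ModuleList(enc)
        # Decoder: upsampling blocks standing in for mid/high-res generator
        # layers, each fusing the skip feature from the matching encoder scale.
        dec = []
        for ch_skip, ch_in in zip(reversed(channels[:-1]), reversed(channels[1:])):
            dec.append(nn.Sequential(nn.Conv2d(ch_in + ch_skip, ch_skip, 3, padding=1),
                                     nn.LeakyReLU(0.2)))
        self.decoder = nn.ModuleList(dec)
        self.to_rgb = nn.Conv2d(channels[0], 3, 1)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        """frame: (B, 3, H, W) with H, W divisible by 8; returns a stylized frame."""
        skips, x = [], frame
        for block in self.encoder:
            x = block(x)
            skips.append(x)
        skips = skips[:-1][::-1]  # drop the deepest feature (it is x) and reverse
        for block, skip in zip(self.decoder, skips):
            x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
            x = block(torch.cat([x, skip], dim=1))
        x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
        return self.to_rgb(x)  # back to the input resolution
```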