Attention-aware Multi-stroke Style Transfer
Neural style transfer has drawn considerable attention from both academia and industry. Although visual quality and efficiency have improved significantly, existing methods are unable to coordinate the spatial distribution of visual attention between the content image and the stylized image, or to render diverse levels of detail via different brush strokes. In this paper, we tackle these limitations by developing an attention-aware multi-stroke style transfer model. We first propose to integrate a self-attention mechanism into a style-agnostic reconstruction autoencoder framework, from which the attention map of a content image can be derived. By performing multi-scale style swap on content features and style features, we produce multiple feature maps reflecting different stroke patterns. A flexible fusion strategy is further presented to incorporate the salient characteristics of the attention map, which allows multiple stroke patterns to be integrated harmoniously into different spatial regions of the output image. We demonstrate the effectiveness of our method and show that it generates stylized images with multiple stroke patterns that are comparable to those of state-of-the-art methods.
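The style-swap step at the core of the multi-scale matching can be sketched concretely. The following is a minimal illustration of the generic patch-swap primitive (in the spirit of Chen and Schmidt's style swap), assuming PyTorch and VGG-style feature maps of shape (1, C, H, W); the patch size, stride, and function name are our assumptions, not the paper's code.

    import torch
    import torch.nn.functional as F

    def style_swap(content_feat, style_feat, patch_size=3, stride=1):
        """Replace each content-feature patch with its closest style patch."""
        c = style_feat.size(1)
        # Extract style patches and reshape to (n_patches, C, k, k).
        patches = F.unfold(style_feat, patch_size, stride=stride)      # (1, C*k*k, n)
        patches = patches.permute(0, 2, 1).reshape(-1, c, patch_size, patch_size)
        # L2-normalized patches used as conv filters give cross-correlation scores.
        norms = patches.flatten(1).norm(dim=1).clamp_min(1e-8)
        scores = F.conv2d(content_feat, patches / norms.view(-1, 1, 1, 1),
                          stride=stride)                               # (1, n, H', W')
        # Keep only the best-matching style patch at each location.
        one_hot = torch.zeros_like(scores)
        one_hot.scatter_(1, scores.argmax(dim=1, keepdim=True), 1.0)
        # Paste the selected patches back and average the overlaps.
        out = F.conv_transpose2d(one_hot, patches, stride=stride)
        overlap = F.conv_transpose2d(one_hot, torch.ones_like(patches), stride=stride)
        return out / overlap.clamp_min(1e-8)

Running such a swap at several patch sizes yields the multiple stroke-pattern feature maps that the attention map then fuses spatially.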
Arbitrary Video Style Transfer via Multi-Channel Correlation
Video style transfer is attracting increasing attention in the AI community for its numerous applications, such as augmented reality and animation production. Compared with traditional image style transfer, performing this task on video presents new challenges: how to effectively generate satisfactory stylized results for any specified style while maintaining temporal coherence across frames. Towards this end, we propose the Multi-Channel Correlation network (MCCNet), which can be trained to fuse exemplar style features and input content features for efficient style transfer while naturally maintaining the coherence of input videos. Specifically, MCCNet works directly in the feature space of the style and content domains, where it learns to rearrange and fuse style features based on their similarity with content features. The outputs generated by MCCNet are features containing the desired style patterns, which can further be decoded into images with vivid style textures. Moreover, MCCNet is designed to explicitly align the features to the input, which ensures that the output maintains the content structures as well as temporal continuity. To further improve the performance of MCCNet under complex lighting conditions, we also introduce an illumination loss during training. Qualitative and quantitative evaluations demonstrate that MCCNet performs well on both arbitrary video and image style transfer tasks.
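As a rough illustration of similarity-based fusion, the sketch below rearranges style features according to their similarity with content features using a generic attention-style weighting. This is a deliberate simplification under our own assumptions, not MCCNet's actual channel-wise architecture; all tensor shapes and names are illustrative.

    import torch
    import torch.nn.functional as F

    def similarity_fuse(content_feat, style_feat):
        """content_feat: (1, C, Hc, Wc); style_feat: (1, C, Hs, Ws)."""
        b, c, h, w = content_feat.shape
        content = content_feat.flatten(2)                    # (1, C, Hc*Wc)
        style = style_feat.flatten(2)                        # (1, C, Hs*Ws)
        # Similarity of every content position to every style position.
        sim = torch.bmm(content.transpose(1, 2), style)      # (1, Hc*Wc, Hs*Ws)
        attn = F.softmax(sim / c ** 0.5, dim=-1)
        # Each content position receives a similarity-weighted mix of style
        # features, so style patterns are rearranged to fit the content layout.
        fused = torch.bmm(style, attn.transpose(1, 2))       # (1, C, Hc*Wc)
        return fused.view(b, c, h, w) + content_feat         # residual keeps structure

Because the weighting depends only on feature similarity, the same fused features vary smoothly when the content frames vary smoothly, which is the intuition behind temporal coherence in this family of methods.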
Instant Neural Radiance Fields Stylization
We present Instant Neural Radiance Fields Stylization, a novel approach for multi-view image stylization of 3D scenes. Our approach models a neural radiance field based on neural graphics primitives, which use a hash-table-based position encoder for position embedding. We split the position encoder into two parts, the content and style sub-branches, and train the network for standard novel view synthesis with the content and style targets. At inference, we apply AdaIN to the output features of the position encoder, with content and style voxel grid features as reference. With the adjusted features, stylized novel view images can be obtained. Our method extends the style target from style images to image sets of scenes and does not require additional network training for stylization. Given a set of images of a 3D scene and a style target (a style image or another set of 3D scene images), our method can generate stylized novel views with a consistent appearance across view angles in less than 10 minutes on modern GPU hardware. Extensive experimental results demonstrate the validity and superiority of our method.
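The AdaIN step applied to the position-encoder outputs can be written out concretely. The sketch below is the standard AdaIN formulation; the flat (N, C) feature shape and the function name are our assumptions for illustration, not the paper's code.

    import torch

    def adain(content_feat, style_feat, eps=1e-5):
        """Shift content features to match style feature statistics.

        content_feat: (N, C) features from the content sub-branch;
        style_feat:   (M, C) features from the style sub-branch.
        """
        c_mean, c_std = content_feat.mean(dim=0), content_feat.std(dim=0)
        s_mean, s_std = style_feat.mean(dim=0), style_feat.std(dim=0)
        # Normalize each channel of the content features, then re-scale
        # and re-center to the style statistics.
        return s_std * (content_feat - c_mean) / (c_std + eps) + s_mean

Since this is a closed-form statistics swap rather than an optimization, it is consistent with the abstract's claim that stylization needs no additional network training.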
Retinex-guided Channel-grouping based Patch Swap for Arbitrary Style Transfer
The basic principle of patch-matching-based style transfer is to substitute the patches of the content image feature maps with the closest patches from the style image feature maps. Because the finite features harvested from a single aesthetic style image are inadequate to represent the rich textures of a natural content image, existing techniques treat the full-channel style feature patches as simple signal tensors and create new style feature patches via signal-level fusion, which ignores the implicit diversity within style features and thus fails to generate better stylized results. In this paper, we propose a Retinex-theory-guided, channel-grouping-based patch swap technique to address these challenges. The channel-grouping strategy groups the style feature maps into surface and texture channels, which prevents the winner-takes-all problem. The Retinex-theory-based decomposition controls a more stable channel code-rate generation. In addition, we provide complementary fusion and multi-scale generation strategies to prevent unexpected black areas and over-stylized results, respectively. Experimental results demonstrate that the proposed method outperforms existing techniques in providing more style-consistent textures while preserving content fidelity.
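The channel-grouping dispatch can be sketched by reusing the style_swap primitive from the first abstract above: patch matching runs separately inside each channel group, so surface channels cannot dominate the match for texture channels (the winner-takes-all problem). The boolean mask standing in for the Retinex-guided surface/texture split is an illustrative assumption, not the paper's actual decomposition.

    import torch

    def grouped_patch_swap(content_feat, style_feat, surface_mask, patch_size=3):
        """surface_mask: (C,) bool tensor, True for 'surface' channels."""
        out = torch.zeros_like(content_feat)
        # Match surface channels and texture channels independently, so the
        # nearest-patch decision in one group cannot override the other.
        for mask in (surface_mask, ~surface_mask):
            out[:, mask] = style_swap(content_feat[:, mask],
                                      style_feat[:, mask], patch_size)
        return out

In the paper, the grouping itself comes from the Retinex-based decomposition rather than a fixed mask, which is what makes the channel split stable across inputs.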