Instant Photorealistic Style Transfer: A Lightweight and Adaptive Approach
In this paper, we propose an Instant Photorealistic Style Transfer (IPST)
approach, designed to achieve instant photorealistic style transfer on
super-resolution inputs without the need for pre-training on pair-wise datasets
or imposing extra constraints. Our method utilizes a lightweight StyleNet to
enable style transfer from a style image to a content image while preserving
non-color information. To further enhance the style transfer process, we
introduce an instance-adaptive optimization to prioritize the photorealism of
outputs and accelerate the convergence of the style network, leading to a rapid
training completion within seconds. Moreover, IPST is well-suited for
multi-frame style transfer tasks, as it retains temporal and multi-view
consistency of the multi-frame inputs such as video and Neural Radiance Field
(NeRF). Experimental results demonstrate that IPST requires less GPU memory,
offers faster multi-frame transfer, and generates photorealistic outputs, making
it a promising solution for various photorealistic transfer applications.
Comment: 8 pages (references excluded), 6 figures, 4 tables
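A minimal sketch of the per-instance optimization idea behind IPST, assuming a toy residual colour-mapping network and a standard Gram-matrix style loss; the real StyleNet architecture, losses, and adaptive stopping rule are not given in the abstract, so everything below is illustrative.

```python
# A minimal per-instance optimization sketch in the spirit of IPST, not the authors' code.
# TinyStyleNet, the loss weights, and the fixed step count are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class TinyStyleNet(nn.Module):
    """A small residual colour-mapping network (hypothetical stand-in for StyleNet)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x):
        # residual mapping keeps structure; the net only shifts colour/style
        return (x + self.net(x)).clamp(0, 1)

def gram(feat):
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def instant_transfer(content, style, steps=200, lr=1e-2):
    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
    for p in vgg.parameters():
        p.requires_grad_(False)
    net = TinyStyleNet()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    style_gram = gram(vgg(style))
    for _ in range(steps):  # a few hundred steps complete in seconds on a GPU
        out = net(content)
        style_loss = ((gram(vgg(out)) - style_gram) ** 2).mean()
        content_loss = ((vgg(out) - vgg(content)) ** 2).mean()  # crude photorealism proxy
        loss = style_loss + 0.1 * content_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net(content).detach()
```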
Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers
Current arbitrary style transfer models are limited to either image or video
domains. In order to achieve satisfying image and video style transfers, two
different models are inevitably required with separate training processes on
image and video domains, respectively. In this paper, we show that this can be
avoided by introducing UniST, a Unified Style Transfer framework for both
images and videos. At the core of UniST is a domain interaction transformer
(DIT), which first explores context information within the specific domain and
then interacts contextualized domain information for joint learning. In
particular, DIT enables exploration of temporal information from videos for the
image style transfer task and meanwhile allows rich appearance texture from
images for video style transfer, thus leading to mutual benefits. Considering the
heavy computation of traditional multi-head self-attention, we present a simple
yet effective axial multi-head self-attention (AMSA) for DIT, which improves
computational efficiency while maintaining style transfer performance. To verify
the effectiveness of UniST, we conduct extensive experiments on both image and
video style transfer tasks and show that UniST performs favorably against
state-of-the-art approaches on both tasks. Code is available at
https://github.com/NevSNev/UniST.
Comment: International Conference on Computer Vision (ICCV 2023)
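To make the axial-attention idea concrete, here is a generic axial multi-head self-attention sketch: attention is applied along the height axis and then along the width axis, which cuts the cost from O((HW)^2) to O(HW(H+W)). The module layout and hyper-parameters are assumptions, not the UniST/DIT implementation.

```python
# Generic axial multi-head self-attention; not the actual AMSA module from UniST.
import torch
import torch.nn as nn

class AxialSelfAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn_h = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_w = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        # attend along the height axis: each column is a sequence of length H
        col = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        col, _ = self.attn_h(col, col, col)
        x = col.reshape(b, w, h, c).permute(0, 3, 2, 1)
        # attend along the width axis: each row is a sequence of length W
        row = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        row, _ = self.attn_w(row, row, row)
        return row.reshape(b, h, w, c).permute(0, 3, 1, 2)

# per-layer cost drops from O((HW)^2) to O(HW*(H+W))
feats = torch.randn(2, 64, 32, 32)
out = AxialSelfAttention(64)(feats)             # same shape as the input
```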
CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
The loss of content affinity, including feature and pixel affinity, is a main cause
of artifacts in photorealistic and video style transfer. This paper
proposes a new framework named CAP-VSTNet, which consists of a new reversible
residual network and an unbiased linear transform module, for versatile style
transfer. The reversible residual network can not only preserve content
affinity but also avoid introducing the redundant information of traditional
reversible networks, and hence facilitates better stylization. Empowered by a
Matting Laplacian training loss, which addresses the pixel affinity loss caused
by the linear transform, the proposed framework is applicable and effective on
versatile style transfer. Extensive experiments show that CAP-VSTNet can
produce better qualitative and quantitative results in comparison with the
state-of-the-art methods.
Comment: CVPR 2023
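A textbook reversible residual coupling illustrates the information-preserving property the abstract appeals to: the block can be inverted exactly, so no content information is lost in the encoder. This is a generic RevNet-style sketch, not the actual CAP-VSTNet block.

```python
# Generic reversible residual coupling; channel layout and sub-networks are illustrative.
import torch
import torch.nn as nn

class RevBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        def subnet():
            return nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(half, half, 3, padding=1))
        self.f, self.g = subnet(), subnet()

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return torch.cat([y1, y2], dim=1)

    def inverse(self, y):                        # exact inversion: no information is lost
        y1, y2 = y.chunk(2, dim=1)
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return torch.cat([x1, x2], dim=1)

block = RevBlock(64)
x = torch.randn(1, 64, 32, 32)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-5)
```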
DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization
Despite the impressive results of arbitrary image-guided style transfer
methods, text-driven image stylization has recently been proposed for
transferring a natural image into the stylized one according to textual
descriptions of the target style provided by the user. Unlike previous
image-to-image transfer approaches, the text-guided stylization process provides
users with a more precise and intuitive way to express the desired style.
However, the huge discrepancy between cross-modal inputs/outputs makes it
challenging to conduct text-driven image stylization in a typical feed-forward
CNN pipeline. In this paper, we present DiffStyler on the basis of diffusion
models. The cross-modal style information can be easily integrated as guidance
during the diffusion process step by step. In particular, we use a dual
diffusion processing architecture to control the balance between the content
and style of the diffused results. Furthermore, we propose a content
image-based learnable noise on which the reverse denoising process is based,
enabling the stylization results to better preserve the structure information
of the content image. Extensive qualitative and quantitative experiments validate
that the proposed DiffStyler outperforms the baseline methods.
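The dual-diffusion control can be pictured as blending two noise predictions, one biased toward content structure and one toward the text-described style, at every reverse step. The DDIM-like step below is a simplified schematic with placeholder branches; the actual DiffStyler samplers, schedules, and content-based learnable noise are not reproduced here.

```python
# Schematic of blending two denoising branches in a reverse diffusion step.
# The branches, schedule, and starting noise are placeholders, not DiffStyler's code.
import torch

def dual_reverse_step(x_t, t, content_eps, style_eps, alpha_bar, w_style=0.6):
    """One deterministic DDIM-like step using a weighted mix of two noise predictions."""
    eps_c = content_eps(x_t, t)                  # branch biased toward content structure
    eps_s = style_eps(x_t, t)                    # branch biased toward the described style
    eps = (1 - w_style) * eps_c + w_style * eps_s
    a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps

# usage with dummy branches
T = 50
alpha_bar = torch.linspace(0.999, 0.01, T + 1)
x = torch.randn(1, 3, 64, 64)                    # DiffStyler would instead start from
                                                 # a content-derived (learnable) noise
dummy = lambda x_t, t: torch.zeros_like(x_t)
x = dual_reverse_step(x, T, dummy, dummy, alpha_bar)
```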
AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks
To deliver the artistic expression of the target style, recent studies
exploit the attention mechanism owing to its ability to map the local patches
of the style image to the corresponding patches of the content image. However,
because of the low semantic correspondence between arbitrary content and
artworks, the attention module repeatedly abuses specific local patches from
the style image, resulting in disharmonious and evident repetitive artifacts.
To overcome this limitation and accomplish impeccable artistic style transfer,
we focus on enhancing the attention mechanism and capturing the rhythm of
patterns that organize the style. In this paper, we introduce a novel metric,
namely pattern repeatability, that quantifies the repetition of patterns in the
style image. Based on the pattern repeatability, we propose Aesthetic
Pattern-Aware style transfer Networks (AesPA-Net) that discover the sweet spot
of local and global style expressions. In addition, we propose a novel
self-supervisory task to encourage the attention mechanism to learn precise and
meaningful semantic correspondence. Lastly, we introduce the patch-wise style
loss to transfer the elaborate rhythm of local patterns. Through qualitative
and quantitative evaluations, we verify the reliability of the proposed pattern
repeatability that aligns with human perception, and demonstrate the
superiority of the proposed framework.
Comment: Accepted by ICCV 2023. Code is available at
https://github.com/Kibeom-Hong/AesPA-Net
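One way to picture a pattern-repeatability score is as patch self-similarity: split the style image into patches and measure how well each patch is matched by some other patch. The function below is only an illustrative proxy; the metric actually defined in AesPA-Net differs.

```python
# Illustrative patch self-similarity score for "how repetitive is this style image";
# this is a stand-in to convey the idea, not the AesPA-Net pattern-repeatability formula.
import torch
import torch.nn.functional as F

def patch_repeatability(image, patch=16):
    """image: (1, C, H, W) tensor. Returns a scalar; higher means more repeated patterns."""
    patches = F.unfold(image, kernel_size=patch, stride=patch)   # (1, C*p*p, N)
    patches = F.normalize(patches.squeeze(0).t(), dim=1)         # (N, C*p*p), unit norm
    sim = patches @ patches.t()                                  # cosine similarity matrix
    sim.fill_diagonal_(-1.0)                                     # ignore self-matches
    return sim.max(dim=1).values.mean()                          # average best match per patch

img = torch.rand(1, 3, 128, 128)
print(float(patch_repeatability(img)))
```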
NeAT: Neural Artistic Tracing for Beautiful Style Transfer
Style transfer is the task of reproducing the semantic contents of a source
image in the artistic style of a second target image. In this paper, we present
NeAT, a new state-of-the-art feed-forward style transfer method. We
re-formulate feed-forward style transfer as image editing, rather than image
generation, resulting in a model which improves over the state-of-the-art in
both preserving the source content and matching the target style. An important
component of our model's success is identifying and fixing "style halos", a
commonly occurring artefact across many style transfer techniques. In addition
to training and testing on standard datasets, we introduce the BBST-4M dataset,
a new, large scale, high resolution dataset of 4M images. As a component of
curating this data, we present a novel model able to classify if an image is
stylistic. We use BBST-4M to improve and measure the generalization of NeAT
across a huge variety of styles. Not only does NeAT offer state-of-the-art
quality and generalization, it is designed and trained for fast inference at
high resolution.
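The "editing, rather than generation" reformulation can be sketched as a network that predicts a residual over the source image instead of synthesising the output from scratch, which keeps the result anchored to the source content. The toy module below is a hypothetical stand-in, not the NeAT architecture.

```python
# Toy "editing, not generation" stylizer: predict a residual over the source image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EditingStylizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.edit = nn.Sequential(nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, content, style):
        style = F.interpolate(style, size=content.shape[-2:])
        delta = self.edit(torch.cat([content, style], dim=1))
        return (content + delta).clamp(0, 1)     # output stays anchored to the source image

out = EditingStylizer()(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 48, 48))
```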
WaterFlow: Heuristic Normalizing Flow for Underwater Image Enhancement and Beyond
Underwater images suffer from light refraction and absorption, which impair
visibility and interfere with subsequent applications. Existing underwater
image enhancement methods mainly focus on image quality improvement, ignoring
the effect on downstream tasks in practice. To balance visual quality and applicability, we
propose a heuristic normalizing flow for detection-driven underwater image
enhancement, dubbed WaterFlow. Specifically, we first develop an invertible
mapping to achieve the translation between the degraded image and its clear
counterpart. Considering the differentiability and interpretability, we
incorporate the heuristic prior into the data-driven mapping procedure, where
the ambient light and medium transmission coefficient benefit credible
generation. Furthermore, we introduce a detection perception module to transmit
the implicit semantic guidance into the enhancement procedure, where the
enhanced images hold more detection-favorable features and are able to promote
the detection performance. Extensive experiments prove the superiority of our
WaterFlow against state-of-the-art methods, quantitatively and qualitatively.
Comment: 10 pages, 13 figures
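The "heuristic prior" refers to the classic underwater image-formation model I = J*t + A*(1 - t), with ambient light A and medium transmission t. Inverting it gives a physics-guided restoration, sketched below; how WaterFlow actually fuses this prior into the normalizing flow is not shown here.

```python
# Invert the underwater image-formation model I = J*t + A*(1-t) for the clear image J.
# This is the physical prior only, not the WaterFlow network.
import torch

def physics_restore(degraded, ambient, transmission, t_min=0.1):
    """degraded: (B,3,H,W); ambient: (B,3,1,1); transmission: (B,1,H,W) in (0,1]."""
    t = transmission.clamp(min=t_min)             # avoid amplifying noise where t -> 0
    clear = (degraded - ambient * (1 - t)) / t    # solve for J
    return clear.clamp(0, 1)

I = torch.rand(1, 3, 64, 64)
A = torch.tensor([0.1, 0.5, 0.6]).view(1, 3, 1, 1)   # bluish-green ambient light
t = torch.full((1, 1, 64, 64), 0.7)
J = physics_restore(I, A, t)
```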
Playing Lottery Tickets in Style Transfer Models
Style transfer has achieved great success and attracted a wide range of
attention from both academic and industrial communities due to its flexible
application scenarios. However, the dependence on a pretty large VGG-based
autoencoder leads to existing style transfer models having high parameter
complexities, which limits their applications on resource-constrained devices.
Compared with many other tasks, the compression of style transfer models has
been less explored. Recently, the lottery ticket hypothesis (LTH) has shown
great potential in finding extremely sparse matching subnetworks which can
achieve on par or even better performance than the original full networks when
trained in isolation. In this work, we for the first time perform an empirical
study to verify whether such trainable matching subnetworks also exist in style
transfer models. Specifically, we take two of the most popular style transfer models,
i.e., AdaIN and SANet, as the main testbeds, which represent global and local
transformation-based style transfer methods, respectively. We carry out
extensive experiments and comprehensive analysis, and draw the following
conclusions. (1) Compared with fixing the VGG encoder, style transfer models
can benefit more from training the whole network together. (2) Using iterative
magnitude pruning, we find the matching subnetworks at 89.2% sparsity in AdaIN
and 73.7% sparsity in SANet, which demonstrates that style transfer models can
play lottery tickets too. (3) The feature transformation module should also be
pruned to obtain a much sparser model without affecting the existence and
quality of the matching subnetworks. (4) Besides AdaIN and SANet, other models
such as LST, MANet, AdaAttN and MCCNet can also play lottery tickets, which
shows that LTH can be generalized to various style transfer models.
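For reference, iterative magnitude pruning in the lottery-ticket style works as follows: train, prune the smallest-magnitude surviving weights, rewind the remaining weights to their initialization, and repeat. The loop below is a compact generic sketch; train_fn, the per-round prune ratio, and the rewind point are placeholders rather than the paper's exact protocol.

```python
# Generic iterative magnitude pruning (lottery-ticket style); not the paper's exact schedule.
import copy
import torch

def find_matching_subnetwork(model, train_fn, rounds=5, prune_per_round=0.2):
    init_state = copy.deepcopy(model.state_dict())           # weights to rewind to
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train_fn(model, masks)                                # caller trains with masks applied
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            scores = (p.abs() * masks[name]).flatten()
            alive = int(masks[name].sum())
            k = int(alive * prune_per_round)                  # prune 20% of surviving weights
            if k > 0:
                thresh = scores[scores > 0].kthvalue(k).values
                masks[name][p.abs() <= thresh] = 0.0
        model.load_state_dict(init_state)                     # rewind to initialization
        for name, p in model.named_parameters():              # re-apply the mask after rewind
            if name in masks:
                p.data.mul_(masks[name])
    return masks
```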