Playing Lottery Tickets in Style Transfer Models
Style transfer has achieved great success and attracted wide attention from both the academic and industrial communities thanks to its flexible application scenarios. However, the dependence on a large VGG-based autoencoder gives existing style transfer models high parameter complexity, which limits their deployment on resource-constrained devices.
Compared with many other tasks, the compression of style transfer models has been less explored. Recently, the lottery ticket hypothesis (LTH) has shown great potential in finding extremely sparse matching subnetworks that, when trained in isolation, can match or even exceed the performance of the original full networks. In this work, we perform the first empirical study to verify whether such trainable matching subnetworks also exist in style transfer models. Specifically, we take the two most popular style transfer models, AdaIN and SANet, as the main testbeds; they represent global and local transformation-based style transfer methods, respectively. We carry out
extensive experiments and comprehensive analysis, and draw the following
conclusions. (1) Compared with fixing the VGG encoder, style transfer models
can benefit more from training the whole network together. (2) Using iterative
magnitude pruning, we find the matching subnetworks at 89.2% sparsity in AdaIN
and 73.7% sparsity in SANet, which demonstrates that style transfer models can
play lottery tickets too. (3) The feature transformation module should also be
pruned to obtain a much sparser model without affecting the existence and
quality of the matching subnetworks. (4) Besides AdaIN and SANet, other models
such as LST, MANet, AdaAttN and MCCNet can also play lottery tickets, which
shows that LTH generalizes to various style transfer models.
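For readers unfamiliar with the pruning procedure behind these results, below is a minimal PyTorch sketch of iterative magnitude pruning (IMP) with weight rewinding, the standard way matching subnetworks are found under the LTH; the train_fn interface, the round count, and the 20% per-round rate are illustrative assumptions, not the authors' code.

```python
# Minimal IMP-with-rewinding sketch; train_fn, rounds, and prune_frac are
# illustrative assumptions, not the paper's implementation.
import copy
import torch

def imp_find_ticket(model, train_fn, rounds=10, prune_frac=0.2):
    init_state = copy.deepcopy(model.state_dict())  # theta_0, kept for rewinding
    # One binary mask per weight matrix/kernel; biases and norms stay dense.
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model, masks)  # train to convergence with masks applied
        # Rank all surviving weights globally by magnitude.
        scores = torch.cat([(p.abs() * masks[n]).flatten()
                            for n, p in model.named_parameters() if n in masks])
        surviving = scores[scores > 0]
        threshold = surviving.sort().values[int(prune_frac * surviving.numel())]
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    masks[n] *= (p.abs() > threshold).float()
        # Rewind survivors to their initial values; (init_state, masks) is the ticket.
        model.load_state_dict(init_state)
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    p.mul_(masks[n])
    return masks
```

Each round trains the masked network, drops the smallest surviving weights globally, and rewinds the rest to initialization; the masks applied to the initial weights define the subnetwork that is then trained in isolation.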
Knowledge Distillation Thrives on Data Augmentation
Knowledge distillation (KD) is a general deep neural network training
framework that uses a teacher model to guide a student model. Many works have
explored the rationale for its success; however, its interplay with data
augmentation (DA) has not been well recognized so far. In this paper, we are
motivated by an interesting observation in classification: KD loss can benefit
from extended training iterations while the cross-entropy loss does not. We
show this disparity arises because of data augmentation: KD loss can tap into
the extra information from different input views brought by DA. By this
explanation, we propose to enhance KD via a stronger data augmentation scheme
(e.g., mixup, CutMix). Furthermore, an even stronger new DA approach is
developed specifically for KD based on the idea of active learning. The
findings and merits of the proposed method are validated by extensive
experiments on the CIFAR-100, Tiny ImageNet, and ImageNet datasets. Simply combining the original KD loss with stronger augmentation schemes already yields better performance than existing state-of-the-art methods that employ more advanced distillation losses. In addition, when our
approaches are combined with more advanced distillation losses, we can advance
the state-of-the-art performance even more. On top of the encouraging
performance, this paper also sheds some light on explaining the success of
knowledge distillation. The discovered interplay between KD and DA may inspire
more advanced KD algorithms.
Comment: Code will be updated soon.
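As a concrete illustration of the recipe the abstract argues for, here is a hedged sketch of a single training step pairing the original KD loss with a stronger augmentation (mixup); the model interfaces, temperature T, and mixing strength alpha are illustrative assumptions, not the paper's code.

```python
# One KD training step on mixup-augmented inputs; interfaces and
# hyperparameters are assumptions for illustration.
import torch
import torch.nn.functional as F

def kd_mixup_step(student, teacher, x, T=4.0, alpha=1.0):
    # Mixup: blend each image with a randomly chosen partner from the batch.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x + (1 - lam) * x[torch.randperm(x.size(0))]

    with torch.no_grad():                 # teacher supplies soft targets only
        t_logits = teacher(x_mix)
    s_logits = student(x_mix)

    # Soft-label KL divergence: no ground-truth label is needed for the mixed
    # view, which is exactly the extra signal KD can exploit and plain
    # cross-entropy cannot.
    return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T
```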
CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer
In this paper, we aim to devise a universally versatile style transfer method
capable of performing artistic, photo-realistic, and video style transfer
jointly, without seeing videos during training. Previous single-frame methods impose a strong constraint on the whole image to maintain temporal consistency, which can be violated in many cases. Instead, we make a mild and reasonable
assumption that global inconsistency is dominated by local inconsistencies and
devise a generic Contrastive Coherence Preserving Loss (CCPL) applied to local
patches. CCPL can preserve the coherence of the content source during style
transfer without degrading stylization. Moreover, it includes a neighbor-regulating mechanism that greatly reduces local distortions and considerably improves visual quality. Aside from its superior performance on versatile
style transfer, it can be easily extended to other tasks, such as
image-to-image translation. In addition, to better fuse content and style features,
we propose Simple Covariance Transformation (SCT) to effectively align
second-order statistics of the content feature with the style feature.
Experiments demonstrate the effectiveness of the resulting model for versatile
style transfer when armed with CCPL.
Comment: Accepted by ECCV 2022 as an oral paper; code url:
https://github.com/JarrentWu1031/CCPL Video demo:
https://youtu.be/scZuJCXhL1
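To make the loss concrete, below is a hedged sketch of the CCPL idea: difference vectors between neighboring locations in the stylized features are contrasted against the corresponding differences in the content features. The sampling scheme, 4-neighbourhood, and temperature tau are assumptions, not the released implementation (see the repository above for the authors' code).

```python
# Contrastive coherence on local patch differences; sampling details and tau
# are assumptions, not the authors' released code.
import torch
import torch.nn.functional as F

def ccpl(f_content, f_stylized, n_pairs=64, tau=0.07):
    B, C, H, W = f_content.shape
    dev = f_content.device
    ys = torch.randint(1, H - 1, (n_pairs,), device=dev)   # anchor locations
    xs = torch.randint(1, W - 1, (n_pairs,), device=dev)
    offs = torch.tensor([[0, 1], [0, -1], [1, 0], [-1, 0]], device=dev)
    o = offs[torch.randint(0, 4, (n_pairs,), device=dev)]  # random neighbour

    def diffs(f):
        anchor = f[:, :, ys, xs]                            # (B, C, n_pairs)
        neigh = f[:, :, ys + o[:, 0], xs + o[:, 1]]
        return F.normalize((anchor - neigh).permute(0, 2, 1), dim=-1)

    d_c, d_s = diffs(f_content), diffs(f_stylized)          # (B, n_pairs, C)
    # InfoNCE: each stylized difference is pulled toward its own content
    # difference and pushed away from the other sampled differences.
    logits = torch.bmm(d_s, d_c.transpose(1, 2)) / tau      # (B, n, n)
    labels = torch.arange(n_pairs, device=dev).expand(B, -1).reshape(-1)
    return F.cross_entropy(logits.reshape(-1, n_pairs), labels)
```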
ArtFusion: Controllable Arbitrary Style Transfer using Dual Conditional Latent Diffusion Models
Arbitrary Style Transfer (AST) aims to transform images by adopting the style
from any selected artwork. Nonetheless, the need to accommodate diverse and
subjective user preferences poses a significant challenge. While some users
wish to preserve distinct content structures, others might favor a more
pronounced stylization. Despite advances in feed-forward AST methods, their
limited customizability hinders their practical application. We propose a new
approach, ArtFusion, which provides a flexible balance between content and
style. In contrast to traditional methods reliant on biased similarity losses,
ArtFusion utilizes our innovative Dual Conditional Latent Diffusion
Probabilistic Models (Dual-cLDM). This approach mitigates repetitive patterns
and enhances subtle artistic aspects like brush strokes and genre-specific
features. Despite the promising results of conditional diffusion probabilistic
models (cDM) in various generative tasks, their introduction to style transfer
is challenging due to the requirement for paired training data. ArtFusion
successfully navigates this issue, offering more practical and controllable
stylization. A key element of our approach involves using a single image for
both content and style during model training, all the while maintaining
effective stylization during inference. ArtFusion outperforms existing approaches in controllability and in the faithful presentation of artistic details, evidencing its superior style transfer capability.
Furthermore, the Dual-cLDM utilized in ArtFusion carries the potential for a
variety of complex multi-condition generative tasks, thus greatly broadening
the impact of our research.
Comment: Code is available at https://github.com/ChenDarYen/ArtFusio
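Since the paper's controllability stems from conditioning on both content and style, the sketch below shows one plausible way a dual-conditioned diffusion model can expose a content/style dial at sampling time, in the spirit of classifier-free guidance; the eps_model interface, the null token, and the two guidance scales are illustrative assumptions, not the Dual-cLDM implementation.

```python
# Two-condition guidance sketch; all interfaces and scales are assumptions.
import torch

def dual_guided_eps(eps_model, x_t, t, c_content, c_style, null,
                    s_content=0.6, s_style=1.5):
    """Classifier-free-style guidance with a separate scale per condition:
    raising s_style strengthens stylization, raising s_content preserves
    more of the content structure."""
    e_uncond = eps_model(x_t, t, null, null)
    e_content = eps_model(x_t, t, c_content, null)
    e_style = eps_model(x_t, t, null, c_style)
    return (e_uncond
            + s_content * (e_content - e_uncond)
            + s_style * (e_style - e_uncond))
```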
Fast Coherent Video Style Transfer via Flow Errors Reduction
For video style transfer, naively applying still-image techniques to each video frame independently often causes flickering artefacts. Some works incorporate optical flow into a temporal-constraint loss to secure temporal consistency. However, these works still suffer from incoherence (including ghosting artefacts) where large motions or occlusions occur, because optical flow fails to detect object boundaries accurately. To address this problem, we propose a novel framework consisting of two stages: (1) creating new initialization images with the proposed mask techniques, which significantly reduce the flow errors; and (2) processing these initialized images iteratively with the proposed losses to obtain stylized videos that are free of artefacts, which also increases the speed of gradient-based optimization methods from over 3 min per frame to less than 2 s per frame. Specifically, we propose a multi-scale mask fusion scheme to reduce untraceable flow errors and obtain an incremental mask to reduce ghosting artefacts; in addition, a multi-frame mask fusion scheme is designed to reduce traceable flow errors. Among the proposed losses, the Sharpness Losses handle potential image-blurriness artefacts over long-range frames, and the Coherent Losses enforce temporal consistency at both the multi-frame RGB level and the feature level. Overall, our approach produces stable video stylization even in large-motion or occlusion scenarios. Experiments demonstrate that the proposed method outperforms state-of-the-art video style transfer methods qualitatively and quantitatively on the MPI Sintel dataset.
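For context on the flow-based coherence term such methods build on, here is a hedged sketch of an occlusion-masked temporal consistency loss: the previous stylized frame is warped to the current one with optical flow, and differences are penalized only where the mask marks the flow as reliable. The warp convention and mask semantics are common choices, not necessarily the authors' exact formulation.

```python
# Flow-warped temporal coherence sketch; conventions are common choices,
# not necessarily the paper's exact losses.
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (B,C,H,W) using `flow` (B,2,H,W) in pixels."""
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)  # (2,H,W), x first
    coords = base.unsqueeze(0) + flow                             # sample positions
    # Normalize to [-1, 1] as required by grid_sample.
    grid = torch.stack((2 * coords[:, 0] / (W - 1) - 1,
                        2 * coords[:, 1] / (H - 1) - 1), dim=-1)  # (B,H,W,2)
    return F.grid_sample(frame, grid, align_corners=True)

def coherence_loss(stylized_t, stylized_prev, flow, mask):
    """mask is 1 where flow is traceable, 0 at occlusions/motion boundaries."""
    return (mask * (stylized_t - warp(stylized_prev, flow)).abs()).mean()
```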