In this paper, we aim to devise a universally versatile style transfer method
capable of performing artistic, photo-realistic, and video style transfer
jointly, without seeing videos during training. Previous single-frame methods
assume a strong constraint on the whole image to maintain temporal consistency,
which could be violated in many cases. Instead, we make a mild and reasonable
assumption that global inconsistency is dominated by local inconsistencies and
devise a generic Contrastive Coherence Preserving Loss (CCPL) applied to local
patches. CCPL can preserve the coherence of the content source during style
transfer without degrading stylization. Moreover, it owns a neighbor-regulating
mechanism, resulting in a vast reduction of local distortions and considerable
visual quality improvement. Aside from its superior performance on versatile
style transfer, it can be easily extended to other tasks, such as
image-to-image translation. Besides, to better fuse content and style features,
we propose Simple Covariance Transformation (SCT) to effectively align
second-order statistics of the content feature with the style feature.
Experiments demonstrate the effectiveness of the resulting model for versatile
style transfer, when armed with CCPL.Comment: Accepted by ECCV2022 as an oral paper; code url:
https://github.com/JarrentWu1031/CCPL Video demo:
https://youtu.be/scZuJCXhL1