AdaWCT: Adaptive Whitening and Coloring Style Injection
Adaptive instance normalization (AdaIN) has become the standard method for
style injection: by re-normalizing features through scale-and-shift operations,
it has found widespread use in style transfer, image generation, and
image-to-image translation. In this work, we present a generalization of AdaIN
that relies on the whitening and coloring transformation (WCT), which we dub
AdaWCT and apply to style injection in large GANs. We show, through
experiments on the StarGANv2 architecture, that this generalization, albeit
conceptually simple, yields significant improvements in the quality of the
generated images.
Comment: 4 pages + ref
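The contrast the abstract draws can be made concrete: AdaIN matches only per-channel mean and standard deviation of the style features, while WCT matches the full channel covariance. Below is a minimal NumPy sketch of both operations on flattened feature maps; it follows the classical AdaIN and WCT formulations, not the paper's exact AdaWCT parameterization, and all function names are illustrative.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    # AdaIN: per-channel scale-and-shift.
    # content, style: (C, N) feature maps (channels x spatial positions).
    c_mean, c_std = content.mean(1, keepdims=True), content.std(1, keepdims=True)
    s_mean, s_std = style.mean(1, keepdims=True), style.std(1, keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean

def wct(content, style, eps=1e-5):
    # WCT: whiten the content covariance to identity, then "color" it
    # with the style covariance -- the generalization AdaWCT builds on.
    c = content - content.mean(1, keepdims=True)
    s = style - style.mean(1, keepdims=True)
    # Whitening: multiply by Cov_c^(-1/2) via eigendecomposition (SVD).
    Uc, Sc, _ = np.linalg.svd(c @ c.T / c.shape[1] + eps * np.eye(c.shape[0]))
    whitened = Uc @ np.diag(Sc ** -0.5) @ Uc.T @ c
    # Coloring: multiply by Cov_s^(1/2).
    Us, Ss, _ = np.linalg.svd(s @ s.T / s.shape[1] + eps * np.eye(s.shape[0]))
    colored = Us @ np.diag(Ss ** 0.5) @ Us.T @ whitened
    return colored + style.mean(1, keepdims=True)
```

After `adain`, each output channel has the style's mean and (approximately) its standard deviation; after `wct`, the output's full channel covariance matches the style's, which is strictly more expressive.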
Deepfake Style Transfer Mixture: a First Forensic Ballistics Study on Synthetic Images
Most recent style-transfer techniques based on generative architectures can
produce synthetic multimedia content, commonly called deepfakes, with almost
no artifacts. Researchers have already demonstrated that synthetic images
contain patterns that reveal not only whether an image is a deepfake but also
the generative architecture used to create it. These traces can be exploited
to study problems that have never been addressed in the context of deepfakes.
To this aim, this paper proposes a first approach to image ballistics on
deepfake images subjected to style-transfer manipulations. Specifically, it
describes a study on detecting how many times a digital image has been
processed by a generative architecture for style transfer. Moreover, in order
to address forensic ballistics on deepfake images accurately, some
mathematical properties of style-transfer operations were investigated.
Data Augmentation Vision Transformer for Fine-grained Image Classification
Recently, the vision transformer (ViT) has made breakthroughs in image
recognition. Its multi-head self-attention (MSA) mechanism can extract
discriminative information from different image patches to improve
classification accuracy. However, the classification tokens in its deep
layers tend to ignore local features between layers. In addition, the
embedding layer splits the input into fixed-size patches, which inevitably
introduces additional image noise. To this end, we study a data augmentation
vision transformer (DAVT) and propose an attention-cropping augmentation
method, which uses attention weights as a guide to crop images and improves
the network's ability to learn critical features. Secondly, we propose a
hierarchical attention selection (HAS) method, which improves the
discriminative ability of tokens between layers by filtering and fusing
labels across levels. Experimental results show that the accuracy of this
method on two general datasets, CUB-200-2011 and Stanford Dogs, is better
than existing mainstream methods, and its accuracy is 1.4\% and 1.6\% higher
than the original ViT, respectively.
Comment: IEEE Signal Processing Letters
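The attention-cropping idea can be sketched as: use the per-patch attention weights to locate the most informative region, then crop the image to that region's bounding box. The NumPy sketch below is an assumption-laden illustration, not the paper's implementation; the thresholding rule and bounding-box logic are choices made here for clarity.

```python
import numpy as np

def attention_crop(image, attn, threshold=0.5):
    """Crop an image to its high-attention region.

    image: (H, W, C) array. attn: (h, w) attention map with one value per
    patch, e.g. averaged self-attention weights from the ViT. The 0.5
    threshold is an illustrative assumption.
    """
    H, W = image.shape[:2]
    h, w = attn.shape
    # Normalize attention to [0, 1] and keep patches above the threshold.
    a = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    ys, xs = np.where(a >= threshold)
    if len(ys) == 0:             # nothing above threshold: keep full image
        return image
    # Map patch indices back to pixel coordinates; take the bounding box.
    py, px = H // h, W // w      # pixels per patch
    y0, y1 = ys.min() * py, (ys.max() + 1) * py
    x0, x1 = xs.min() * px, (xs.max() + 1) * px
    return image[y0:y1, x0:x1]
```

In training, the cropped region would be resized back to the network's input size and fed through again, so the model sees a zoomed view of the features it already attends to.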