Visibility metrics and their applications in visually lossless image compression
Visibility metrics are image metrics that predict the probability that a human observer can detect differences between a pair of images. These metrics can provide localized information in the form of visibility maps, in which each value represents a probability of detection. An important application of visibility metrics is visually lossless image compression, which aims to compress a given image to the lowest possible number of bits per pixel while keeping the compression artifacts invisible.
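As a minimal illustration of how a visibility map can drive a visually lossless decision, the sketch below thresholds per-pixel detection probabilities; the criterion and the placeholder map are illustrative assumptions, not values from this work.

```python
import numpy as np

def is_visually_lossless(vis_map: np.ndarray, p_max: float = 0.5) -> bool:
    """Treat the distortion as invisible when no pixel's predicted
    detection probability exceeds p_max (an illustrative criterion)."""
    return float(vis_map.max()) <= p_max

# In practice vis_map would come from a visibility metric comparing the
# reference and compressed images; a random map stands in here.
vis_map = 0.3 * np.random.rand(64, 64)
print(is_visually_lossless(vis_map))
```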
In previous works, most visibility metrics were built on largely simplified assumptions and mathematical models of the human visual system. This approach generally fits experimental data measured with simple stimuli, such as Gabor patches, well. However, it cannot accurately predict complex non-linear effects, such as contrast masking in natural images. To predict the visibility of image differences accurately, we collected the largest visibility dataset under fixed viewing conditions for calibrating existing visibility metrics, and we proposed a deep neural network-based visibility metric. Our experiments demonstrated that the deep neural network-based visibility metric significantly outperformed existing visibility metrics.
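A minimal sketch of such a metric is shown below, assuming PyTorch; the layer sizes and the channel-wise concatenation of the image pair are illustrative stand-ins, not the architecture trained in this work.

```python
import torch
import torch.nn as nn

class VisibilityNet(nn.Module):
    """Toy CNN visibility metric: maps a (reference, distorted) image pair
    to a per-pixel detection-probability map."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, ref, dist):
        x = torch.cat([ref, dist], dim=1)   # stack the pair channel-wise
        return torch.sigmoid(self.body(x))  # detection probabilities in [0, 1]

ref, dist = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
vis_map = VisibilityNet()(ref, dist)        # shape (1, 1, 64, 64)
```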
However, the deep neural network-based visibility metric cannot predict visibility under varying viewing conditions, such as display brightness and viewing distance, which strongly affect the visibility of distortions. To extend the metric to varying viewing conditions, we collected the largest visibility dataset under varying display brightness and viewing distances. We proposed incorporating white-box modules, namely luminance masking and viewing distance adaptation, into the black-box deep neural network, and we found that this combination of white-box modules and a black-box deep neural network generalizes our proposed visibility metric to varying viewing conditions.
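The sketch below illustrates the white-box/black-box split under simplifying assumptions: a gamma display model with a log response stands in for luminance masking, and resampling to a reference pixels-per-degree stands in for viewing distance adaptation; the actual modules in this work may differ.

```python
import torch
import torch.nn.functional as F

def luminance_masking(img, peak_luminance=100.0, gamma=2.2):
    """White-box luminance step (illustrative): a gamma display model
    followed by a log-like response."""
    lum = peak_luminance * img.clamp(min=1e-4) ** gamma
    return torch.log10(lum)

def distance_adaptation(img, ppd, ppd_ref=60.0):
    """White-box viewing-distance step (illustrative): resample a 4D
    (N, C, H, W) tensor so one pixel covers the same visual angle as in
    the reference condition."""
    return F.interpolate(img, scale_factor=ppd_ref / ppd,
                         mode="bilinear", align_corners=False)

# The black-box CNN (e.g. VisibilityNet above) then runs on the
# adapted reference/distorted pair.
```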
To demonstrate the application of our proposed deep neural network-based visibility metric to visually lossless image compression, we collected a visually lossless image compression dataset under fixed viewing conditions and significantly improved the metric's accuracy in predicting the visually lossless compression threshold by pre-training it on a synthetic dataset generated by the state-of-the-art white-box visibility metric, HDR-VDP \cite{Mantiuk2011}. In a large-scale study of 1000 images, we found that with our improved visibility metric, we can save around 60\% to 70\% of the bits in visually lossless image compression compared to encoding at the default visually lossless quality level of 90.
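To make the compression use case concrete, here is a sketch of finding the lowest visually lossless JPEG quality with Pillow; `is_lossless` wraps a visibility metric, and the binary search assumes artifact visibility decreases monotonically with quality, which a real search may need to verify.

```python
import io
from PIL import Image

def lowest_lossless_quality(img: Image.Image, is_lossless) -> int:
    """Binary-search the smallest JPEG quality whose artifacts the
    visibility metric judges invisible (assumes an RGB input image)."""
    lo, hi = 1, 95          # Pillow's documented JPEG quality range
    while lo < hi:
        q = (lo + hi) // 2
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        if is_lossless(img, Image.open(buf)):
            hi = q          # invisible: try a lower bitrate
        else:
            lo = q + 1      # visible: raise the quality
    return lo
```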
Because predicting image visibility and predicting image quality are closely related research topics, we also proposed a trained perceptually uniform transform for high dynamic range image and video quality assessment, obtained by training a perceptual encoding function on a set of subjective quality assessment datasets. We showed that combining the trained perceptual encoding function with standard dynamic range image quality metrics, such as the peak signal-to-noise ratio (PSNR), achieves better performance than the untrained version.
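The sketch below shows this evaluation pipeline under stated assumptions: a simple log encoding stands in for the trained perceptual transform (the real function is learned from subjective data), and PSNR is computed on the encoded values.

```python
import numpy as np

def perceptual_encode(lum, l_min=0.005, l_max=10000.0):
    """Illustrative stand-in for the trained encoding: map absolute
    luminance (cd/m^2) to code values in [0, 255] on a log scale."""
    lum = np.clip(lum, l_min, l_max)
    return 255.0 * (np.log10(lum) - np.log10(l_min)) / (
        np.log10(l_max) - np.log10(l_min))

def encoded_psnr(ref_lum, test_lum):
    """PSNR computed on perceptually encoded values rather than raw
    luminance."""
    err = perceptual_encode(ref_lum) - perceptual_encode(test_lum)
    mse = max(float(np.mean(err ** 2)), 1e-12)  # guard against identical inputs
    return 10.0 * np.log10(255.0 ** 2 / mse)
```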
Joint Learning of Deep Texture and High-Frequency Features for Computer-Generated Image Detection
Distinguishing between computer-generated (CG) and natural photographic (PG) images is of great importance for verifying the authenticity and originality of digital images. However, recent cutting-edge generation methods enable high-quality synthesis of CG images, which makes this challenging task even harder. To address this issue, we propose a joint learning strategy with deep texture and high-frequency features for CG image detection. We first formulate and analyze in depth the different acquisition processes of CG and PG images. Based on the finding that the different modules involved in image acquisition lead to different sensitivities to convolutional neural network (CNN)-based rendering of images, we propose a deep texture rendering module for texture-difference enhancement and discriminative texture representation. Specifically, a semantic segmentation map is generated to guide an affine transformation operation, which is used to recover the texture in different regions of the input image. Then, the combination of the original image and the high-frequency components of the original and rendered images is fed into a multi-branch neural network equipped with attention mechanisms, which refines intermediate features and facilitates trace exploration in the spatial and channel dimensions, respectively. Extensive experiments on two public datasets and a newly constructed dataset with more realistic and diverse images show that the proposed approach outperforms existing methods in the field by a clear margin. The results also demonstrate the robustness and generalization ability of the proposed approach to post-processing operations and generative adversarial network (GAN)-generated images.
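As a rough sketch of the two-stream idea described above, the PyTorch snippet below pairs an RGB branch with a high-frequency branch and a squeeze-and-excitation-style channel attention; the filter used to isolate high frequencies, the branch widths, and the attention block are all illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def high_freq(x):
    """Illustrative high-frequency extraction: subtract a blurred copy."""
    return x - F.avg_pool2d(x, 3, stride=1, padding=1)

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (a stand-in for the
    paper's attention mechanisms)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))     # (N, C) channel descriptors
        return x * w[:, :, None, None]      # reweight feature channels

class MultiBranchDetector(nn.Module):
    """Two illustrative branches: one on the RGB image, one on its
    high-frequency residual; fused features feed a CG/PG classifier."""
    def __init__(self):
        super().__init__()
        self.rgb = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.hf = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.att = ChannelAttention(32)
        self.head = nn.Linear(32, 2)        # logits for CG vs. PG

    def forward(self, x):
        f = torch.cat([self.rgb(x), self.hf(high_freq(x))], dim=1)
        f = self.att(f).mean(dim=(2, 3))    # global average pooling
        return self.head(f)

logits = MultiBranchDetector()(torch.rand(1, 3, 128, 128))  # shape (1, 2)
```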
D-Unet: A Dual-encoder U-Net for Image Splicing Forgery Detection and Localization
Recently, many detection methods based on convolutional neural networks (CNNs) have been proposed for image splicing forgery detection. Most of these methods focus on local patches or local objects. In fact, image splicing forgery detection is a global binary classification task that distinguishes tampered from non-tampered regions by image fingerprints. However, some specific image content is hardly retained by CNN-based detection networks, even though including it would improve detection accuracy. To resolve these issues, we propose a novel network called the dual-encoder U-Net (D-Unet) for image splicing forgery detection, which employs an unfixed encoder and a fixed encoder. The unfixed encoder autonomously learns the image fingerprints that differentiate tampered from non-tampered regions, whereas the fixed encoder intentionally provides direction information that assists the learning and detection of the network. The dual encoder is followed by a spatial pyramid global-feature extraction module that expands the global insight of D-Unet for classifying tampered and non-tampered regions more accurately. In an experimental comparison with state-of-the-art methods, D-Unet outperformed the other methods in image-level and pixel-level detection, without requiring pre-training or training on a large number of forgery images. Moreover, it remained robust under different attacks.
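A minimal sketch of the dual-encoder idea follows, assuming PyTorch: a trainable (unfixed) encoder learns splicing fingerprints while a frozen (fixed) encoder supplies complementary cues. The frozen random filters are a placeholder for the paper's fixed filters, and the U-Net skips and spatial pyramid global-feature module are omitted for brevity.

```python
import torch
import torch.nn as nn

class DualEncoderSketch(nn.Module):
    """Toy dual-encoder segmenter: concatenates features from a trainable
    and a frozen encoder, then decodes a per-pixel tamper mask."""
    def __init__(self):
        super().__init__()
        self.unfixed = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.fixed = nn.Conv2d(3, 16, 3, padding=1, bias=False)
        for p in self.fixed.parameters():
            p.requires_grad = False         # fixed encoder: never updated
        self.decoder = nn.Sequential(
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),
        )

    def forward(self, x):
        f = torch.cat([self.unfixed(x), self.fixed(x)], dim=1)
        return torch.sigmoid(self.decoder(f))   # per-pixel tamper probability

mask = DualEncoderSketch()(torch.rand(1, 3, 128, 128))  # shape (1, 1, 128, 128)
```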