SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications
Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings - the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data. The use of I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging and fruitful but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures that show superior performance as compared to the widely used mean squared error (MSE) and are computationally simple compared with other state-of-the-art perceptual quality measures. In addition, SSIM has a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance.
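As an illustration of why SSIM can disagree with MSE, the following minimal sketch computes both over a single global window. It uses the standard SSIM formula with the conventional constants K1 = 0.01 and K2 = 0.03. The function names and the single-window simplification are assumptions made here for brevity; practical SSIM averages the index over local windows.

```python
def mse(x, y):
    """Mean squared error between two equal-length pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) * (a - b) for a, b in zip(x, y)) / len(x)


def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM computed over one window covering the whole signal.

    Simplification: real SSIM averages this index over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    c1 = (k1 * dynamic_range) ** 2  # stabilises the luminance term
    c2 = (k2 * dynamic_range) ** 2  # stabilises the contrast/structure term
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    reference = [10.0, 50.0, 90.0, 130.0, 170.0, 210.0]
    brighter = [v + 20.0 for v in reference]       # uniform luminance shift
    scrambled = list(reversed(reference))          # structure destroyed
    print(mse(reference, brighter), global_ssim(reference, brighter))
    print(mse(reference, scrambled), global_ssim(reference, scrambled))
```

In this toy setup a uniform brightness shift keeps the structure intact and scores a higher SSIM than reversing the pixel order, which destroys the structure.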
Firstly, the original SSIM is a Full-Reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general-purpose Reduced-Reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image. Furthermore, we introduce and demonstrate the novel idea of partially repairing an image using RR features. Secondly, image processing algorithms such as image de-noising and image super-resolution are required at various stages of visual communication systems, from image acquisition to image display at the receiver. We incorporate SSIM into the framework of sparse signal representation and non-local means methods and demonstrate improved performance in image de-noising and super-resolution. Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive optimization method that transforms the DCT domain frame residuals to a perceptually uniform space. Both approaches demonstrate the potential to substantially improve the rate-distortion performance of state-of-the-art video codecs. Finally, in real-world visual communications, end-users commonly receive video whose quality varies significantly over time due to variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality. The model improves upon the direct SSIM approach in predicting subjective perceptual experience of time-varying video quality.
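The abstract does not specify the form of the quality adaptation model. Purely as a hypothetical illustration of what "asymmetrically tuned" could mean, the sketch below tracks per-frame quality scores with an exponential filter that uses different rates for rising and falling quality. The function name, the rate values, and the choice to react faster to drops than to improvements are all assumptions, not the authors' method.

```python
def asymmetric_quality_trace(frame_scores, rise_rate=0.1, drop_rate=0.5):
    """Track perceived quality with direction-dependent smoothing.

    Hypothetical model: the running estimate moves toward each new
    per-frame score, using drop_rate when quality falls and rise_rate
    when it improves. Both rates are illustrative assumptions.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    state = frame_scores[0]
    trace = [state]
    for score in frame_scores[1:]:
        rate = rise_rate if score > state else drop_rate
        state += rate * (score - state)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Per-frame SSIM-like scores: a quality drop followed by a recovery.
    scores = [0.9, 0.9, 0.5, 0.5, 0.9, 0.9]
    print(asymmetric_quality_trace(scores))
```

With these assumed rates the estimate falls quickly after the drop and recovers only slowly once quality returns, which a plain average of the per-frame scores would not capture.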
Binocular Rivalry Oriented Predictive Auto-Encoding Network for Blind Stereoscopic Image Quality Measurement
Stereoscopic image quality measurement (SIQM) has become increasingly
important for guiding stereo image processing and communication systems due to
the widespread usage of 3D contents. Compared with conventional methods that
rely on hand-crafted features, deep-learning-based measures have
achieved remarkable performance in recent years. However, most existing deep
SIQM evaluators are not specifically built for stereoscopic contents and
consider little prior domain knowledge of the 3D human visual system (HVS) in
network design. In this paper, we develop a Predictive Auto-encoDing Network
(PAD-Net) for blind/No-Reference stereoscopic image quality measurement. In the
first stage, inspired by the predictive coding theory that the cognition system
tries to match bottom-up visual signal with top-down predictions, we adopt the
encoder-decoder architecture to reconstruct the distorted inputs. Besides,
motivated by the binocular rivalry phenomenon, we leverage the likelihood and
prior maps generated from the predictive coding process in the Siamese
framework for assisting SIQM. In the second stage, a quality regression network
is applied to the fusion image to obtain the perceptual quality prediction.
PAD-Net has been extensively evaluated on three benchmark databases, and its
superiority has been validated on both symmetrically and asymmetrically
distorted stereoscopic images under various distortion types.
Learning Accurate Entropy Model with Global Reference for Image Compression
In recent deep image compression neural networks, the entropy model plays a
critical role in estimating the prior distribution of deep image encodings.
Existing methods combine hyperprior with local context in the entropy
estimation function. This greatly limits their performance due to the absence
of a global vision. In this work, we propose a novel Global Reference Model for
image compression to effectively leverage both the local and the global context
information, leading to an enhanced compression rate. The proposed method scans
decoded latents and then finds the most relevant latent to assist the
distribution estimation of the current latent. A by-product of this work is the
innovation of a mean-shifting GDN module that further improves the performance.
Experimental results demonstrate that the proposed model outperforms most of
the state-of-the-art methods in the industry in terms of rate-distortion
performance.
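The reference search described above can be sketched as a nearest-neighbour lookup over already-decoded latents. The abstract does not state how relevance is measured, so the cosine similarity used here, the function names, and the idea of querying with a context vector derived from already-available information are all assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_global_reference(decoded_latents, query):
    """Return (index, score) of the decoded latent most similar to query.

    Assumption: query is a context vector available at decoding time,
    since the current latent itself has not been decoded yet.
    Returns None when there are no decoded latents to search.
    """
    best = None
    for index, latent in enumerate(decoded_latents):
        score = cosine_similarity(latent, query)
        if best is None or score > best[1]:
            best = (index, score)
    return best


if __name__ == "__main__":
    decoded = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    print(find_global_reference(decoded, [0.9, 1.0]))
```

In a real entropy model the selected reference would then condition the predicted distribution parameters of the current latent; that step is omitted here because the abstract gives no detail on it.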