89 research outputs found

    Optimized Adaptive Encoding Based on Visual Attention

    Get PDF

    SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications

    Get PDF
    Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings - the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data. The use of I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging and fruitful but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures that show superior performance as compared to the widely used mean squared error (MSE) and are computationally simple compared with other state-of-the-art perceptual quality measures. In addition, SSIM has a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance. Firstly, the original SSIM is a Full-Reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general purpose Reduced-Reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image. Furthermore, we introduce and demonstrate the novel idea of partially repairing an image using RR features. Secondly, image processing algorithms such as image de-noising and image super-resolution are required at various stages of visual communication systems, starting from image acquisition to image display at the receiver. We incorporate SSIM into the framework of sparse signal representation and non-local means methods and demonstrate improved performance in image de-noising and super-resolution. Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive optimization method that transforms the DCT domain frame residuals to a perceptually uniform space. Both approaches demonstrate the potential to largely improve the rate-distortion performance of state-of-the-art video codecs. Finally, in real-world visual communications, it is a common experience that end-users receive video with significantly time-varying quality due to the variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well-understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality. The model improves upon the direct SSIM approach in predicting subjective perceptual experience of time-varying video quality

    Low-complexity high prediction accuracy visual quality metrics and their applications in H.264/AVC encoding mode decision process

    Get PDF
    In this thesis, we develop a new general framework for computing full reference image quality scores in the discrete wavelet domain using the Haar wavelet. The proposed framework presents an excellent tradeoff between accuracy and complexity. In our framework, quality metrics are categorized as either map-based, which generate a quality (distortion) map to be pooled for the final score, e.g., structural similarity (SSIM), or non map-based, which only give a final score, e.g., Peak signal-to-noise ratio (PSNR). For mapbased metrics, the proposed framework defines a contrast map in the wavelet domain for pooling the quality maps. We also derive a formula to enable the framework to automatically calculate the appropriate level of wavelet decomposition for error-based metrics at a desired viewing distance. To consider the effect of very fine image details in quality assessment, the proposed method defines a multi-level edge map for each image, which comprises only the most informative image subbands. To clarify the application of the framework in computing quality scores, we give some examples showing how the framework can be applied to improve well-known metrics such as SSIM, visual information fidelity (VIF), PSNR, and absolute difference. We compare the complexity of various algorithms obtained by the framework to the Intel IPP-based H.264 baseline profile encoding using C/C++ implementations. We evaluate the overall performance of the proposed metrics, including their prediction accuracy, on two well-known image quality databases and one video quality database. All the simulation results confirm the efficiency of the proposed framework and quality assessment metrics in improving the prediction accuracy and also reduction of the computational complexity. For example, by using the framework, we can compute the VIF at about 5% of the complexity of its original version, but with higher accuracy. In the next step, we study how H.264 coding mode decision can benefit from our developed metrics. We integrate the proposed SSEA metric as the distortion measure inside the H.264 mode decision process. The H.264/AVC JM reference software is used as the implementation and verification platform. We propose a search algorithm to determine the Lagrange multiplier value for each quantization parameter (QP). The search is applied on three different types of video sequences having various motion activity features, and the resulting Lagrange multiplier values are tabulated for each of them. Based on our proposed Framework we propose a new quality metric PSNRA, and use it in this part (mode decision). The simulated rate-distortion (RD) curves show that at the same PSNRA, with the SSEA-based mode decision, the bitrate is reduced about 5% on average compared to the conventional SSE-based approach for the sequences with low and medium motion activities. It is notable that the computational complexity is not increased at all by using the proposed SSEA-based approach instead of the conventional SSE-based method. Therefore, the proposed mode decision algorithm can be used in real-time video coding

    Review of standard traditional distortion metrics and a need for perceptual distortion metric at a (sub) macroblock level

    Get PDF
    Within a video encoder the distortion metric performs an Image Quality Assessment (IQA). However, to exploit perceptual redundancy to lower the convex hull of the Rate- Distortion (R-D) curve, a Perceptual Distortion Metric (PDM) modelling of the Human Visual System (HVS) should be used. Since block-based video encoders like H.264/AVC operate at the Sub-Macroblock (Sub-MB) level, there exists a need to produce a locally operating PDM. A locally operating PDM must meet the requirements of Standard Traditional Distortion Metrics (STDMs), in that it must satisfy the Triangle Equality Rule. Hence, this paper presents a review of STDMs of SSE, SAD and SATD against the perceptual IQA of Structural Similarity (SSIM) at the Sub-MB level. Furthermore, this paper illustrates the Universal Bounded Region (UBR) by block size that supports the triangle equality rule within the Sub-MB level, between SSIM and STDMs like SATD at the prediction stage

    Low complexity in-loop perceptual video coding

    Get PDF
    The tradition of broadcast video is today complemented with user generated content, as portable devices support video coding. Similarly, computing is becoming ubiquitous, where Internet of Things (IoT) incorporate heterogeneous networks to communicate with personal and/or infrastructure devices. Irrespective, the emphasises is on bandwidth and processor efficiencies, meaning increasing the signalling options in video encoding. Consequently, assessment for pixel differences applies uniform cost to be processor efficient, in contrast the Human Visual System (HVS) has non-uniform sensitivity based upon lighting, edges and textures. Existing perceptual assessments, are natively incompatible and processor demanding, making perceptual video coding (PVC) unsuitable for these environments. This research allows existing perceptual assessment at the native level using low complexity techniques, before producing new pixel-base image quality assessments (IQAs). To manage these IQAs a framework was developed and implemented in the high efficiency video coding (HEVC) encoder. This resulted in bit-redistribution, where greater bits and smaller partitioning were allocated to perceptually significant regions. Using a HEVC optimised processor the timing increase was < +4% and < +6% for video streaming and recording applications respectively, 1/3 of an existing low complexity PVC solution. Future work should be directed towards perceptual quantisation which offers the potential for perceptual coding gain
    • …
    corecore