
    A novel low complexity local hybrid pseudo-SSIM-SATD distortion metric towards perceptual rate control

    The front-end block-based video encoder applies an Image Quality Assessment (IQA) measure as part of its distortion metric. Typically, the distortion metric applies uniform weighting to the absolute differences within a Sub-Macroblock (Sub-MB) at any given time. As video is predominantly intended for human viewers, the distortion metric should reflect the Human Visual System (HVS). Thus, a perceptual distortion metric (PDM) will lower the convex hull of the Rate-Distortion (R-D) curve towards the origin by removing perceptual redundancy while retaining perceptual cues. Structural Similarity (SSIM), a perceptual IQA, has been adapted via logarithmic functions to measure distortion; however, it is restricted to the Group of Pictures level and hence unable to adapt to local Sub-MB changes. This paper proposes a Local Hybrid Pseudo-SSIM-SATD (LHPSS) distortion metric, operating at the Sub-MB level and satisfying the triangle inequality rule (≤). A detailed discussion of LHPSS's Pseudo-SSIM model illustrates how SSIM can be perceptually scaled within the distortion-metric space of SATD using non-logarithmic functions. Results for HD video encoded across different QPs are presented, showing competitive bit usage under an IbBbBbBbP prediction structure at similar image quality. Finally, the mode-decision choices superimposed on the Intra frame illustrate that LHPSS lowers the R-D curve, as homogeneous regions are represented with larger block sizes.
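
    As a rough illustration of the two ingredients such a hybrid metric combines, the sketch below computes SATD via a 4x4 Hadamard transform and a single-window SSIM for one sub-block, then blends them in a naive way. The blending rule and all constants are assumptions for demonstration only; the paper's actual LHPSS scaling is not reproduced here.

```python
# Illustrative sketch only: the two ingredients (SATD and a windowed SSIM)
# that a hybrid SSIM-SATD distortion metric would combine at Sub-MB level.
import numpy as np

H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]], dtype=np.float64)  # 4x4 Hadamard matrix

def satd_4x4(orig, pred):
    """Sum of Absolute Transformed Differences for one 4x4 sub-block."""
    diff = orig.astype(np.float64) - pred.astype(np.float64)
    # Halving is a common normalization in encoder SATD implementations.
    return np.abs(H4 @ diff @ H4).sum() / 2.0

def ssim_block(orig, pred, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM over one sub-block (no Gaussian weighting)."""
    x, y = orig.astype(np.float64), pred.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (4, 4))
pred = np.clip(orig + rng.integers(-5, 6, (4, 4)), 0, 255)
# A naive hybrid (assumption): scale SATD by (1 - SSIM) so that
# structurally similar blocks incur a lower distortion cost.
print(satd_4x4(orig, pred), ssim_block(orig, pred),
      satd_4x4(orig, pred) * (1.0 - ssim_block(orig, pred)))
```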


    The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

    While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions that fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification are remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations. Comment: Accepted to CVPR 2018; code and data available at https://www.github.com/richzhang/PerceptualSimilarity
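
    The sketch below shows, under stated assumptions, how a distance of this kind can be formed from unit-normalized VGG-16 activations tapped at a few layers (here relu1_2 through relu5_3). It is a simplified stand-in, not the authors' calibrated LPIPS metric from the repository linked above; the pretrained weights are downloaded by torchvision on first use, and the random tensors stand in for properly normalized images.

```python
# Simplified deep-feature perceptual distance, in the spirit of (but not
# identical to) LPIPS: average squared difference of unit-normalized
# VGG-16 activations at several layers.
import torch
import torch.nn.functional as F
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
TAPS = [3, 8, 15, 22, 29]  # indices of relu1_2 .. relu5_3 in vgg16().features

def feature_distance(x, y):
    """Average distance between channel-normalized VGG activations."""
    dist = 0.0
    fx, fy = x, y
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            fx, fy = layer(fx), layer(fy)
            if i in TAPS:
                nx = F.normalize(fx, dim=1)  # unit-normalize along channels
                ny = F.normalize(fy, dim=1)
                dist += ((nx - ny) ** 2).mean()
    return dist / len(TAPS)

img_a = torch.rand(1, 3, 224, 224)               # placeholder image tensors;
img_b = img_a + 0.05 * torch.randn_like(img_a)   # real use needs ImageNet norm.
print(float(feature_distance(img_a, img_b)))
```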

    Constructing a no-reference H.264/AVC bitstream-based video quality metric using genetic programming-based symbolic regression

    In order to ensure optimal quality of experience for end users during video streaming, automatic video quality assessment has become an important field of interest for video service providers. Objective video quality metrics try to estimate perceived quality accurately and in an automated manner. Traditional approaches model the complex properties of the human visual system. More recently, however, it has been shown that machine learning approaches can also yield competitive results. In this paper, we present a novel no-reference bitstream-based objective video quality metric constructed by genetic-programming-based symbolic regression. A key benefit of this approach is that it yields reliable white-box models, which allow us to determine the importance of the individual parameters. Additionally, these models can provide human insight into the underlying principles of subjective video quality assessment. Numerical results show that perceived quality can be modeled with high accuracy using only parameters extracted from the received video bitstream.
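
    To make the approach concrete, the following sketch runs genetic-programming-based symbolic regression with the gplearn library on synthetic data. The three "bitstream features" (QP, bitrate, fraction of intra macroblocks) and the MOS-like target are hypothetical placeholders rather than the parameters or data used in the paper; the point is that the fitted model is a readable, white-box closed-form expression.

```python
# Illustrative sketch: symbolic regression producing a human-readable
# quality model from (synthetic) bitstream-level features.
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(42)
# Hypothetical per-sequence features: QP, bitrate (Mbps), fraction of intra MBs.
X = np.column_stack([rng.uniform(20, 45, 200),
                     rng.uniform(1, 20, 200),
                     rng.uniform(0, 1, 200)])
# Synthetic "MOS-like" target, chosen only to make the example runnable.
y = 5.0 - 0.07 * X[:, 0] + 0.4 * np.log(X[:, 1]) + rng.normal(0, 0.1, 200)

est = SymbolicRegressor(population_size=1000, generations=20,
                        function_set=("add", "sub", "mul", "div", "log"),
                        parsimony_coefficient=0.01, random_state=0)
est.fit(X, y)
print(est._program)        # the evolved closed-form expression (white-box model)
print(est.predict(X[:5]))  # predicted quality for the first five sequences
```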

    No-reference bitstream-based visual quality impairment detection for high definition H.264/AVC encoded video sequences

    Ensuring and maintaining adequate Quality of Experience for end users are key objectives for video service providers, not only to increase customer satisfaction but also as a service differentiator. However, in the case of High Definition video streaming over IP-based networks, network impairments such as packet loss can severely degrade the perceived visual quality. Several standards organizations have established a minimum set of performance objectives which should be met to obtain satisfactory quality. Video service providers should therefore continuously monitor the network and the quality of the received video streams in order to detect visual degradations. Objective video quality metrics enable automatic measurement of perceived quality. Unfortunately, the most reliable metrics require access to both the original and the received video streams, which makes them unsuitable for real-time monitoring. In this article, we present a novel no-reference bitstream-based visual quality impairment detector which enables real-time detection of visual degradations caused by network impairments. Using only information extracted from the encoded bitstream, network impairments are classified as visible or invisible to the end user. Our results show that impairment visibility can be classified with high accuracy, which enables real-time validation of the existing performance objectives.
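
    A minimal sketch of this kind of classification setup is given below: a standard scikit-learn classifier labels packet-loss events as visible or invisible from a handful of bitstream-level features. The features, labels, and model are synthetic assumptions chosen for illustration and do not reflect the article's actual feature set or classifier.

```python
# Sketch of the visible / invisible impairment classification setup on
# synthetic, hypothetical bitstream-level features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 500
# Hypothetical per-loss-event features: lost-slice area, motion activity,
# frames until the next intra refresh, and average QP.
X = np.column_stack([rng.uniform(0, 1, n), rng.uniform(0, 1, n),
                     rng.integers(0, 25, n), rng.uniform(20, 45, n)])
# Synthetic labels: large, high-motion losses far from a refresh tend to be visible.
y = ((X[:, 0] * X[:, 1] * (1 + X[:, 2] / 25)) > 0.4).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("visibility accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```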