A novel low complexity local hybrid pseudo-SSIM-SATD distortion metric towards perceptual rate control
The front end of a block-based video encoder applies an Image Quality Assessment (IQA) as part of its distortion metric. Typically, the distortion metric applies uniform weighting to the absolute differences within a Sub-Macroblock (Sub-MB). As video is predominantly produced for human viewing, the distortion metric should reflect the Human Visual System (HVS). A perceptual distortion metric (PDM) will therefore lower the convex hull of the Rate-Distortion (R-D) curve towards the origin by removing perceptual redundancy while retaining perceptual cues. Structural Similarity (SSIM), a perceptual IQA, has been adapted via logarithmic functions to measure distortion; however, it is restricted to the Group of Pictures level and hence unable to adapt to local Sub-MB changes. This paper proposes a Local Hybrid Pseudo-SSIM-SATD (LHPSS) distortion metric that operates at the Sub-MB level and satisfies the triangle inequality (≤). A detailed discussion of LHPSS's Pseudo-SSIM model illustrates how SSIM can be perceptually scaled into the distortion-metric space of SATD using non-logarithmic functions. Results for HD video encoded across different QPs show competitive bit usage under an IbBbBbBbP prediction structure at similar image quality. Finally, the mode decision choices superimposed on the Intra frame illustrate that LHPSS lowers the R-D curve, as homogeneous regions are represented with larger block sizes.
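The SATD half of the hybrid metric is computed by applying a Hadamard transform to the residual (difference) block and summing the absolute transform coefficients. A minimal sketch in Python of that standard computation follows; the function names and the 4x4 block size are illustrative, not the paper's implementation:

```python
import numpy as np

def hadamard(n):
    # Recursive Sylvester construction of an n x n Hadamard matrix
    # (n must be a power of 2, as for typical Sub-MB sizes).
    if n == 1:
        return np.array([[1]])
    h = hadamard(n // 2)
    return np.block([[h, h], [h, -h]])

def satd(block_a, block_b):
    # Sum of Absolute Transformed Differences over a square residual:
    # transform the difference block with H on both sides, then sum
    # the absolute values of the resulting coefficients.
    diff = block_a.astype(np.int64) - block_b.astype(np.int64)
    h = hadamard(diff.shape[0])
    return int(np.abs(h @ diff @ h.T).sum())
```

For a constant residual, all energy lands in the DC coefficient; e.g. a 4x4 block of ones against zeros yields an SATD of 16.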
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
While it is nearly effortless for humans to quickly assess the perceptual
similarity between two images, the underlying processes are thought to be quite
complex. Despite this, the most widely used perceptual metrics today, such as
PSNR and SSIM, are simple, shallow functions, and fail to account for many
nuances of human perception. Recently, the deep learning community has found
that features of the VGG network trained on ImageNet classification have been
remarkably useful as a training loss for image synthesis. But how perceptual
are these so-called "perceptual losses"? What elements are critical for their
success? To answer these questions, we introduce a new dataset of human
perceptual similarity judgments. We systematically evaluate deep features
across different architectures and tasks and compare them with classic metrics.
We find that deep features outperform all previous metrics by large margins on
our dataset. More surprisingly, this result is not restricted to
ImageNet-trained VGG features, but holds across different deep architectures
and levels of supervision (supervised, self-supervised, or even unsupervised).
Our results suggest that perceptual similarity is an emergent property shared
across deep visual representations.
Comment: Accepted to CVPR 2018; Code and data available at https://www.github.com/richzhang/PerceptualSimilarit
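The core idea above is measuring similarity in a deep feature space rather than pixel space: extract activations for both images, normalize them, and take a distance. The toy sketch below uses a fixed random projection with a ReLU as a stand-in feature extractor; everything here is illustrative, since the paper evaluates features from real pretrained networks such as VGG:

```python
import numpy as np

def toy_features(img, w):
    # Stand-in "deep features": one random linear projection plus ReLU.
    # A real perceptual metric would use pretrained network activations
    # (e.g. VGG conv features), possibly from several layers.
    return np.maximum(w @ img.reshape(-1), 0.0)

def perceptual_distance(img_a, img_b, seed=0):
    # Feature-space distance: L2 between unit-normalized feature
    # vectors, using the SAME projection for both images.
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((64, img_a.size))
    fa = toy_features(img_a, w)
    fb = toy_features(img_b, w)
    fa = fa / (np.linalg.norm(fa) + 1e-12)
    fb = fb / (np.linalg.norm(fb) + 1e-12)
    return float(np.linalg.norm(fa - fb))
```

Identical images have distance zero by construction; what makes real deep features "perceptual" is that distances in that space correlate with human similarity judgments, which is what the paper's dataset tests.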
Constructing a no-reference H.264/AVC bitstream-based video quality metric using genetic programming-based symbolic regression
In order to ensure an optimal quality of experience for end users during video streaming, automatic video quality assessment has become an important field of interest to video service providers. Objective video quality metrics try to estimate perceived quality with high accuracy and in an automated manner. Traditional approaches model the complex properties of the human visual system; more recently, however, it has been shown that machine learning approaches can also yield competitive results. In this paper, we present a novel no-reference bitstream-based objective video quality metric that is constructed by genetic programming-based symbolic regression. A key benefit of this approach is that it produces reliable white-box models that allow us to determine the importance of the parameters. Additionally, these models can provide human insight into the underlying principles of subjective video quality assessment. Numerical results show that perceived quality can be modeled with high accuracy using only parameters extracted from the received video bitstream.
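Symbolic regression searches a space of expression trees for one that fits observed (input parameters, quality score) pairs, yielding a white-box formula rather than opaque weights. The sketch below uses plain random search over small expression trees to convey the idea; it is not the paper's genetic-programming system (which evolves populations via crossover and mutation), and all names are illustrative:

```python
import random
import operator

# Binary operators available to candidate expressions.
OPS = [(operator.add, '+'), (operator.sub, '-'), (operator.mul, '*')]

def random_expr(depth, vars_):
    # Build a random expression tree: leaves are input variables or
    # random constants; internal nodes are binary operators.
    if depth == 0 or random.random() < 0.3:
        return random.choice(vars_ + [random.uniform(-2, 2)])
    op = random.choice(OPS)
    return (op, random_expr(depth - 1, vars_), random_expr(depth - 1, vars_))

def evaluate(expr, env):
    # Recursively evaluate a tree against a variable environment.
    if isinstance(expr, tuple):
        (fn, _), a, b = expr
        return fn(evaluate(a, env), evaluate(b, env))
    if isinstance(expr, str):
        return env[expr]
    return expr  # numeric constant

def symbolic_regress(data, vars_, iters=2000, seed=1):
    # Random-search stand-in for genetic programming: keep the
    # expression with the lowest squared error on the data.
    random.seed(seed)
    best, best_err = None, float('inf')
    for _ in range(iters):
        expr = random_expr(3, vars_)
        err = sum((evaluate(expr, env) - y) ** 2 for env, y in data)
        if err < best_err:
            best, best_err = expr, err
    return best, best_err
```

In the paper's setting the variables would be bitstream parameters (QP, motion statistics, etc.) and the target a subjective quality score; the recovered expression can then be inspected to see which parameters matter.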
No-reference bitstream-based visual quality impairment detection for high definition H.264/AVC encoded video sequences
Ensuring and maintaining adequate Quality of Experience for end users are key objectives for video service providers, not only for increasing customer satisfaction but also as a service differentiator. However, in the case of High Definition video streaming over IP-based networks, network impairments such as packet loss can severely degrade the perceived visual quality. Several standards organizations have established a minimum set of performance objectives which should be achieved to obtain satisfactory quality. Video service providers should therefore continuously monitor the network and the quality of the received video streams in order to detect visual degradations. Objective video quality metrics enable automatic measurement of perceived quality. Unfortunately, the most reliable metrics require access to both the original and the received video streams, which makes them inappropriate for real-time monitoring. In this article, we present a novel no-reference bitstream-based visual quality impairment detector that enables real-time detection of visual degradations caused by network impairments. By incorporating only information extracted from the encoded bitstream, network impairments are classified as visible or invisible to the end user. Our results show that impairment visibility can be classified with high accuracy, which enables real-time validation of the existing performance objectives.