Recover Subjective Quality Scores from Noisy Measurements
Simple quality metrics such as PSNR are known to not correlate well with
subjective quality when tested across a wide spectrum of video content or
quality regime. Recently, efforts have been made in designing objective quality
metrics trained on subjective data (e.g. VMAF), demonstrating better
correlation with video quality perceived by humans. Clearly, the accuracy of
such a metric heavily depends on the quality of the subjective data that it is
trained on. In this paper, we propose a new approach to recover subjective
quality scores from noisy raw measurements, using maximum likelihood
estimation, by jointly estimating the subjective quality of impaired videos,
the bias and consistency of test subjects, and the ambiguity of video contents
all together. We also derive a closed-form expression for the confidence interval
of each estimate. Compared to previous methods, which only partially exploit the
subjective information, our approach exploits the information in
full, yielding tighter confidence intervals and better handling of outliers
without the need for z-scoring or subject rejection. It also handles missing
data more gracefully. Finally, as side information, it provides interesting
insights on the test subjects and video contents.
Comment: 16 pages; abridged version appeared in Data Compression Conference
(DCC) 201
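The joint estimation idea in this abstract can be illustrated with a much reduced model. The sketch below is an assumption, not the paper's estimator: it models each raw rating as video quality plus subject bias plus Gaussian noise of equal variance, so that maximum likelihood reduces to least squares, and it omits the subject consistency and content ambiguity terms the abstract mentions. The function name `recover_scores` and the data layout are hypothetical.

```python
def recover_scores(ratings, iters=50):
    """Estimate per-video quality and per-subject bias by alternating updates.

    ratings[e][s] is the raw score subject s gave video e, or None if missing.
    Assumes every video and every subject has at least one rating.
    Simplified model (assumption): rating = quality[e] + bias[s] + noise.
    """
    n_videos = len(ratings)
    n_subjects = len(ratings[0])
    quality = [0.0] * n_videos
    bias = [0.0] * n_subjects
    for _ in range(iters):
        # Quality of each video: mean of its bias-corrected ratings.
        for e in range(n_videos):
            vals = [ratings[e][s] - bias[s]
                    for s in range(n_subjects) if ratings[e][s] is not None]
            quality[e] = sum(vals) / len(vals)
        # Bias of each subject: mean residual over the videos they rated.
        for s in range(n_subjects):
            vals = [ratings[e][s] - quality[e]
                    for e in range(n_videos) if ratings[e][s] is not None]
            bias[s] = sum(vals) / len(vals)
        # Fix the free offset: force biases to average zero, and shift the
        # qualities by the same amount so the fit is unchanged.
        mean_bias = sum(bias) / n_subjects
        bias = [b - mean_bias for b in bias]
        quality = [q + mean_bias for q in quality]
    return quality, bias


# Three videos rated by three subjects; one rating is missing.
raw = [[None, 1.5, 2.0],
       [3.5, 2.5, 3.0],
       [4.5, 3.5, 4.0]]
quality, bias = recover_scores(raw)
```

Missing ratings are simply skipped in each average, which is one way to read the abstract's claim about handling missing data without subject rejection.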
FOVQA: Blind Foveated Video Quality Assessment
Previous blind or No Reference (NR) video quality assessment (VQA) models
largely rely on features drawn from natural scene statistics (NSS), but under
the assumption that the image statistics are stationary in the spatial domain.
Several of these models are quite successful on standard pictures. However, in
Virtual Reality (VR) applications, foveated video compression is regaining
attention, and the concept of space-variant quality assessment is of interest,
given the availability of increasingly high spatial and temporal resolution
contents and practical ways of measuring gaze direction. Distortions from
foveated video compression increase with increased eccentricity, implying that
the natural scene statistics are space-variant. Towards advancing the
development of foveated compression / streaming algorithms, we have devised a
no-reference (NR) foveated video quality assessment model, called FOVQA, which
is based on new models of space-variant natural scene statistics (NSS) and
natural video statistics (NVS). Specifically, we deploy a space-variant
generalized Gaussian distribution (SV-GGD) model and a space-variant
asymmetric generalized Gaussian distribution (SV-AGGD) model of mean
subtracted contrast normalized (MSCN) coefficients and products of neighboring
MSCN coefficients, respectively. We devise a foveated video quality predictor
that extracts radial basis features, and other features that capture
perceptually annoying rapid quality fall-offs. We find that FOVQA achieves
state-of-the-art (SOTA) performance on the new 2D LIVE-FBT-FCVR database, as
compared with other leading FIQA / VQA models. We have made our implementation
of FOVQA available at: http://live.ece.utexas.edu/research/Quality/FOVQA.zip
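For readers unfamiliar with MSCN coefficients, the following is a minimal sketch of how they and the products of horizontally neighboring coefficients are commonly computed. It is not taken from the FOVQA implementation: it assumes a uniform 3x3 window clipped at the image borders (published NSS models typically use a Gaussian-weighted window), a stabilizing constant of 1, and hypothetical function names. It does not cover the space-variant GGD and AGGD fitting described in the abstract.

```python
import math


def mscn(image, radius=1, c=1.0):
    """Mean subtracted contrast normalized coefficients of a 2D list of floats.

    Each pixel is reduced by its local mean and divided by its local standard
    deviation plus a constant c. The box window here is an assumption.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Collect the neighborhood, clipped at the image borders.
            window = [image[y][x]
                      for y in range(max(0, i - radius), min(h, i + radius + 1))
                      for x in range(max(0, j - radius), min(w, j + radius + 1))]
            mu = sum(window) / len(window)
            var = sum((v - mu) ** 2 for v in window) / len(window)
            out[i][j] = (image[i][j] - mu) / (math.sqrt(var) + c)
    return out


def horizontal_products(coeffs):
    """Products of each MSCN coefficient with its right-hand neighbor."""
    return [[row[j] * row[j + 1] for j in range(len(row) - 1)]
            for row in coeffs]


frame = [[0.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]
coeffs = mscn(frame)
pairs = horizontal_products(coeffs)
```

In an NSS-based model, histograms of `coeffs` and `pairs` would then be fitted with parametric distributions, and the fitted parameters used as quality-aware features.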
Interpolation of Scientific Image Databases
This paper explores how recent convolutional neural network (CNN)-based techniques can be used to interpolate images inside scientific image databases. These databases are frequently used for the interactive visualization of large-scale simulations, where images correspond to samples of the parameter space (e.g., timesteps, isovalues, thresholds, etc.) and the visualization space (e.g., camera locations, clipping planes, etc.). These databases can be browsed post hoc along the sampling axis to emulate real-time interaction with large-scale datasets. However, the resulting databases are limited to their contained images, i.e., the sampling points. In this paper, we explore how efficiently and accurately CNN-based techniques can derive new images by interpolating database elements. We demonstrate on several real-world examples that the size of databases can be further reduced by dropping samples that can be interpolated post hoc with an acceptable error, which we measure qualitatively and quantitatively.
Enhancing VMAF through New Feature Integration and Model Combination
VMAF is a machine learning based video quality assessment method, originally
designed for streaming applications, which combines multiple quality metrics
and video features through SVM regression. It offers higher correlation with
subjective opinions compared to many conventional quality assessment methods.
In this paper we propose enhancements to VMAF through the integration of new
video features and alternative quality metrics (selected from a diverse pool)
alongside multiple model combination. The proposed combination approach enables
training on multiple databases with varying content and distortion
characteristics. Our enhanced VMAF method has been evaluated on eight HD video
databases, and consistently outperforms the original VMAF model (0.6.1) and
other benchmark quality metrics, exhibiting higher correlation with subjective
ground truth data.
Comment: 5 pages, 2 figures and 4 tables
A machine learning driven solution to the problem of perceptual video quality metrics
The advent of high-speed internet connections, advanced video coding algorithms, and consumer-grade computers with high computational capabilities has led video streaming over the internet to make up the majority of network traffic. This effect has led to a continuously expanding video streaming industry that seeks to offer enhanced quality-of-experience (QoE) to its users at the lowest cost possible. Video streaming services are now able to adapt to the hardware and network restrictions that each user faces and thus provide the best experience possible under those restrictions. The most common way to adapt to network bandwidth restrictions is to offer a video stream at the highest possible visual quality, for the maximum achievable bitrate under the network connection in use. This is achieved by storing various pre-encoded versions of the video content with different bitrate and visual quality settings. Visual quality is measured by means of objective quality metrics, such as the Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Visual Information Fidelity (VIF), and others, which can be easily computed analytically. Nevertheless, it is widely accepted that although these metrics provide an accurate estimate of the statistical quality degradation, they do not reflect the viewer’s perception of visual quality accurately. As a result, the acquisition of user ratings in the form of Mean Opinion Scores (MOS) remains the most accurate depiction of human-perceived video quality, albeit very costly and time consuming, and thus cannot be practically employed by video streaming providers that have hundreds or thousands of videos in their catalogues. A recent, very promising approach for addressing this limitation is the use of machine learning techniques in order to train models that represent human video quality perception more accurately.
To this end, regression techniques are used in order to map objective quality metrics to human video quality ratings, acquired for a large number of diverse video sequences. Results have been very promising, with approaches like the Video Multimethod Assessment Fusion (VMAF) metric achieving higher correlations to user-acquired MOS ratings compared to traditional widely used objective quality metrics.
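As an illustration of the analytically computed metrics named in this abstract, the sketch below computes MSE and PSNR for two frames given as flat lists of pixel values. The function names and the default 8-bit peak value of 255 are assumptions made for the example; the abstract does not prescribe an implementation.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of pixels")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


ref = [52, 55, 61, 66]
enc = [54, 55, 60, 65]
score = psnr(ref, enc)
```

Both functions depend only on pixel differences, which is why such metrics are cheap to compute and also why they can disagree with viewer ratings.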
SpatioTemporal Feature Integration and Model Fusion for Full Reference Video Quality Assessment
Perceptual video quality assessment models are either frame-based or
video-based, where the latter apply spatiotemporal filtering or motion estimation to
capture temporal video distortions. Despite their good performance on video
quality databases, video-based approaches are time-consuming and harder to
efficiently deploy. To balance between high performance and computational
efficiency, Netflix developed the Video Multi-method Assessment Fusion (VMAF)
framework, which integrates multiple quality-aware features to predict video
quality. Nevertheless, this fusion framework does not fully exploit temporal
video quality measurements which are relevant to temporal video distortions. To
this end, we propose two improvements to the VMAF framework: SpatioTemporal
VMAF and Ensemble VMAF. Both algorithms exploit efficient temporal video
features which are fed into a single or multiple regression models. To train
our models, we designed a large subjective database and evaluated the proposed
models against state-of-the-art approaches. The compared algorithms will be
made available as part of the open-source package at
https://github.com/Netflix/vmaf