23 research outputs found

    Recover Subjective Quality Scores from Noisy Measurements

    Full text link
    Simple quality metrics such as PSNR are known to not correlate well with subjective quality when tested across a wide spectrum of video content or quality regime. Recently, efforts have been made in designing objective quality metrics trained on subjective data (e.g. VMAF), demonstrating better correlation with video quality perceived by human. Clearly, the accuracy of such a metric heavily depends on the quality of the subjective data that it is trained on. In this paper, we propose a new approach to recover subjective quality scores from noisy raw measurements, using maximum likelihood estimation, by jointly estimating the subjective quality of impaired videos, the bias and consistency of test subjects, and the ambiguity of video contents all together. We also derive closed-from expression for the confidence interval of each estimate. Compared to previous methods which partially exploit the subjective information, our approach is able to exploit the information in full, yielding tighter confidence interval and better handling of outliers without the need for z-scoring or subject rejection. It also handles missing data more gracefully. Finally, as side information, it provides interesting insights on the test subjects and video contents.Comment: 16 pages; abridged version appeared in Data Compression Conference (DCC) 201

    FOVQA: Blind Foveated Video Quality Assessment

    Full text link
    Previous blind or No Reference (NR) video quality assessment (VQA) models largely rely on features drawn from natural scene statistics (NSS), but under the assumption that the image statistics are stationary in the spatial domain. Several of these models are quite successful on standard pictures. However, in Virtual Reality (VR) applications, foveated video compression is regaining attention, and the concept of space-variant quality assessment is of interest, given the availability of increasingly high spatial and temporal resolution contents and practical ways of measuring gaze direction. Distortions from foveated video compression increase with increased eccentricity, implying that the natural scene statistics are space-variant. Towards advancing the development of foveated compression / streaming algorithms, we have devised a no-reference (NR) foveated video quality assessment model, called FOVQA, which is based on new models of space-variant natural scene statistics (NSS) and natural video statistics (NVS). Specifically, we deploy a space-variant generalized Gaussian distribution (SV-GGD) model and a space-variant asynchronous generalized Gaussian distribution (SV-AGGD) model of mean subtracted contrast normalized (MSCN) coefficients and products of neighboring MSCN coefficients, respectively. We devise a foveated video quality predictor that extracts radial basis features, and other features that capture perceptually annoying rapid quality fall-offs. We find that FOVQA achieves state-of-the-art (SOTA) performance on the new 2D LIVE-FBT-FCVR database, as compared with other leading FIQA / VQA models. we have made our implementation of FOVQA available at: http://live.ece.utexas.edu/research/Quality/FOVQA.zip

    Interpolation of Scientific Image Databases

    Get PDF
    This paper explores how recent convolutional neural network (CNN)-based techniques can be used to interpolate images inside scientific image databases. These databases are frequently used for the interactive visualization of large-scale simulations, where images correspond to samples of the parameter space (e.g., timesteps, isovalues, thresholds, etc.) and the visualization space (e.g., camera locations, clipping planes, etc.). These databases can be browsed post hoc along the sampling axis to emulate real-time interaction with large-scale datasets. However, the resulting databases are limited to their contained images, i.e., the sampling points. In this paper, we explore how efficiently and accurately CNN-based techniques can derive new images by interpolating database elements. We demonstrate on several real-world examples that the size of databases can be further reduced by dropping samples that can be interpolated post hoc with an acceptable error, which we measure qualitatively and quantitatively

    Enhancing VMAF through New Feature Integration and Model Combination

    Get PDF
    VMAF is a machine learning based video quality assessment method, originally designed for streaming applications, which combines multiple quality metrics and video features through SVM regression. It offers higher correlation with subjective opinions compared to many conventional quality assessment methods. In this paper we propose enhancements to VMAF through the integration of new video features and alternative quality metrics (selected from a diverse pool) alongside multiple model combination. The proposed combination approach enables training on multiple databases with varying content and distortion characteristics. Our enhanced VMAF method has been evaluated on eight HD video databases, and consistently outperforms the original VMAF model (0.6.1) and other benchmark quality metrics, exhibiting higher correlation with subjective ground truth data.Comment: 5 pages, 2 figures and 4 table

    A machine learning driven solution to the problem of perceptual video quality metrics

    Get PDF
    The advent of high-speed internet connections, advanced video coding algorithms, and consumer-grade computers with high computational capabilities has led videostreaming-over-the-internet to make up the majority of network traffic. This effect has led to a continuously expanding video streaming industry that seeks to offer enhanced quality-of-experience (QoE) to its users at the lowest cost possible. Video streaming services are now able to adapt to the hardware and network restrictions that each user faces and thus provide the best experience possible under those restrictions. The most common way to adapt to network bandwidth restrictions is to offer a video stream at the highest possible visual quality, for the maximum achievable bitrate under the network connection in use. This is achieved by storing various pre-encoded versions of the video content with different bitrate and visual quality settings. Visual quality is measured by means of objective quality metrics, such as the Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Visual Information Fidelity (VIF), and others, which can be easily computed analytically. Nevertheless, it is widely accepted that although these metrics provide an accurate estimate of the statistical quality degradation, they do not reflect the viewer’s perception of visual quality accurately. As a result, the acquisition of user ratings in the form of Mean Opinion Scores (MOS) remains the most accurate depiction of human-perceived video quality, albeit very costly and time consuming, and thus cannot be practically employed by video streaming providers that have hundreds or thousands of videos in their catalogues. A recent very promising approach for addressing this limitation is the use of machine learning techniques in order to train models that represent human video quality perception more accurately. To this end, regression techniques are used in order to map objective quality metrics to human video quality ratings, acquired for a large number of diverse video sequences. Results have been very promising, with approaches like the Video Multimethod Assessment Fusion (VMAF) metric achieving higher correlations to useracquired MOS ratings compared to traditional widely used objective quality metrics

    SpatioTemporal Feature Integration and Model Fusion for Full Reference Video Quality Assessment

    Full text link
    Perceptual video quality assessment models are either frame-based or video-based, i.e., they apply spatiotemporal filtering or motion estimation to capture temporal video distortions. Despite their good performance on video quality databases, video-based approaches are time-consuming and harder to efficiently deploy. To balance between high performance and computational efficiency, Netflix developed the Video Multi-method Assessment Fusion (VMAF) framework, which integrates multiple quality-aware features to predict video quality. Nevertheless, this fusion framework does not fully exploit temporal video quality measurements which are relevant to temporal video distortions. To this end, we propose two improvements to the VMAF framework: SpatioTemporal VMAF and Ensemble VMAF. Both algorithms exploit efficient temporal video features which are fed into a single or multiple regression models. To train our models, we designed a large subjective database and evaluated the proposed models against state-of-the-art approaches. The compared algorithms will be made available as part of the open source package in https://github.com/Netflix/vmaf
    corecore