4,400 research outputs found
SpatioTemporal Feature Integration and Model Fusion for Full Reference Video Quality Assessment
Perceptual video quality assessment models are either frame-based or
video-based; the latter apply spatiotemporal filtering or motion estimation to
capture temporal video distortions. Despite their good performance on video
quality databases, video-based approaches are time-consuming and harder to
deploy efficiently. To balance high performance with computational
efficiency, Netflix developed the Video Multi-method Assessment Fusion (VMAF)
framework, which integrates multiple quality-aware features to predict video
quality. Nevertheless, this fusion framework does not fully exploit temporal
video quality measurements which are relevant to temporal video distortions. To
this end, we propose two improvements to the VMAF framework: SpatioTemporal
VMAF and Ensemble VMAF. Both algorithms exploit efficient temporal video
features which are fed into a single or multiple regression models. To train
our models, we designed a large subjective database and evaluated the proposed
models against state-of-the-art approaches. The compared algorithms will be
made available as part of the open-source package at
https://github.com/Netflix/vmaf.
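To illustrate the fusion idea described in this abstract, below is a minimal sketch of pooled quality-aware features fed to a support vector regressor; the feature set, data, and hyperparameters are hypothetical, not the actual VMAF implementation.

```python
# Minimal sketch of feature-fusion quality prediction in the spirit of VMAF:
# per-frame quality-aware features are pooled over time and fed to an SVR.
# Feature names and data are illustrative, not the actual VMAF feature set.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical training set: each row pools one video's per-frame features
# (e.g., mean fidelity score, mean detail-loss score, mean temporal difference).
X_train = np.array([
    [0.92, 0.88, 0.05],
    [0.75, 0.70, 0.20],
    [0.60, 0.55, 0.35],
])
y_train = np.array([85.0, 60.0, 40.0])  # subjective MOS labels (made up)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_train, y_train)

X_test = np.array([[0.80, 0.78, 0.15]])
print("Predicted quality score:", model.predict(X_test)[0])
```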
Stereoscopic video quality assessment based on 3D convolutional neural networks
Research on stereoscopic video quality assessment (SVQA) plays an important role in promoting the development of stereoscopic video systems. Existing SVQA metrics rely on hand-crafted features, which are inaccurate and time-consuming to design because of the diversity and complexity of stereoscopic video distortions. This paper introduces a 3D convolutional neural network (CNN) based SVQA framework that can model not only local spatio-temporal information but also global temporal information, with cubic difference-video patches as input. First, instead of using hand-crafted features, we design a 3D CNN architecture to automatically and effectively capture local spatio-temporal features. Then we employ a quality-score fusion strategy that considers global temporal cues to obtain the final video-level predicted score. Extensive experiments conducted on two public stereoscopic video quality datasets show that the proposed method correlates highly with human perception and outperforms state-of-the-art methods by a large margin. We also show that our 3D CNN features have more desirable properties for SVQA than the hand-crafted features used in previous methods, and that combining our 3D CNN features with support vector regression (SVR) can further boost performance. In addition, with no complex preprocessing and no GPU acceleration, our proposed method is shown to be computationally efficient and easy to use.
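As an illustration of the approach described above, here is a minimal PyTorch sketch of a patch-level 3D CNN regressor with a simple score-fusion step; the architecture, input shapes, and fusion rule are assumptions for illustration, not the paper's exact network.

```python
# Minimal PyTorch sketch of a 3D-CNN patch-quality model with a simple
# score-fusion step, assuming cubic difference-video patches as input.
# Layer sizes are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class PatchQuality3DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.regressor = nn.Linear(32, 1)  # per-patch quality score

    def forward(self, x):  # x: (batch, 1, frames, height, width)
        f = self.features(x).flatten(1)
        return self.regressor(f)

model = PatchQuality3DCNN()
patches = torch.randn(8, 1, 16, 32, 32)   # 8 cubic difference-video patches
patch_scores = model(patches).squeeze(1)  # one score per patch
video_score = patch_scores.mean()         # naive fusion to a video-level score
print(float(video_score))
```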
Enhancing VMAF through New Feature Integration and Model Combination
VMAF is a machine-learning-based video quality assessment method, originally
designed for streaming applications, which combines multiple quality metrics
and video features through SVM regression. It offers higher correlation with
subjective opinions compared to many conventional quality assessment methods.
In this paper we propose enhancements to VMAF through the integration of new
video features and alternative quality metrics (selected from a diverse pool),
together with multiple-model combination. The proposed combination approach
enables training on multiple databases with varying content and distortion
characteristics. Our enhanced VMAF method has been evaluated on eight HD video
databases, and consistently outperforms the original VMAF model (0.6.1) and
other benchmark quality metrics, exhibiting higher correlation with subjective
ground truth data.
Comment: 5 pages, 2 figures, and 4 tables.
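To make the model-combination idea concrete, the sketch below averages the predictions of regressors trained on separate databases; the data, models, and weighting scheme are hypothetical stand-ins, not the paper's actual method.

```python
# Illustrative sketch of multiple-model combination: regressors trained on
# different subjective databases are combined by (optionally weighted)
# averaging of their predictions. All data and models here are hypothetical.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical per-database training sets (pooled features -> subjective scores).
databases = [
    (rng.random((30, 4)), rng.random(30) * 100),
    (rng.random((25, 4)), rng.random(25) * 100),
]

models = [SVR().fit(X, y) for X, y in databases]

def combined_score(features, weights=None):
    """Average the per-database model predictions for one video."""
    preds = np.array([m.predict(features.reshape(1, -1))[0] for m in models])
    return np.average(preds, weights=weights)

print(combined_score(rng.random(4)))
```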
Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment
Video quality assessment (VQA) aims to simulate the human perception of video
quality, which is influenced by factors ranging from low-level color and
texture details to high-level semantic content. To effectively model these
complicated quality-related factors, in this paper, we decompose video into
three levels (i.e., patch level, frame level, and clip level), and propose a
novel Zoom-VQA architecture to perceive spatio-temporal features at different
levels. It integrates three components: a patch attention module, frame pyramid
alignment, and a clip ensemble strategy, which respectively capture regions of
interest in the spatial dimension, multi-level information across feature
levels, and distortions distributed over the temporal
dimension. Owing to the comprehensive design, Zoom-VQA obtains state-of-the-art
results on four VQA benchmarks and achieves 2nd place in the NTIRE 2023 VQA
challenge. Notably, Zoom-VQA outperforms the previous best results on two
subsets of LSVQ, achieving SRCC of 0.8860 (+1.0%) and 0.7985 (+1.9%) on the
respective subsets. Thorough ablation studies further verify the effectiveness
of each component. Code and models are released at
https://github.com/k-zha14/Zoom-VQA.
Comment: Accepted by CVPR 2023 Workshop.
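For reference, SRCC figures like those quoted above can be computed with SciPy's spearmanr; the predicted and subjective scores below are made up for illustration.

```python
# Minimal example of computing SRCC (Spearman rank-order correlation),
# the metric reported on the LSVQ subsets above; scores are made up.
from scipy.stats import spearmanr

predicted = [72.1, 55.3, 80.0, 43.7, 66.2]
subjective = [70.0, 58.0, 82.5, 40.0, 64.0]  # hypothetical MOS labels

srcc, p_value = spearmanr(predicted, subjective)
print(f"SRCC = {srcc:.4f}")
```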