Perceptual video quality assessment: the journey continues!
Perceptual Video Quality Assessment (VQA) is one of the most fundamental and challenging problems in the field of Video Engineering. Along with video compression, it has become one of the two dominant theoretical and algorithmic technologies in television streaming and social media. Over the last two decades, the volume of video traffic over the internet has grown exponentially, powered by rapid advancements in cloud services, faster video compression technologies, and increased access to high-speed, low-latency wireless internet connectivity. This has given rise to issues related to delivering extraordinary volumes of picture and video data to an increasingly sophisticated and demanding global audience. Consequently, developing algorithms to measure the quality of pictures and videos as perceived by humans has become increasingly critical, since these algorithms can be used to perceptually optimize trade-offs between quality and bandwidth consumption. VQA models have evolved from algorithms developed for generic 2D videos to specialized algorithms explicitly designed for on-demand video streaming, user-generated content (UGC), virtual and augmented reality (VR and AR), cloud gaming, high dynamic range (HDR), and high frame rate (HFR) scenarios. Along the way, we describe the advancement in algorithm design, beginning with traditional hand-crafted feature-based methods and finishing with current deep-learning models powering accurate VQA algorithms. We also discuss the evolution of Subjective Video Quality databases containing videos and human-annotated quality scores, which are the necessary tools to create, test, compare, and benchmark VQA algorithms. Finally, we discuss emerging trends in VQA algorithm design and general perspectives on the evolution of Video Quality Assessment in the foreseeable future.
Bitstream-based video quality modeling and analysis of HTTP-based adaptive streaming
The pervasion of affordable capture technology and increased internet bandwidth allows high-quality videos (resolutions > 1080p, frame rates ≥ 60 fps) to be streamed online. HTTP-based adaptive streaming is the preferred method for streaming videos, adjusting video quality based on available bandwidth. Although adaptive streaming reduces the occurrences of video playout being stopped (called "stalling") due to narrow network bandwidth, the automatic adaptation has an impact on the quality perceived by the user, which results in the need to systematically assess the perceived quality. Such an evaluation is usually done on a short-term (few seconds) and overall session basis (up to several minutes). In this thesis, both these aspects are assessed using subjective and instrumental methods. The subjective assessment of short-term video quality consists of a series of lab-based video quality tests that have resulted in publicly available datasets. The overall integral quality was subjectively assessed in lab tests with human viewers mimicking a real-life viewing scenario.
In addition to the lab tests, an out-of-the-lab test method was investigated for both short-term video quality and overall session quality assessment, to explore alternative approaches for subjective quality assessment. The instrumental method of quality evaluation was addressed in terms of bitstream- and hybrid pixel-based video quality models developed as part of this thesis. For this, a family of models, namely AVQBits, has been conceived using the results of the lab tests as ground truth. Based on the available input information, four different instances of AVQBits, that is, a Mode 3, a Mode 1, a Mode 0, and a Hybrid Mode 0 model, are presented. The model instances have been evaluated and perform better than or on par with other state-of-the-art models. These models have further been applied to 360° and gaming videos, HFR content, and images. In addition, a long-term integration (1-5 min) model based on the ITU-T P.1203.3 model is presented. In this work, the different instances of AVQBits, with their per-1-second score outputs, are employed as the video quality component of the proposed long-term integration model. All AVQBits variants, the long-term integration module, and the subjective test data have been made publicly available for further research.
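To make the role of the per-1-second scores concrete, here is a minimal sketch of a long-term integration step over a streaming session. It is not the ITU-T P.1203.3 algorithm itself; the exponential recency weighting and the MOS scale are assumptions used purely for illustration.

```python
import numpy as np

def integrate_session_quality(per_second_scores, recency_weight=0.05):
    """Collapse per-1-second quality scores (e.g., on a 1-5 MOS scale) into a
    single session score using an exponential recency weighting. This is a
    simplified placeholder, not the ITU-T P.1203.3 integration itself."""
    scores = np.asarray(per_second_scores, dtype=float)
    n = len(scores)
    # later seconds receive slightly more weight, mimicking recency effects
    weights = np.exp(recency_weight * np.arange(n))
    return float(np.sum(weights * scores) / np.sum(weights))

# Example: a 3-minute session whose quality dips in the middle
session = [4.5] * 60 + [2.0] * 60 + [4.0] * 60
print(round(integrate_session_quality(session), 2))
```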
Blind Video Quality Assessment at the Edge
Owing to the proliferation of user-generated videos on the Internet, blind video quality assessment (BVQA) at the edge attracts growing attention. The usage of deep-learning-based methods is restricted by their large model sizes and high computational complexity. In light of this, a novel lightweight BVQA method called GreenBVQA is proposed in this work. GreenBVQA features a small model size, low computational complexity, and high performance. Its processing pipeline includes: video data cropping, unsupervised representation generation, supervised feature selection, and mean-opinion-score (MOS) regression and ensembles. We conduct experimental evaluations on three BVQA datasets and show that GreenBVQA can offer state-of-the-art performance in PLCC and SROCC metrics while demanding significantly smaller model sizes and lower computational complexity. Thus, GreenBVQA is well-suited for edge devices.
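The last stages of such a pipeline (supervised feature selection, MOS regression, and correlation-based evaluation) can be sketched as follows. The random features stand in for GreenBVQA's actual unsupervised representations, and the choice of SelectKBest and GradientBoostingRegressor is an assumption for illustration, not the method's own design.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import pearsonr, spearmanr

# Placeholder "unsupervised representations" for N videos; in GreenBVQA these
# would come from cropped video data, here they are random stand-ins.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))           # N videos x D raw features
mos = rng.uniform(1.0, 5.0, size=200)     # subjective mean opinion scores

selector = SelectKBest(f_regression, k=64).fit(X, mos)    # supervised selection
X_sel = selector.transform(X)

regressor = GradientBoostingRegressor().fit(X_sel, mos)   # MOS regression
pred = regressor.predict(X_sel)

plcc, _ = pearsonr(pred, mos)     # Pearson linear correlation
srocc, _ = spearmanr(pred, mos)   # Spearman rank-order correlation
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}")
```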
NTIRE 2023 Quality Assessment of Video Enhancement Challenge
This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. The challenge addresses a major problem in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. It uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 deshaked videos. The challenge attracted a total of 167 registered participants. 61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions. A total of 176 submissions were made by 37 participating teams during the final testing phase. Finally, 19 participating teams submitted their models and fact sheets and detailed the methods they used. Some methods achieved better results than the baseline methods, and the winning methods demonstrated superior prediction performance.
CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings
Video Quality Assessment (VQA) aims to simulate the process of perceiving video quality by the human visual system (HVS). The judgments made by the HVS are always influenced by human subjective feelings. However, most current VQA research focuses on capturing various distortions in the spatial and temporal domains of videos, while ignoring the impact of human feelings. In this paper, we propose CLiF-VQA, which considers both features related to human feelings and spatial features of videos. In order to effectively extract features related to human feelings from videos, we explore the consistency between CLIP and human feelings in video perception for the first time. Specifically, we design multiple objective and subjective descriptions closely related to human feelings as prompts. Further, we propose a novel CLIP-based semantic feature extractor (SFE) which extracts features related to human feelings by sliding over multiple regions of the video frame. In addition, we capture the low-level-aware features of the video through a spatial feature extraction module. The two different features are then aggregated, thereby obtaining the quality score of the video. Extensive experiments show that the proposed CLiF-VQA exhibits excellent performance on several VQA datasets.
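A rough sketch of how such a CLIP-based, region-sliding semantic feature extractor could look is given below. The prompt texts, the 2x2 region grid, and the checkpoint name are illustrative assumptions rather than the CLiF-VQA configuration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Affective prompts loosely in the spirit described above; the paper's actual
# prompt set is not reproduced here.
prompts = [
    "a pleasant, comfortable video frame",
    "an annoying, unpleasant video frame",
    "a sharp and clear picture",
    "a blurry and noisy picture",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def semantic_features(frame: Image.Image, grid: int = 2):
    """Slide over a grid of frame regions and return, per region, the CLIP
    image-text similarity against each affective prompt."""
    w, h = frame.size
    crops = [frame.crop((i * w // grid, j * h // grid,
                         (i + 1) * w // grid, (j + 1) * h // grid))
             for i in range(grid) for j in range(grid)]
    inputs = processor(text=prompts, images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_image   # (num_regions, num_prompts)
    return sims

# Example usage: features = semantic_features(Image.open("frame.png"))
```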
MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
User-generated content (UGC) live videos are often affected by various distortions during capture and thus exhibit diverse visual qualities. Such source videos are further compressed and transcoded by media server providers before being distributed to end-users. Because of the flourishing of UGC live videos, effective video quality assessment (VQA) tools are needed to monitor and perceptually optimize live streaming videos in the distribution process. In this paper, we address UGC Live VQA problems by constructing a first-of-a-kind subjective UGC Live VQA database and developing an effective evaluation tool. Concretely, 418 source UGC videos are collected in real live streaming scenarios and 3,762 compressed ones at different bit rates are generated for the subsequent subjective VQA experiments. Based on the built database, we develop a Multi-Dimensional VQA (MD-VQA) evaluator to measure the visual quality of UGC live videos from semantic, distortion, and motion aspects, respectively. Extensive experimental results show that MD-VQA achieves state-of-the-art performance on both our UGC Live VQA database and existing compressed UGC VQA databases. Comment: Accepted to CVPR 2023.
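As a toy illustration of regressing a quality score from separate semantic, distortion, and motion features, a minimal fusion head is sketched below. The feature dimensions and the MLP design are assumptions and do not reproduce the actual MD-VQA architecture.

```python
import torch
import torch.nn as nn

class MultiBranchFusionHead(nn.Module):
    """Toy fusion head in the spirit of a semantic/distortion/motion split:
    three pre-computed feature vectors are concatenated and regressed to a
    single quality score. Dimensions are assumptions, not MD-VQA's."""
    def __init__(self, dims=(768, 256, 400)):
        super().__init__()
        self.regressor = nn.Sequential(
            nn.Linear(sum(dims), 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, semantic, distortion, motion):
        fused = torch.cat([semantic, distortion, motion], dim=-1)
        return self.regressor(fused).squeeze(-1)

head = MultiBranchFusionHead()
score = head(torch.randn(4, 768), torch.randn(4, 256), torch.randn(4, 400))
print(score.shape)  # torch.Size([4]) -- one predicted quality score per video
```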
DCVQE: A Hierarchical Transformer for Video Quality Assessment
The explosion of user-generated videos stimulates a great demand for no-reference video quality assessment (NR-VQA). Inspired by our observations of how humans annotate video quality, we put forward a Divide and Conquer Video Quality Estimator (DCVQE) for NR-VQA. Starting from extracting frame-level quality embeddings (QE), our proposal splits the whole sequence into a number of clips and applies Transformers to learn the clip-level QE and update the frame-level QE simultaneously; another Transformer is introduced to combine the clip-level QE to generate the video-level QE. We call this hierarchical combination of Transformers a Divide and Conquer Transformer (DCTr) layer. Accurate video quality feature extraction can be achieved by repeating this DCTr layer several times. Taking the order relationship among the annotated data into account, we also propose a novel correlation loss term for model training. Experiments on various datasets confirm the effectiveness and robustness of our DCVQE model. Comment: Accepted by ACCV 2022.
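A minimal sketch of the divide-and-conquer idea of clip-level and video-level Transformers over frame-level quality embeddings follows. Layer counts, dimensions, and the mean-pooling choices are assumptions, and the correlation loss is omitted.

```python
import torch
import torch.nn as nn

class DivideAndConquerPooling(nn.Module):
    """Minimal two-level Transformer pooling loosely following the
    divide-and-conquer idea: frame embeddings are grouped into clips,
    a clip Transformer summarizes each clip, and a video Transformer
    summarizes the clip tokens. Not the full DCTr layer."""
    def __init__(self, dim=128, clip_len=16):
        super().__init__()
        self.clip_len = clip_len
        layer = lambda: nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.clip_encoder = nn.TransformerEncoder(layer(), num_layers=1)
        self.video_encoder = nn.TransformerEncoder(layer(), num_layers=1)
        self.head = nn.Linear(dim, 1)

    def forward(self, frame_qe):                    # (B, T, dim) frame-level QE
        b, t, d = frame_qe.shape
        t_pad = (self.clip_len - t % self.clip_len) % self.clip_len
        x = nn.functional.pad(frame_qe, (0, 0, 0, t_pad))
        clips = x.view(b, -1, self.clip_len, d)                    # (B, C, L, d)
        clip_qe = self.clip_encoder(clips.flatten(0, 1)).mean(1)   # (B*C, d)
        clip_qe = clip_qe.view(b, -1, d)                           # (B, C, d)
        video_qe = self.video_encoder(clip_qe).mean(1)             # (B, d)
        return self.head(video_qe).squeeze(-1)                     # predicted score

model = DivideAndConquerPooling()
print(model(torch.randn(2, 100, 128)).shape)   # torch.Size([2])
```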
Study of subjective and objective quality assessment of night-time videos
With the widespread usage of video capture devices and social media, videos are dominating the multimedia landscape. There is an emerging need for video quality assessment (VQA), which forms the backbone of advanced video systems. Night-time videos play an important role in user capturing, hence being able to accurately assess their quality is critical. However, the characteristics of night-time videos differ from those of general in-capture videos, and VQA algorithms that have been developed for general-purpose videos cannot accurately assess the quality of night-time videos. Research is needed to gain a better understanding of how humans perceive the quality of night-time videos, and to use this new understanding to develop reliable VQA algorithms. To this end, we construct a large-scale night-time VQA database, namely the Mobile In-capture Night-time Database for Video Quality (MIND-VQ), containing 1181 night-time videos, 435 subjects, and over 130,000 opinion scores. We perform thorough analyses to reveal subjective quality assessment behaviors for night-time videos. Furthermore, we propose a new VQA model, namely the Visibility-based Night-time Video Quality Assessment Network (VINIA). Spatial and temporal visibility-aware components are characterized to reflect properties of human perception in the night-time VQA task. A series of experiments are conducted to compare our VINIA with other existing IQA/VQA algorithms using our new MIND-VQ database and other public VQA databases. Experimental results show that our subjective VQA database provides new insights and our new VINIA model achieves superior performance in assessing night-time video quality.
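As a simple illustration of what a spatial visibility-aware measurement might look like for night-time content, here is a crude luminance-based proxy. It is only a placeholder idea and is not the feature actually used by VINIA.

```python
import numpy as np

def dark_region_visibility(frame_rgb, dark_thresh=0.15):
    """Crude spatial visibility proxy for night-time frames (illustrative
    only): the fraction of near-dark pixels and the local contrast measured
    inside those dark regions."""
    gray = np.dot(frame_rgb[..., :3], [0.299, 0.587, 0.114]) / 255.0
    dark = gray < dark_thresh
    dark_fraction = float(dark.mean())
    dark_contrast = float(gray[dark].std()) if dark.any() else 0.0
    return dark_fraction, dark_contrast

# Example on a synthetic, mostly dark frame
frame = (np.random.rand(1080, 1920, 3) * 60).astype(np.uint8)
print(dark_region_visibility(frame))
```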
HVS Revisited: A Comprehensive Video Quality Assessment Framework
Video quality is a primary concern for video service providers. In recent years, techniques for video quality assessment (VQA) based on deep convolutional neural networks (CNNs) have developed rapidly. Although existing works attempt to introduce knowledge of the human visual system (HVS) into VQA, there remain limitations that prevent the full exploitation of the HVS, including an incomplete model covering only a few characteristics and insufficient connections among those characteristics. To overcome these limitations, this paper revisits the HVS with five representative characteristics and further reorganizes their connections. Based on the revisited HVS, a no-reference VQA framework called HVS-5M (an NR-VQA framework with five modules simulating the HVS with five characteristics) is proposed. It works in a domain-fusion design paradigm with advanced network structures. On the side of the spatial domain, the visual saliency module applies SAMNet to obtain a saliency map. Then, the content-dependency and edge-masking modules respectively utilize ConvNeXt to extract the spatial features, which are attentively weighted by the saliency map in order to highlight those regions that human beings may be interested in. On the other side, in the temporal domain, to supplement the static spatial features, the motion perception module utilizes SlowFast to obtain dynamic temporal features. Besides, the temporal hysteresis module applies TempHyst to simulate the memory mechanism of human beings and comprehensively evaluates the quality score according to the fused features from the spatial and temporal domains. Extensive experiments show that our HVS-5M outperforms state-of-the-art VQA methods. Ablation studies are further conducted to verify the effectiveness of each module in the proposed framework. Comment: 13 pages, 5 figures, journal paper.
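To illustrate the saliency-weighting step in isolation, a minimal sketch of saliency-weighted spatial pooling is given below. Tensor shapes and the normalization are assumptions and do not correspond to HVS-5M's exact implementation.

```python
import torch

def saliency_weighted_pooling(feature_map, saliency_map):
    """Weight a spatial feature map by a saliency map before pooling, in the
    spirit of the saliency-attention step described above.

    feature_map:  (B, C, H, W) spatial features (e.g., from a ConvNeXt stage)
    saliency_map: (B, 1, H, W) saliency values in [0, 1] (e.g., from a saliency net)
    """
    w = saliency_map / (saliency_map.sum(dim=(2, 3), keepdim=True) + 1e-8)
    return (feature_map * w).sum(dim=(2, 3))     # (B, C) pooled features

feats = saliency_weighted_pooling(torch.randn(2, 768, 14, 14),
                                  torch.rand(2, 1, 14, 14))
print(feats.shape)  # torch.Size([2, 768])
```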
Statistical and perceptual properties of images and videos with applications
The visual brain is optimally designed to process images from the natural environment that we perceive. Describing the natural environment statistically helps in understanding how the brain encodes those images efficiently. The Natural Scene Statistics (NSS) of the luminance component of images are the basis of several univariate statistical models. Such models have been the fundamental building blocks of multiple visual applications, ranging from the design of faithful image and video quality models to the development of perceptually optimized image enhancement techniques. Towards advancing this area, I studied the bivariate statistical properties of images and developed a first-of-its-kind closed-form model that describes the correlation of spatially separated bandpass image samples. I found that the model was useful in tackling different problems such as blindly assessing the quality of images and assessing the 3D visual discomfort of stereo images. Given the success of NSS in tackling image processing problems, I decided to use them as a tool to tackle the blind video quality assessment (VQA) problem. First, I constructed a video quality database, the LIVE Video Quality Challenge Database (LIVE-VQC). This database is the largest across several key dimensions: number of unique contents, distortions, devices, resolutions, and videographers. For collecting the subjective scores, I constructed a new framework on Amazon Mechanical Turk. A massive number of subjects from across the globe participated in my study. Those efforts resulted in a VQA database that serves as a strong benchmark for real-world videos. Next, I studied the spatio-temporal statistics of a wide variety of natural videos and created a space-time completely blind VQA model that deploys a directional temporal NSS model to predict quality. My newly created model outperforms all previous completely blind VQA models on LIVE-VQC.
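A minimal sketch of the kind of measurement the closed-form bivariate model describes: compute a bandpass-style (MSCN) representation of an image and the empirical correlation between samples at a given spatial offset. The MSCN parameters and offsets are illustrative assumptions, and the sketch only measures the correlation empirically rather than reproducing the closed-form expression.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7.0 / 6.0, c=1.0):
    """Mean-subtracted contrast-normalized (MSCN) coefficients, a standard
    bandpass-style representation used in NSS-based quality models."""
    mu = gaussian_filter(image, sigma)
    var = gaussian_filter(image * image, sigma) - mu * mu
    return (image - mu) / (np.sqrt(np.abs(var)) + c)

def shifted_correlation(coeffs, dx=1, dy=0):
    """Empirical correlation between bandpass samples separated by (dx, dy)
    pixels -- the quantity such a bivariate model describes in closed form."""
    h, w = coeffs.shape
    a = coeffs[0:h - dy, 0:w - dx]
    b = coeffs[dy:h, dx:w]
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Illustrative run on a random field; natural images show a structured falloff
# of this correlation with spatial separation, which the model captures.
m = mscn(np.random.rand(256, 256))
print([round(shifted_correlation(m, dx=d), 3) for d in (1, 2, 4)])
```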
The visual brain is optimally designed to process images from the natural environment that we perceive. Describing the natural environment statistically helps in understanding how the brain encodes those images efficiently. The Natural Scene Statistics (NSS) of the luminance component of images is the basis of several univariate statistical models. Such models were the fundamental building blocks of multiple visual applications, ranging from the design of faithful image and video quality models to the development of perceptually optimized image enhancing techniques. Towards advancing this area, I studied the bivariate statistical properties of images and developed the first of its kind closed-form model that describes the correlation of spatially separated bandpass image samples. I found that the model was useful in tackling different problems such as blindly assessing the quality of images and assessing 3D visual discomfort of stereo images. Provided the success of NSS in tackling image processing problems, I decided to use them as a tool to tackle the blind video quality assessment (VQA) problem. First, I constructed a video quality database, the LIVE Video Quality Challenge Database (LIVE-VQC). This database is the largest across several key dimensions: number of unique contents, distortions, devices, resolutions, and videographers. For collecting the subjective scores, I constructed a new framework in Amazon Mechanical Turk. A massive number of subjects from across the globe participated in my study. Those efforts resulted in a VQA database that serves as a great benchmark for real-world videos. Next, I studied the spatio-temporal statistics of a wide variety of natural videos and created a space-time completely blind VQA model that deploys a directional temporal NSS model to predict quality. My newly created model outperforms all previous completely blind VQA models on the LIVE-VQCElectrical and Computer Engineerin