    AVQBits-adaptive video quality model based on bitstream information for various video applications

    The paper presents AVQBits, a versatile, bitstream-based video quality model. It can be applied in several contexts such as video service monitoring, evaluation of video encoding quality, of gaming video QoE, and even of omnidirectional video quality. In the paper, it is shown that AVQBits predictions closely match video quality ratings obtained in various subjective tests with human viewers, for videos up to 4K-UHD resolution (Ultra-High Definition, 3840 x 2160 pixels) and framerates up to 120 fps. With the different variants of AVQBits presented in the paper, video quality can be monitored either at the client side, in the network or directly after encoding. The no-reference AVQBits model was developed for different video services and types of input data, reflecting the increasing popularity of Video-on-Demand services and widespread use of HTTP-based adaptive streaming. At its core, AVQBits encompasses the standardized ITU-T P.1204.3 model, with further model instances that can either have restricted or extended input information, depending on the application context. Four different instances of AVQBits are presented, that is, a Mode 3 model with full access to the bitstream, a Mode 0 variant using only metadata such as codec type, framerate, resolution and bitrate as input, a Mode 1 model using Mode 0 information and frame-type and -size information, and a Hybrid Mode 0 model that is based on Mode 0 metadata and the decoded video pixel information. The models are trained on the authors’ own AVT-PNATS-UHD-1 dataset described in the paper. All models show highly competitive performance on the AVT-VQDB-UHD-1 validation dataset: the Mode 0 variant yields a Pearson correlation of 0.890, the Mode 1 model 0.901, the hybrid no-reference Mode 0 model 0.928, and the model with full bitstream access 0.942.
In addition, all four AVQBits variants are evaluated by applying them out-of-the-box to different media formats such as 360° video, high framerate (HFR) content, or gaming videos. The analysis shows that, for the considered use-cases, the ITU-T P.1204.3 and Hybrid Mode 0 instances of AVQBits perform on par with or better than even state-of-the-art full-reference, pixel-based models. Furthermore, it is shown that the proposed Mode 0 and Mode 1 variants outperform commonly used no-reference models for the different application scopes. Also, a long-term integration model based on the standardized ITU-T P.1203.3 is presented to estimate ratings of overall audiovisual streaming Quality of Experience (QoE) for sessions of 30 s up to 5 min duration. In the paper, the AVQBits instances with their per-1-sec score output are evaluated as the video quality component of the proposed long-term integration model. All AVQBits variants as well as the long-term integration module are made publicly available to the community for further research.
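
The Pearson correlation values reported above compare model predictions with subjective ratings. As a minimal sketch of how such a figure is computed (not the authors' evaluation code), the following Python function implements the standard Pearson formula. The function name `pearson` and the example ratings are hypothetical and do not come from the paper or its datasets.

```python
import math


def pearson(predicted, subjective):
    """Pearson correlation between model predictions and subjective scores."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_p = sum(predicted) / len(predicted)
    mean_s = sum(subjective) / len(subjective)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_s)


# Hypothetical per-video scores on a 1-5 scale (illustration only).
model_scores = [1.8, 2.9, 3.4, 4.1, 4.6]
mean_opinion_scores = [2.0, 2.7, 3.6, 4.0, 4.8]
print(round(pearson(model_scores, mean_opinion_scores), 3))
```

A value close to 1 means the predictions follow the subjective ratings almost linearly; the paper reports such values per model variant on AVT-VQDB-UHD-1.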

    The CASPER user-centric approach for advanced service provisioning in mobile networks

    This paper presents an overview of the project CASPER, a 4-year Marie Curie Research and Innovation Staff Exchange (RISE) project running between 2016 and 2020, describing its objectives, approach, architecture, tools and key achievements. CASPER combines academic and industrial forces towards leveraging the expected benefits of Quality of Experience (QoE) exploitation in future networks. To achieve that, a QoE orchestrator has been proposed which implements the basic functionalities of QoE monitoring, estimation and management. By means of simulation and testbed emulation, CASPER has managed to develop a proprietary SDN Controller, which implements QoE-based traffic rerouting for the challenging scenario of HTTP adaptive video streaming, leading to more stable and higher QoE scores compared to a state-of-the-art SDN Controller implementation.

    Power Reduction Opportunities on End-User Devices in Quality-Steady Video Streaming

    This paper uses a crowdsourced dataset of online video streaming sessions to investigate opportunities to reduce power consumption while considering QoE. For this, we base our work on prior studies which model both the end-user's QoE and the end-user device's power consumption with the help of high-level video features such as the bitrate, the frame rate, and the resolution. Going beyond existing research, which focused on reducing the power consumption at the same QoE by optimizing video parameters, we investigate potential power savings by other means such as using a different playback device, a different codec, or a predefined maximum quality level. We find that, based on the power consumption of the streaming sessions from the crowdsourcing dataset, devices could save more than 55% of power if all participants adhered to low-power settings. Comment: 4 pages, 3 figures

    Bitstream-based video quality modeling and analysis of HTTP-based adaptive streaming

    The spread of affordable capture technology and increased internet bandwidth allow high-quality videos (resolutions > 1080p, framerates ≥ 60 fps) to be streamed online. HTTP-based adaptive streaming is the preferred method for streaming videos, adjusting video quality based on available bandwidth. Although adaptive streaming reduces the occurrences of video playout being stopped (called “stalling”) due to narrow network bandwidth, the automatic adaptation has an impact on the quality perceived by the user, which results in the need to systematically assess the perceived quality. Such an evaluation is usually done on a short-term (few seconds) and overall session basis (up to several minutes). In this thesis, both these aspects are assessed using subjective and instrumental methods. The subjective assessment of short-term video quality consists of a series of lab-based video quality tests that have resulted in publicly available datasets. The overall integral quality was subjectively assessed in lab tests with human viewers mimicking a real-life viewing scenario. In addition to the lab tests, the out-of-the-lab test method was investigated for both short-term video quality and overall session quality assessment to explore the possibility of alternative approaches for subjective quality assessment. The instrumental method of quality evaluation was addressed in terms of bitstream- and hybrid pixel-based video quality models developed as part of this thesis.
For this, a family of models, namely AVQBits, has been conceived using the results of the lab tests as ground truth. Based on the available input information, four different instances of AVQBits, that is, a Mode 3, a Mode 1, a Mode 0, and a Hybrid Mode 0 model, are presented. The model instances have been evaluated and perform better than or on par with other state-of-the-art models. These models have further been applied to 360° and gaming videos, HFR content, and images. Also, a long-term integration model (1 to 5 min) based on the ITU-T P.1203.3 model is presented. In this work, the different instances of AVQBits with their per-1-sec score output are employed as the video quality component of the proposed long-term integration model. All AVQBits variants as well as the long-term integration module and the subjective test data are made publicly available for further research.

    Data-driven visual quality estimation using machine learning

    Today a lot of visual content is produced and accessible, due to improvements in technology such as smartphones and the internet. This results in a need to assess the quality perceived by users to further improve the experience. However, only a few of the state-of-the-art quality models are specifically designed for higher resolutions, predict more than the mean opinion score, or use machine learning. One goal of the thesis is to train and evaluate such machine learning models for higher resolutions with several datasets. At first, an objective evaluation of image quality for higher resolutions is performed. The images are compressed using video encoders, and it is shown that AV1 is best considering quality and compression. This evaluation is followed by the analysis of a crowdsourcing test in comparison with a lab test investigating image quality. Afterward, deep learning-based models for image quality prediction and an extension for video quality are proposed. However, the deep learning-based video quality model is not practically usable because of performance constraints. For this reason, pixel-based video quality models using well-motivated features covering image and motion aspects are proposed and evaluated. These models can be used to predict mean opinion scores for videos, or even to predict other video quality-related information, such as a rating distribution.
The introduced model architecture can be applied to other video problems, such as video classification, gaming video quality prediction, gaming genre classification or encoding parameter estimation. Furthermore, one important aspect is the processing time of such models. Hence, a generic approach to speed up state-of-the-art video quality models is introduced, which shows that a significant amount of processing time can be saved, while achieving similar prediction accuracy. The models have been made publicly available as open source so that the developed frameworks can be used for further research. Moreover, the presented approaches may be usable as building blocks for newer media formats.

    QoE modeling for HTTP adaptive video streaming: a survey and open challenges

    Latency Target based Analysis of the DASH.js Player

    We analyse the low latency performance of the three Adaptive Bitrate (ABR) algorithms in the dash.js Dynamic Adaptive Streaming over HTTP (DASH) player with respect to a range of latency targets and configuration options. We perform experiments on our DASH Testbed, which allows for testing with a range of real-world derived network profiles. Our experiments enable a better understanding of how latency targets affect quality of experience (QoE), and how well the different algorithms adhere to their targets. We find that with dash.js v4.5.0 the default Dynamic algorithm achieves the best overall QoE. We show that whilst the other algorithms can achieve higher video quality at lower latencies, they do so only at the expense of increased stalling. We analyse the poor performance of L2A-LL in our tests and develop modifications which demonstrate significant improvements. We also highlight how some low latency configuration settings can be detrimental to performance. Comment: To be published in Proceedings of the 14th ACM Multimedia Systems Conference (MMSys '23), June 7-10, 2023, Vancouver, BC, Canada

    QoE Assessment for Multi-Video Object Based Media

    Recent multimedia experiences using techniques such as DASH allow the streaming delivery to be adapted to suit network context. Object Based Media (OBM) provides even more flexibility as distinct media objects are streamed and combined based on user preferences, allowing the experience to be personalised for the user. As adaptation can lead to degradation, modelling and measuring Quality of Experience (QoE) are crucial to ensure a perceptibly-optimal user experience. QoE models proposed for DASH include quality-related factors from single video-object streams and hence are unsuitable for multi-video OBM experiences. In this paper, we propose an objective method to quantify QoE for video-based OBM experiences. Our model provides different strategies to aggregate individual object QoE contributions for different OBM experience genres. We apply our model to a case study and contrast it with the QoE levels obtained using a standard QoE model for DASH.
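
The abstract does not say which aggregation strategies the model provides. The Python sketch below therefore only illustrates the general idea of combining per-object QoE scores into one experience-level score. The function `aggregate_qoe` and the strategy names `mean`, `min`, and `weighted` are assumptions made for illustration, not the paper's method.

```python
def aggregate_qoe(object_scores, strategy="mean", weights=None):
    """Combine per-object QoE scores into a single experience-level score.

    The strategies here are hypothetical examples, not taken from the paper.
    """
    if not object_scores:
        raise ValueError("at least one object score is required")
    if strategy == "mean":
        # Every video object contributes equally.
        return sum(object_scores) / len(object_scores)
    if strategy == "min":
        # The worst object dominates the perceived experience.
        return min(object_scores)
    if strategy == "weighted":
        # Objects contribute according to an importance weight.
        if weights is None or len(weights) != len(object_scores):
            raise ValueError("weighted strategy needs one weight per object")
        total = sum(weights)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return sum(w * s for w, s in zip(weights, object_scores)) / total
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical scores for a main video object and a secondary one.
scores = [4.0, 2.0]
print(aggregate_qoe(scores, "mean"))
print(aggregate_qoe(scores, "min"))
print(aggregate_qoe(scores, "weighted", weights=[3, 1]))
```

Which strategy suits a given experience genre is exactly the kind of question the paper addresses; the sketch makes no claim about that mapping.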