4 research outputs found

    Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark

    HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for over-the-top (OTT) video streaming services, due to its ability to deliver a seamless streaming experience. A key component of HAS is the bitrate ladder, which provides the encoding parameters (e.g., bitrate-resolution pairs) used to encode the source video. The representations in the bitrate ladder allow the client's player to dynamically adjust the quality of the video stream based on network conditions by selecting the most appropriate representation from the bitrate ladder. The most straightforward and lowest-complexity approach uses a fixed bitrate ladder for all videos, consisting of pre-determined bitrate-resolution pairs, known as one-size-fits-all. Conversely, the most reliable technique relies on intensively encoding all resolutions over a wide range of bitrates to build the convex hull, thereby optimizing the bitrate ladder for each specific video. Several techniques have been proposed to predict content-based ladders without performing a costly exhaustive search encoding. This paper provides a comprehensive review of various methods, including both conventional and learning-based approaches. Furthermore, we conduct a benchmark study focusing exclusively on learning-based approaches for predicting content-optimized bitrate ladders across multiple codec settings. The considered methods are evaluated on our proposed large-scale dataset, which includes 300 UHD video shots encoded with software and hardware encoders conforming to three state-of-the-art standards, AVC/H.264, HEVC/H.265, and VVC/H.266, at various bitrate points. Our analysis provides baseline methods and insights, which will be valuable for future research in the field of bitrate ladder prediction. The source code of the proposed benchmark and the dataset will be made publicly available upon acceptance of the paper.
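
    The convex-hull approach described above can be illustrated with a minimal sketch: given exhaustive per-resolution encodes of a title, each ladder rung keeps the resolution that maximizes quality at (or below) its target bitrate. The Encode structure, bitrates, and quality scores below are illustrative assumptions, not data from the paper's dataset.

```python
# Minimal sketch of per-title convex-hull ladder construction from exhaustive
# encodings. All numbers are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Encode:
    resolution: tuple      # (width, height)
    bitrate_kbps: float    # actual encoded bitrate
    quality: float         # e.g., VMAF or PSNR of the encode

def convex_hull_ladder(encodes, target_bitrates_kbps):
    """For each target bitrate, keep the resolution whose encode at (or below)
    that bitrate yields the highest quality -- the upper envelope of the
    per-resolution rate-quality curves."""
    ladder = []
    for target in target_bitrates_kbps:
        candidates = [e for e in encodes if e.bitrate_kbps <= target]
        if not candidates:
            continue
        best = max(candidates, key=lambda e: e.quality)
        ladder.append((target, best.resolution))
    return ladder

# Example: pick rungs from a handful of hypothetical encodes.
encodes = [
    Encode((1920, 1080), 4500, 92.0), Encode((1920, 1080), 2800, 84.0),
    Encode((1280, 720), 2600, 86.0),  Encode((1280, 720), 1400, 78.0),
    Encode((960, 540), 1300, 80.0),
]
print(convex_hull_ladder(encodes, [1500, 3000, 5000]))
```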

    Efficient Per-Shot Transformer-Based Bitrate Ladder Prediction for Adaptive Video Streaming

    Recently, HTTP adaptive streaming (HAS) has become a standard approach for over-the-top (OTT) video streaming services due to its ability to provide smooth streaming. In HAS, each stream representation is encoded to target a specific bitrate, together providing a wide range of operating bitrates known as the bitrate ladder. In the past, a fixed bitrate ladder for all videos has been widely used. However, such a method does not consider video content, which can vary considerably in motion, texture, and scene complexity. Moreover, building a per-title bitrate ladder through exhaustive encoding is quite expensive due to the large encoding parameter space. Thus, alternative solutions allowing accurate and efficient per-title bitrate ladder prediction are in great demand. Meanwhile, self-attention-based architectures have achieved tremendous performance, both in large language models (LLMs) and, as vision transformers (ViTs), in computer vision tasks. Therefore, this paper investigates the ViT's capability to build an efficient bitrate ladder without performing any encoding. We provide the first in-depth analysis of the prediction accuracy and the complexity overhead induced by the ViT model when predicting the bitrate ladder on a large and diverse video dataset. The source code of the proposed solution and the dataset will be made publicly available.
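
    As a rough illustration of the idea (not the authors' model), a pretrained vision transformer can serve as a content-feature extractor whose embeddings feed a small regression head that predicts ladder parameters, all without encoding the test video. The backbone choice, frame sampling, regression head, and number of rungs below are assumptions made for the sketch.

```python
# A minimal sketch of ViT-based ladder prediction without any test encoding.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_RUNGS = 5  # hypothetical number of ladder rungs to predict

backbone = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
backbone.heads = nn.Identity()          # expose the 768-d class-token embedding
backbone.eval()

head = nn.Sequential(                   # small regression head for rung bitrates
    nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, NUM_RUNGS)
)

def predict_ladder(frames):
    """frames: (N, 3, 224, 224) tensor of sampled, normalized video frames."""
    with torch.no_grad():
        feats = backbone(frames)        # (N, 768) per-frame content features
    clip_feat = feats.mean(dim=0)       # temporal average pooling over frames
    return head(clip_feat)              # predicted (log-)bitrate per rung

frames = torch.randn(8, 3, 224, 224)    # stand-in for 8 preprocessed frames
print(predict_ladder(frames).shape)     # torch.Size([5])
```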

    Benchmarking Learning-based Bitrate Ladder Prediction Methods for Adaptive Video Streaming

    HTTP adaptive streaming (HAS) is increasingly adopted by over-the-top (OTT) video streaming services, as it allows clients to dynamically switch among various stream representations. Each of these representations is encoded to target a specific bitrate, providing a wide range of operating bitrates known as the bitrate ladder. Several approaches with different levels of complexity are currently used to build such a bitrate ladder. The most straightforward method is to use a fixed bitrate ladder for all videos, i.e., a set of bitrate-resolution pairs, called "one-size-fits-all"; the most complex is based on the intensive encoding of all resolutions over a wide bitrate range to construct the convex hull, which is then used to obtain a per-title bitrate ladder. Recently, various methods relying on machine learning (ML) techniques have been proposed to predict content-based ladders without performing exhaustive search encoding. In this paper, we conduct a benchmark study of several handcrafted- and deep learning (DL)-based approaches for predicting content-optimized bitrate ladders, which we believe provides baseline methods and will be useful for future research in this field. The obtained results, based on 200 video sequences compressed with the high-efficiency video coding (HEVC) encoder, reveal that the most efficient method predicts the bitrate ladder without performing any encoding, at the cost of a slight Bjontegaard delta bitrate (BD-BR) loss of 1.43% compared to the exhaustive approach. The dataset and the source code of the considered methods are made publicly available at: https://github.com/atelili/Bitrate-Ladder-Benchmark
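
    The reported 1.43% loss is a Bjontegaard delta bitrate (BD-BR). A compact sketch of the standard BD-rate computation is given below; the rate-quality points in the example are purely illustrative.

```python
# Standard Bjontegaard delta-bitrate computation: cubic fit of log-bitrate as a
# function of quality, integrated over the overlapping quality range.
import numpy as np

def bd_rate(r1, q1, r2, q2):
    """BD-BR (%) of curve 2 relative to reference curve 1.
    r*: bitrates (four points each), q*: matching quality scores (e.g., PSNR/VMAF)."""
    lr1, lr2 = np.log(r1), np.log(r2)
    p1 = np.polyfit(q1, lr1, 3)           # cubic fit: quality -> log-bitrate
    p2 = np.polyfit(q2, lr2, 3)
    lo, hi = max(min(q1), min(q2)), min(max(q1), max(q2))
    int1 = np.polyval(np.polyint(p1), hi) - np.polyval(np.polyint(p1), lo)
    int2 = np.polyval(np.polyint(p2), hi) - np.polyval(np.polyint(p2), lo)
    avg_diff = (int2 - int1) / (hi - lo)  # average log-bitrate difference
    return (np.exp(avg_diff) - 1) * 100   # percent bitrate change

# Hypothetical rate-quality points: a positive result means bitrate overhead
# versus the reference curve at equal quality.
print(bd_rate([1000, 2000, 4000, 8000], [80, 86, 91, 95],
              [1050, 2100, 4150, 8300], [80, 86, 91, 95]))
```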

    2BiVQA: Double Bi-LSTM based Video Quality Assessment of UGC Videos

    Recently, with the growing popularity of mobile devices as well as video sharing platforms (e.g., YouTube, Facebook, TikTok, and Twitch), User-Generated Content (UGC) videos have become increasingly common and now account for a large portion of multimedia traffic on the internet. Unlike professionally generated videos produced by filmmakers and videographers, UGC videos typically contain multiple authentic distortions, generally introduced during capture and processing by naive users. Quality prediction of UGC videos is of paramount importance to optimize and monitor their processing in hosting platforms, such as their coding, transcoding, and streaming. However, blind quality prediction of UGC is quite challenging, because the degradations of UGC videos are unknown and very diverse, in addition to the unavailability of a pristine reference. Therefore, in this article, we propose an accurate and efficient Blind Video Quality Assessment (BVQA) model for UGC videos, which we name 2BiVQA for double Bi-LSTM Video Quality Assessment. The 2BiVQA metric consists of three main blocks: a pre-trained Convolutional Neural Network extracts discriminative features from image patches, which are then fed into two Recurrent Neural Networks for spatial and temporal pooling. Specifically, we use two Bi-directional Long Short-Term Memory networks: the first captures short-range dependencies between image patches, while the second captures long-range dependencies between frames to account for the temporal memory effect. Experimental results on recent large-scale UGC VQA datasets show that 2BiVQA achieves high performance at lower computational cost than most state-of-the-art VQA models. The source code of our 2BiVQA metric is made publicly available at https://github.com/atelili/2BiVQA
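
    A minimal architectural sketch in the spirit of the described pipeline (CNN patch features, a Bi-LSTM over patches for spatial pooling, and a Bi-LSTM over frames for temporal pooling) is shown below; the backbone, feature sizes, and pooling details are assumptions rather than the authors' exact configuration.

```python
# Sketch of a double Bi-LSTM blind VQA model: CNN patch features -> Bi-LSTM over
# patches (spatial pooling) -> Bi-LSTM over frames (temporal pooling) -> score.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class DoubleBiLSTMVQA(nn.Module):
    def __init__(self, feat_dim=2048, hidden=256):
        super().__init__()
        cnn = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])   # global-pooled CNN features
        self.spatial = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.temporal = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, 1)

    def forward(self, video_patches):
        # video_patches: (frames, patches, 3, H, W)
        n_frames, n_patches = video_patches.shape[:2]
        x = video_patches.flatten(0, 1)                         # (frames*patches, 3, H, W)
        feats = self.cnn(x).flatten(1).view(n_frames, n_patches, -1)
        spatial_out, _ = self.spatial(feats)                    # Bi-LSTM over patches
        frame_feats = spatial_out.mean(dim=1).unsqueeze(0)      # (1, frames, 2*hidden)
        temporal_out, _ = self.temporal(frame_feats)            # Bi-LSTM over frames
        return self.fc(temporal_out.mean(dim=1)).squeeze(-1)    # scalar quality score

model = DoubleBiLSTMVQA().eval()
with torch.no_grad():
    score = model(torch.randn(4, 6, 3, 224, 224))  # 4 frames x 6 patches (toy sizes)
print(score)
```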