32 research outputs found

    Efficient HEVC-based video adaptation using transcoding

    Get PDF
    In a video transmission system, it is important to take into account the great diversity of the network/end-user constraints. On the one hand, video content is typically streamed over a network that is characterized by different bandwidth capacities. In many cases, the bandwidth is insufficient to transfer the video at its original quality. On the other hand, a single video is often played by multiple devices like PCs, laptops, and cell phones. Obviously, a single video would not satisfy their different constraints. These diversities of the network and devices capacity lead to the need for video adaptation techniques, e.g., a reduction of the bit rate or spatial resolution. Video transcoding, which modifies a property of the video without the change of the coding format, has been well-known as an efficient adaptation solution. However, this approach comes along with a high computational complexity, resulting in huge energy consumption in the network and possibly network latency. This presentation provides several optimization strategies for the transcoding process of HEVC (the latest High Efficiency Video Coding standard) video streams. First, the computational complexity of a bit rate transcoder (transrater) is reduced. We proposed several techniques to speed-up the encoder of a transrater, notably a machine-learning-based approach and a novel coding-mode evaluation strategy have been proposed. Moreover, the motion estimation process of the encoder has been optimized with the use of decision theory and the proposed fast search patterns. Second, the issues and challenges of a spatial transcoder have been solved by using machine-learning algorithms. Thanks to their great performance, the proposed techniques are expected to significantly help HEVC gain popularity in a wide range of modern multimedia applications

    Low complexity in-loop perceptual video coding

    Get PDF
    The tradition of broadcast video is today complemented with user generated content, as portable devices support video coding. Similarly, computing is becoming ubiquitous, where Internet of Things (IoT) incorporate heterogeneous networks to communicate with personal and/or infrastructure devices. Irrespective, the emphasises is on bandwidth and processor efficiencies, meaning increasing the signalling options in video encoding. Consequently, assessment for pixel differences applies uniform cost to be processor efficient, in contrast the Human Visual System (HVS) has non-uniform sensitivity based upon lighting, edges and textures. Existing perceptual assessments, are natively incompatible and processor demanding, making perceptual video coding (PVC) unsuitable for these environments. This research allows existing perceptual assessment at the native level using low complexity techniques, before producing new pixel-base image quality assessments (IQAs). To manage these IQAs a framework was developed and implemented in the high efficiency video coding (HEVC) encoder. This resulted in bit-redistribution, where greater bits and smaller partitioning were allocated to perceptually significant regions. Using a HEVC optimised processor the timing increase was < +4% and < +6% for video streaming and recording applications respectively, 1/3 of an existing low complexity PVC solution. Future work should be directed towards perceptual quantisation which offers the potential for perceptual coding gain

    Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark

    Full text link
    HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for over-the-top (OTT) video streaming services, due to its ability to deliver a seamless streaming experience. A key component of HAS is the bitrate ladder, which provides the encoding parameters (e.g., bitrate-resolution pairs) to encode the source video. The representations in the bitrate ladder allow the client's player to dynamically adjust the quality of the video stream based on network conditions by selecting the most appropriate representation from the bitrate ladder. The most straightforward and lowest complexity approach involves using a fixed bitrate ladder for all videos, consisting of pre-determined bitrate-resolution pairs known as one-size-fits-all. Conversely, the most reliable technique relies on intensively encoding all resolutions over a wide range of bitrates to build the convex hull, thereby optimizing the bitrate ladder for each specific video. Several techniques have been proposed to predict content-based ladders without performing a costly exhaustive search encoding. This paper provides a comprehensive review of various methods, including both conventional and learning-based approaches. Furthermore, we conduct a benchmark study focusing exclusively on various learning-based approaches for predicting content-optimized bitrate ladders across multiple codec settings. The considered methods are evaluated on our proposed large-scale dataset, which includes 300 UHD video shots encoded with software and hardware encoders using three state-of-the-art encoders, including AVC/H.264, HEVC/H.265, and VVC/H.266, at various bitrate points. Our analysis provides baseline methods and insights, which will be valuable for future research in the field of bitrate ladder prediction. The source code of the proposed benchmark and the dataset will be made publicly available upon acceptance of the paper

    Saliency-Enabled Coding Unit Partitioning and Quantization Control for Versatile Video Coding

    Get PDF
    The latest video coding standard, versatile video coding (VVC), has greatly improved coding efficiency over its predecessor standard high efficiency video coding (HEVC), but at the expense of sharply increased complexity. In the context of perceptual video coding (PVC), the visual saliency model that utilizes the characteristics of the human visual system to improve coding efficiency has become a reliable method due to advances in computer performance and visual algorithms. In this paper, a novel VVC optimization scheme compliant PVC framework is proposed, which consists of fast coding unit (CU) partition algorithm and quantization control algorithm. Firstly, based on the visual saliency model, we proposed a fast CU division scheme, including the redetermination of the CU division depth by calculating Scharr operator and variance, as well as the executive decision for intra sub-partitions (ISP), to reduce the coding complexity. Secondly, a quantization control algorithm is proposed by adjusting the quantization parameter based on multi-level classification of saliency values at the CU level to reduce the bitrate. In comparison with the reference model, experimental results indicate that the proposed method can reduce about 47.19% computational complexity and achieve a bitrate saving of 3.68% on average. Meanwhile, the proposed algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality

    Fast and Efficient Foveated Video Compression Schemes for H.264/AVC Platform

    Get PDF
    Some fast and efficient foveated video compression schemes for H.264/AVC platform are presented in this dissertation. The exponential growth in networking technologies and widespread use of video content based multimedia information over internet for mass communication applications like social networking, e-commerce and education have promoted the development of video coding to a great extent. Recently, foveated imaging based image or video compression schemes are in high demand, as they not only match with the perception of human visual system (HVS), but also yield higher compression ratio. The important or salient regions are compressed with higher visual quality while the non-salient regions are compressed with higher compression ratio. From amongst the foveated video compression developments during the last few years, it is observed that saliency detection based foveated schemes are the keen areas of intense research. Keeping this in mind, we propose two multi-scale saliency detection schemes. (1) Multi-scale phase spectrum based saliency detection (FTPBSD); (2) Sign-DCT multi-scale pseudo-phase spectrum based saliency detection (SDCTPBSD). In FTPBSD scheme, a saliency map is determined using phase spectrum of a given image/video with unity magnitude spectrum. On the other hand, the proposed SDCTPBSD method uses sign information of discrete cosine transform (DCT) also known as sign-DCT (SDCT). It resembles the response of receptive field neurons of HVS. A bottom-up spatio-temporal saliency map is obtained by linear weighted sum of spatial saliency map and temporal saliency map. Based on these saliency detection techniques, foveated video compression (FVC) schemes (FVC-FTPBSD and FVC-SDCTPBSD) are developed to improve the compression performance further.Moreover, the 2D-discrete cosine transform (2D-DCT) is widely used in various video coding standards for block based transformation of spatial data. However, for directional featured blocks, 2D-DCT offers sub-optimal performance and may not able to efficiently represent video data with fewer coefficients that deteriorates compression ratio. Various directional transform schemes are proposed in literature for efficiently encoding such directional featured blocks. However, it is observed that these directional transform schemes suffer from many issues like ‘mean weighting defect’, use of a large number of DCTs and a number of scanning patterns. We propose a directional transform scheme based on direction-adaptive fixed length discrete cosine transform (DAFL-DCT) for intra-, and inter-frame to achieve higher coding efficiency in case of directional featured blocks.Furthermore, the proposed DAFL-DCT has the following two encoding modes. (1) Direction-adaptive fixed length ― high efficiency (DAFL-HE) mode for higher compression performance; (2) Direction-adaptive fixed length ― low complexity (DAFL-LC) mode for low complexity with a fair compression ratio. On the other hand, motion estimation (ME) exploits temporal correlation between video frames and yields significant improvement in compression ratio while sustaining high visual quality in video coding. Block-matching motion estimation (BMME) is the most popular approach due to its simplicity and efficiency. However, the real-world video sequences may contain slow, medium and/or fast motion activities. Further, a single search pattern does not prove efficient in finding best matched block for all motion types. In addition, it is observed that most of the BMME schemes are based on uni-modal error surface. Nevertheless, real-world video sequences may exhibit a large number of local minima available within a search window and thus possess multi-modal error surface (MES). Hence, the following two uni-modal error surface based and multi-modal error surface based motion estimation schemes are developed. (1) Direction-adaptive motion estimation (DAME) scheme; (2) Pattern-based modified particle swarm optimization motion estimation (PMPSO-ME) scheme. Subsequently, various fast and efficient foveated video compression schemes are developed with combination of these schemes to improve the video coding performance further while maintaining high visual quality to salient regions. All schemes are incorporated into the H.264/AVC video coding platform. Various experiments have been carried out on H.264/AVC joint model reference software (version JM 18.6). Computing various benchmark metrics, the proposed schemes are compared with other existing competitive schemes in terms of rate-distortion curves, Bjontegaard metrics (BD-PSNR, BD-SSIM and BD-bitrate), encoding time, number of search points and subjective evaluation to derive an overall conclusion

    Efficient Bitrate Ladder Construction for Content-Optimized Adaptive Video Streaming

    Get PDF
    One of the challenges faced by many video providers is the heterogeneity of network specifications, user requirements, and content compression performance. The universal solution of a fixed bitrate ladder is inadequate in ensuring a high quality of user experience without re-buffering or introducing annoying compression artifacts. However, a content-tailored solution, based on extensively encoding across all resolutions and over a wide quality range is highly expensive in terms of computational, financial, and energy costs. Inspired by this, we propose an approach that exploits machine learning to predict a content-optimized bitrate ladder. The method extracts spatio-temporal features from the uncompressed content, trains machine-learning models to predict the Pareto front parameters, and, based on that, builds the ladder within a defined bitrate range. The method has the benefit of significantly reducing the number of encodes required per sequence. The presented results, based on 100 HEVC-encoded sequences, demonstrate a reduction in the number of encodes required when compared to an exhaustive search and an interpolation-based method, by 89.06% and 61.46%, respectively, at the cost of an average Bj{\o}ntegaard Delta Rate difference of 1.78% compared to the exhaustive approach. Finally, a hybrid method is introduced that selects either the proposed or the interpolation-based method depending on the sequence features. This results in an overall 83.83% reduction of required encodings at the cost of an average Bj{\o}ntegaard Delta Rate difference of 1.26%
    corecore