Machine Learning based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders
The next-generation Versatile Video Coding (VVC) standard introduces a new
Multi-Type Tree (MTT) block partitioning structure that supports Binary-Tree
(BT) and Ternary-Tree (TT) splits in both vertical and horizontal directions.
This new approach leads to five possible splits at each block depth and thereby
improves the coding efficiency of VVC over that of the preceding High
Efficiency Video Coding (HEVC) standard, which only supports Quad-Tree (QT)
partitioning with a single split per block depth. However, MTT has also
considerably increased encoder computational complexity. In this paper, a
two-stage learning-based technique is proposed to tackle the complexity
overhead of MTT in VVC intra encoders. In our scheme, the input block is first
processed by a Convolutional Neural Network (CNN) to predict its spatial
features through a vector of probabilities describing the partition at each 4x4
edge. Subsequently, a Decision Tree (DT) model leverages this vector of spatial
features to predict the most likely splits at each block. Finally, based on
this prediction, only the N most likely splits are processed by the
Rate-Distortion (RD) process of the encoder. In order to train our CNN and DT
models on a wide range of image contents, we also propose a public VVC frame
partitioning dataset based on an existing image dataset encoded with the VVC
reference software encoder. Our proposal relying on the top-3 configuration
reaches 46.6% complexity reduction for a negligible bitrate increase of 0.86%.
A top-2 configuration enables a higher complexity reduction of 69.8% for 2.57%
bitrate loss. These results highlight a better trade-off between VTM intra
coding efficiency and complexity reduction compared to state-of-the-art
solutions.
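The final step described in this abstract, keeping only the N most likely splits for the Rate-Distortion search, can be illustrated with a minimal sketch. The split names, probabilities, and RD costs below are hypothetical values chosen for illustration; they are not taken from the paper, and the CNN and DT stages are assumed to have already produced the per-split probabilities.

```python
from typing import Callable, Dict, List

# Hypothetical labels for the five splits named in the abstract
# (QT, plus BT and TT in horizontal and vertical directions).
SPLITS = ("QT", "BT_H", "BT_V", "TT_H", "TT_V")


def top_n_splits(split_probs: Dict[str, float], n: int) -> List[str]:
    """Return the n split names with the highest predicted probability."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(split_probs, key=lambda s: split_probs[s], reverse=True)
    return ranked[:n]


def rd_search(split_probs: Dict[str, float], n: int,
              rd_cost: Callable[[str], float]) -> str:
    """Evaluate the (assumed) RD cost only for the top-n candidate splits."""
    candidates = top_n_splits(split_probs, n)
    return min(candidates, key=rd_cost)


# Illustrative inputs: predicted probabilities and made-up RD costs.
probs = {"QT": 0.40, "BT_H": 0.25, "BT_V": 0.20, "TT_H": 0.10, "TT_V": 0.05}
costs = {"QT": 120.0, "BT_H": 95.0, "BT_V": 110.0, "TT_H": 90.0, "TT_V": 130.0}

best_top3 = rd_search(probs, 3, costs.__getitem__)  # skips TT_H and TT_V
best_full = rd_search(probs, 5, costs.__getitem__)  # exhaustive over 5 splits
```

In this toy case the top-3 search tests three splits instead of five but misses the cheapest one (TT_H), which is the kind of complexity versus bitrate trade-off the abstract quantifies.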
Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation
The ever-growing multimedia traffic has underscored the importance of
effective multimedia codecs. Among them, the latest lossy video coding
standard, Versatile Video Coding (VVC), has been attracting the attention of
the video coding community. However, the gain of VVC is achieved at the cost of
significant encoding complexity, which creates the need for a fast encoder
with comparable Rate Distortion (RD) performance. In this paper, we propose to
optimize the VVC complexity at intra-frame prediction, with a two-stage
framework of deep feature fusion and probability estimation. At the first
stage, we employ a deep convolutional network to extract the spatial-temporal
neighboring coding features. Then we fuse all reference features obtained by
different convolutional kernels to determine an optimal intra coding depth. At
the second stage, we employ a probability-based model and the spatial-temporal
coherence to select the candidate partition modes within the optimal coding
depth. Finally, these selected depths and partitions are executed whilst
unnecessary computations are excluded. Experimental results on a standard
database demonstrate the superiority of the proposed method, especially for High
Definition (HD) and Ultra-HD (UHD) video sequences.
Comment: 10 pages, 10 figures
Compression vidéo basée sur l'exploitation d'un décodeur intelligent (Video compression based on the exploitation of a smart decoder)
This Ph.D. thesis studies the novel concept of the Smart Decoder (SDec), in which the decoder is given the ability to simulate the encoder and can conduct the R-D competition in the same way as the encoder. The proposed technique aims to reduce the signaling of competing coding modes and parameters. The general SDec coding scheme and several practical applications are proposed, followed by a long-term approach exploiting machine learning concepts in video coding. The SDec coding scheme exploits a complex decoder able to reproduce the choice of the encoder based on causal references, thus eliminating the need to signal coding modes and associated parameters. Several practical applications of the general outline of the SDec scheme are tested, using different coding modes during the competition on the reference blocks. Although the choice of the SDec reference block is still simple and limited, interesting gains are observed. The long-term research presents an innovative method that makes further use of the processing capacity of the decoder. Machine learning techniques are exploited in video coding with the purpose of reducing the signaling overhead. Practical applications are given, using a classifier based on support vector machines to predict the coding modes of a block. The block classification uses causal descriptors which consist of different types of histograms. Significant bit rate savings are obtained, which confirms the potential of the approach.
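The core SDec idea, that encoder and decoder both run the same mode competition on an already-decoded (causal) reference block so the winning mode need not be signaled, can be sketched as follows. This is a toy illustration under stated assumptions: the three predictors, the SAD cost standing in for a full R-D cost, and all names are hypothetical and not taken from the thesis.

```python
from typing import Callable, Dict, List

Block = List[List[int]]


def predict_dc(top: List[int], left: List[int]) -> Block:
    """Fill the block with the integer mean of the neighbouring samples."""
    mean = (sum(top) + sum(left)) // (len(top) + len(left))
    return [[mean] * len(top) for _ in left]


def predict_vertical(top: List[int], left: List[int]) -> Block:
    """Copy the row above the block into every row."""
    return [list(top) for _ in left]


def predict_horizontal(top: List[int], left: List[int]) -> Block:
    """Copy the column left of the block into every column."""
    return [[v] * len(top) for v in left]


# Hypothetical set of competing modes; order is fixed so ties break identically.
MODES: Dict[str, Callable[[List[int], List[int]], Block]] = {
    "DC": predict_dc,
    "VER": predict_vertical,
    "HOR": predict_horizontal,
}


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences, used here as a stand-in for an R-D cost."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def derive_mode(ref_block: Block, ref_top: List[int], ref_left: List[int]) -> str:
    """Run the competition on a causal reference block and return the winner."""
    costs = {name: sad(pred(ref_top, ref_left), ref_block)
             for name, pred in MODES.items()}
    return min(costs, key=costs.get)


# Both sides see the same decoded reference data, so they derive the same mode.
ref_top = [10, 20, 30, 40]
ref_left = [12, 12, 12, 12]
ref_block = [[10, 20, 30, 40] for _ in range(4)]

encoder_mode = derive_mode(ref_block, ref_top, ref_left)
decoder_mode = derive_mode(ref_block, ref_top, ref_left)
```

Because the function is deterministic and depends only on causal data, the encoder can apply the derived mode to the current block without transmitting it, at the price of extra computation in the decoder.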
On the use of deep learning and parallelism techniques to significantly reduce the HEVC intra-coding time
It is well-known that each new video coding standard significantly increases in computational complexity with respect to previous standards, and this is particularly true
for the HEVC and VVC video coding standards. The development of techniques for
reducing the required complexity without affecting the rate/distortion (R/D) performance is therefore always a topic of intense research interest. In this paper, we
propose a combination of two powerful techniques, deep learning and parallel computing, to significantly reduce the complexity of the HEVC encoding engine. Our
experimental results show that a combination of deep learning to reduce the CTU
partitioning complexity with parallel strategies based on frame partitioning is able
to achieve speedups of up to 26× when 16 threads are used. The R/D penalty in
terms of the BD-BR metric depends on the video content, the compression rate and
the number of OpenMP threads, and was consistently between 0.35 and 10% for the
video sequence test set used in our experiment.
Quality of Experience (QoE)-Aware Fast Coding Unit Size Selection for HEVC Intra-prediction
The exorbitant increase in the computational complexity of modern video coding standards, such as High Efficiency Video Coding (HEVC), is a compelling challenge for resource-constrained consumer electronic devices. For instance, the brute force evaluation of all possible combinations of available coding modes and quadtree-based coding structure in HEVC to determine the optimum set of coding parameters for a given content demands a substantial amount of computational and energy resources. Thus, the resource requirements for real-time operation of HEVC have become a contributing factor towards the Quality of Experience (QoE) of the end users of emerging multimedia and future internet applications. In this context, this paper proposes a content-adaptive Coding Unit (CU) size selection algorithm for HEVC intra-prediction. The proposed algorithm builds content-specific weighted Support Vector Machine (SVM) models in real time during the encoding process, to provide an early estimate of CU size for a given content, avoiding the brute force evaluation of all possible coding mode combinations in HEVC. The experimental results demonstrate an average encoding time reduction of 52.38%, with an average Bjøntegaard Delta Bit Rate (BDBR) increase of 1.19% compared to the HM16.1 reference encoder. Furthermore, the perceptual visual quality assessments conducted through Video Quality Metric (VQM) show minimal visual quality impact on the reconstructed videos of the proposed algorithm compared to state-of-the-art approaches.