Complexity Analysis Of Next-Generation VVC Encoding and Decoding
While the next generation video compression standard, Versatile Video Coding
(VVC), provides a superior compression efficiency, its computational complexity
dramatically increases. This paper thoroughly analyzes this complexity for both
encoder and decoder of VVC Test Model 6, by quantifying the complexity
break-down for each coding tool and measuring the complexity and memory
requirements for VVC encoding/decoding. These extensive analyses are performed
for six video sequences of 720p, 1080p, and 2160p, under Low-Delay (LD),
Random-Access (RA), and All-Intra (AI) conditions (a total of 320
encodings/decodings). Results indicate that the VVC encoder and decoder are 5x
and 1.5x more complex than HEVC in LD, and 31x and 1.8x in AI,
respectively. Detailed analysis of coding tools reveals that in LD on average,
motion estimation tools with 53%, transformation and quantization with 22%, and
entropy coding with 7% dominate the encoding complexity. In decoding, loop
filters with 30%, motion compensation with 20%, and entropy decoding with 16%
are the most complex modules. Moreover, the memory bandwidth required for VVC
encoding/decoding is measured through memory profiling and amounts to 30x and
3x that of HEVC, respectively. The reported results and insights serve as a
guide for future research and implementations of energy-efficient VVC
encoders/decoders.
Comment: IEEE ICIP 202
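The per-tool complexity break-downs reported above amount to normalizing the
profiled time of each module against the total. A minimal sketch of that
aggregation, assuming profiler output as (module, seconds) samples; the module
names follow the abstract's categories, but the timing values are invented for
illustration:

```python
from collections import Counter

def complexity_breakdown(samples):
    """Aggregate per-module time samples (module, seconds) into the
    percentage shares used in a complexity break-down table."""
    totals = Counter()
    for module, seconds in samples:
        totals[module] += seconds
    grand = sum(totals.values())
    return {m: round(100.0 * t / grand, 1) for m, t in totals.items()}

# Hypothetical profiler samples for one LD encode (timings are made up).
samples = [("motion estimation", 5.3), ("transform/quant", 2.2),
           ("entropy coding", 0.7), ("other", 1.8)]
print(complexity_breakdown(samples))
# → {'motion estimation': 53.0, 'transform/quant': 22.0,
#    'entropy coding': 7.0, 'other': 18.0}
```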
Machine Learning based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders
The next-generation Versatile Video Coding (VVC) standard introduces a new
Multi-Type Tree (MTT) block partitioning structure that supports Binary-Tree
(BT) and Ternary-Tree (TT) splits in both vertical and horizontal directions.
This new approach leads to five possible splits at each block depth and thereby
improves the coding efficiency of VVC over that of the preceding High
Efficiency Video Coding (HEVC) standard, which only supports Quad-Tree (QT)
partitioning with a single split per block depth. However, MTT has also
brought a considerable increase in encoder computational complexity. In this
paper, a
two-stage learning-based technique is proposed to tackle the complexity
overhead of MTT in VVC intra encoders. In our scheme, the input block is first
processed by a Convolutional Neural Network (CNN) to predict its spatial
features through a vector of probabilities describing the partition at each 4x4
edge. Subsequently, a Decision Tree (DT) model leverages this vector of spatial
features to predict the most likely splits at each block. Finally, based on
this prediction, only the N most likely splits are processed by the
Rate-Distortion (RD) process of the encoder. In order to train our CNN and DT
models on a wide range of image contents, we also propose a public VVC frame
partitioning dataset based on an existing image dataset encoded with the VVC
reference software encoder. Our proposal relying on the top-3 configuration
reaches 46.6% complexity reduction for a negligible bitrate increase of 0.86%.
A top-2 configuration enables a higher complexity reduction of 69.8% for 2.57%
bitrate loss. These results demonstrate a better trade-off between VTM intra
coding efficiency and complexity reduction compared to state-of-the-art
solutions.
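The top-N mechanism described above can be sketched in a few lines: the DT
emits one probability per split mode, and only the N best-scoring modes reach
the RD search. This is a hedged sketch with hypothetical probabilities; the
CNN feature extraction and the actual trained models are not reproduced:

```python
# Split options available at each block in the VVC QT-MTT structure,
# here including the no-split option alongside the five split types.
SPLITS = ["no_split", "qt", "bt_h", "bt_v", "tt_h", "tt_v"]

def top_n_splits(probs, n=3):
    """Keep only the n most likely split modes for full RD evaluation,
    in the spirit of the paper's top-N configurations."""
    ranked = sorted(zip(SPLITS, probs), key=lambda p: p[1], reverse=True)
    return [mode for mode, _ in ranked[:n]]

# Hypothetical DT output for one block: one probability per split mode.
probs = [0.05, 0.40, 0.25, 0.15, 0.10, 0.05]
print(top_n_splits(probs, n=3))  # → ['qt', 'bt_h', 'bt_v']
```

Shrinking n prunes more of the RD search (faster encoding, larger bitrate
loss), which matches the reported top-3 versus top-2 trade-off.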
Encoder-Decoder-Based Intra-Frame Block Partitioning Decision
The recursive intra-frame block partitioning decision process, a crucial
component of the next-generation video coding standards, exerts significant
influence over the encoding time. In this paper, we propose an encoder-decoder
neural network (NN) to accelerate this process. Specifically, a CNN is utilized
to compress the pixel data of the largest coding unit (LCU) into a fixed-length
vector. Subsequently, a Transformer decoder is employed to transcribe the
fixed-length vector into a variable-length vector, which represents the block
partitioning outcomes of the encoding LCU. The vector transcription process
adheres to the constraints imposed by the block partitioning algorithm. By
fully parallelizing the NN prediction in the intra-mode decision, substantial
time savings can be attained during the decision phase. The experimental
results obtained from high-definition (HD) sequence coding demonstrate that
this framework achieves a remarkable 87.84% reduction in encoding time, with a
relatively small loss (8.09%) of coding performance compared to AVS3 HPM4.0.
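The constrained transcription step can be illustrated with a toy stand-in for
the Transformer decoder: a greedy loop that emits one split/leaf token per
block in pre-order, masking out "split" once the minimum block size is
reached. This is a simplified quad-split-only sketch under assumed block
sizes, not the paper's actual partitioning grammar or trained model:

```python
def transcribe(step_probs, min_size=8, lcu=64):
    """Greedily transcribe a variable-length split string for one LCU.
    Each step consumes one predicted split probability for the current
    block; 'split' is masked out at the minimum block size, mirroring the
    constraints the block-partitioning algorithm imposes on the decoder."""
    out, stack = [], [lcu]
    it = iter(step_probs)
    while stack:
        size = stack.pop()
        p_split = next(it, 0.0)
        split = p_split > 0.5 and size > min_size  # constraint mask
        out.append("S" if split else "L")
        if split:
            stack.extend([size // 2] * 4)  # quad split into four sub-blocks
    return "".join(out)

# Hypothetical per-step probabilities for a 16x16 block with 8x8 minimum:
# the root splits, and all four 8x8 children are forced to be leaves.
print(transcribe([0.9, 0.2, 0.8, 0.1, 0.3], min_size=8, lcu=16))  # → SLLLL
```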
CNN-based Prediction of Partition Path for VVC Fast Inter Partitioning Using Motion Fields
The Versatile Video Coding (VVC) standard has been recently finalized by the
Joint Video Exploration Team (JVET). Compared to the High Efficiency Video
Coding (HEVC) standard, VVC offers about 50% compression efficiency gain, in
terms of Bjontegaard Delta-Rate (BD-rate), at the cost of a 10-fold increase in
encoding complexity. In this paper, we propose a method based on Convolutional
Neural Network (CNN) to speed up the inter partitioning process in VVC.
Firstly, a novel representation for the quadtree with nested multi-type tree
(QTMT) partition is introduced, derived from the partition path. Secondly, we
develop a U-Net-based CNN taking a multi-scale motion vector field as input at
the Coding Tree Unit (CTU) level. The purpose of CNN inference is to predict
the optimal partition path during the Rate-Distortion Optimization (RDO)
process. To achieve this, we divide the CTU into a grid and predict the
Quaternary
Tree (QT) depth and Multi-type Tree (MT) split decisions for each cell of the
grid. Thirdly, an efficient partition pruning algorithm is introduced to employ
the CNN predictions at each partitioning level to skip RDO evaluations of
unnecessary partition paths. Finally, an adaptive threshold selection scheme is
designed, making the trade-off between complexity and efficiency scalable.
Experiments show that the proposed method can achieve acceleration ranging from
16.5% to 60.2% under the Random Access Group of Pictures 32 (RAGOP32)
configuration with a reasonable efficiency drop ranging from 0.44% to 4.59% in
terms of BD-rate, which surpasses other state-of-the-art solutions.
Additionally, our method stands out as one of the lightest approaches in the
field, which ensures its applicability to other encoders.
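The pruning idea described above — skip RDO evaluation of partition paths
whose CNN score falls below a tunable threshold — can be sketched as follows.
Path names and scores are hypothetical; the adaptive per-level threshold
selection of the paper is reduced here to a single parameter:

```python
def prune_partitions(candidates, threshold):
    """Keep only partition paths whose CNN score reaches the threshold.
    Raising the threshold prunes more RDO work (faster, more BD-rate loss);
    lowering it prunes less, making the trade-off scalable."""
    kept = [path for path, score in candidates if score >= threshold]
    # Always keep at least the single best path so encoding can proceed.
    if not kept:
        kept = [max(candidates, key=lambda c: c[1])[0]]
    return kept

# Hypothetical CNN scores for candidate paths at one partitioning level.
cands = [("qt", 0.55), ("bt_h", 0.30), ("no_split", 0.10), ("tt_v", 0.05)]
print(prune_partitions(cands, threshold=0.25))  # → ['qt', 'bt_h']
```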
Design Space Exploration of Practical VVC Encoding for Emerging Media Applications
Versatile Video Coding (VVC/H.266) is the latest video coding standard designed
for a broad range of next-generation media applications. This paper explores
the design space of practical VVC encoding by profiling the Fraunhofer
Versatile Video Encoder (VVenC). All experiments were conducted over five 2160p
video sequences and their downsampled versions under the Random-Access (RA)
condition. The exploration was performed by analyzing the
rate-distortion-complexity (RDC) of the VVC block structure and coding tools.
First, VVenC was profiled to provide a breakdown of its coding block
distribution and coding tool utilization. Then, the usefulness of each VVC
coding tool was analyzed for its individual impact on overall RDC performance.
Finally, our findings were distilled into practical implementation guidelines:
the highest coding gains come with the Multi-Type Tree (MTT) structure,
Adaptive Loop Filter (ALF), Cross-Component Linear Model (CCLM), and
Bi-Directional Optical Flow (BDOF) coding tools, whereas Multiple Transform
Selection (MTS) and affine motion estimation are the primary candidates for
complexity reduction. To the best of our knowledge, this is the first work to
provide a comprehensive RDC analysis for practical VVC encoding. It can serve
as a basis for practical VVC encoder implementation or optimization on various
computing platforms.
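Since the coding-efficiency side of these RDC analyses is reported as BD-rate,
a self-contained sketch of the standard Bjontegaard computation may be useful:
fit log-rate as a cubic polynomial in PSNR for each codec, then average the
gap between the two fits over the overlapping quality range. The RD points
below are synthetic, chosen so the test codec spends exactly 20% fewer bits at
equal PSNR:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard Delta-Rate: average bitrate difference (%) between two
    rate-distortion curves over their overlapping PSNR range."""
    p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate each cubic fit over the common interval and average.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_diff - 1) * 100  # negative = bitrate savings

# Synthetic RD points: test codec uses 20% fewer bits at every PSNR.
psnr = [30.0, 33.0, 36.0, 39.0, 42.0]
anchor = [800.0, 1600.0, 3200.0, 6400.0, 12800.0]
test = [r * 0.8 for r in anchor]
print(round(bd_rate(anchor, psnr, test, psnr), 2))  # → -20.0
```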
MPAI-EEV: Standardization Efforts of Artificial Intelligence based End-to-End Video Coding
The rapid advancement of artificial intelligence (AI) technology has led to
the prioritization of standardizing the processing, coding, and transmission of
video using neural networks. To address this priority area, the Moving Picture,
Audio, and Data Coding by Artificial Intelligence (MPAI) group is developing a
suite of standards called MPAI-EEV for "end-to-end optimized neural video
coding." The aim of this AI-based video standard project is to compress the
number of bits required to represent high-fidelity video data by utilizing
data-trained neural coding technologies. This approach is not constrained by
how data coding has traditionally been applied in the context of a hybrid
framework. This paper presents an overview of recent and ongoing
standardization efforts in this area and highlights the key technologies and
design philosophy of EEV. It also provides a comparison and report on some
primary efforts such as the coding efficiency of the reference model.
Additionally, it discusses emerging activities such as learned
Unmanned-Aerial-Vehicle (UAV) video coding, which are currently planned, under
development, or in the exploration phase. With a focus on UAV video signals,
this paper addresses the current status of these preliminary efforts. It also
indicates development timelines, summarizes the main technical details, and
provides pointers to further points of reference. The exploration experiment
shows that the EEV model performs better than the state-of-the-art video
coding standard H.266/VVC in terms of perceptual evaluation metrics.