The Versatile Video Coding (VVC) standard has been recently finalized by the
Joint Video Exploration Team (JVET). Compared to the High Efficiency Video
Coding (HEVC) standard, VVC offers about 50% compression efficiency gain, in
terms of Bjontegaard Delta-Rate (BD-rate), at the cost of a 10-fold increase in
encoding complexity. In this paper, we propose a method based on Convolutional
Neural Network (CNN) to speed up the inter partitioning process in VVC.
Firstly, a novel representation for the quadtree with nested multi-type tree
(QTMT) partition is introduced, derived from the partition path. Secondly, we
develop a U-Net-based CNN taking a multi-scale motion vector field as input at
the Coding Tree Unit (CTU) level. The purpose of CNN inference is to predict
the optimal partition path during the Rate-Distortion Optimization (RDO)
process. To achieve this, we divide CTU into grids and predict the Quaternary
Tree (QT) depth and Multi-type Tree (MT) split decisions for each cell of the
grid. Thirdly, an efficient partition pruning algorithm is introduced to employ
the CNN predictions at each partitioning level to skip RDO evaluations of
unnecessary partition paths. Finally, an adaptive threshold selection scheme is
designed, making the trade-off between complexity and efficiency scalable.
Experiments show that the proposed method can achieve acceleration ranging from
16.5% to 60.2% under the RandomAccess Group Of Picture 32 (RAGOP32)
configuration with a reasonable efficiency drop ranging from 0.44% to 4.59% in
terms of BD-rate, which surpasses other state-of-the-art solutions.
Additionally, our method stands out as one of the lightest approaches in the
field, which ensures its applicability to other encoders