Overview of MV-HEVC prediction structures for light field video
Light field video is a promising technology for delivering the six degrees of freedom required for natural content in virtual reality. Existing multi-view coding (MVC) and multi-view plus depth (MVD) formats, such as MV-HEVC and 3D-HEVC, are the most conventional light field video coding solutions, since they can compress video sequences captured simultaneously from multiple camera angles. 3D-HEVC treats a single view as a video sequence and the other sub-aperture views as gray-scale disparity (depth) maps. MV-HEVC, on the other hand, treats each view as a separate video sequence, which allows the use of motion-compensated algorithms similar to HEVC. While MV-HEVC and 3D-HEVC provide similar results, MV-HEVC does not require disparity maps to be readily available, and it has a more straightforward implementation, since it only uses syntax elements rather than additional prediction tools for inter-view prediction. However, there are many degrees of freedom in choosing an appropriate prediction structure, and it is still unknown which one is optimal for a given set of application requirements. In this work, various prediction structures for MV-HEVC are implemented and tested. The findings reveal the trade-off between compression gains, distortion, and random access capabilities in MV-HEVC light field video coding. The results give an overview of the best-performing solutions developed in the context of this work and of prediction structure algorithms proposed in the state-of-the-art literature. This overview provides a useful benchmark for the future development of light field video coding solutions.
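The abstract's distinction between per-view temporal prediction and inter-view prediction can be illustrated with a small sketch. The structure below is not one of the paper's evaluated configurations; the IPPP temporal chain, GOP size, and center-outward inter-view referencing are illustrative assumptions only.

```python
# Illustrative sketch (assumed structure, not the paper's): enumerate which
# frames a given (view, poc) frame may predict from in a simple
# MV-HEVC-style setup, where each view is a separate video sequence and
# side views additionally reference the co-located frame of their inner
# neighbour view via the reference-list syntax.

def reference_frames(view, poc, num_views, gop=4):
    """Return a list of (view, poc) reference pairs.

    Temporal part: a trivial IPPP chain; pictures at POC multiples of
    `gop` are intra/anchor positions with no temporal reference.
    Inter-view part: each non-center view references the neighbouring
    view one step closer to the center view, at the same POC.
    """
    refs = []
    if poc % gop != 0:                   # not an anchor position
        refs.append((view, poc - 1))     # temporal reference
    center = num_views // 2
    if view != center:
        step = 1 if view < center else -1
        refs.append((view + step, poc))  # inter-view reference
    return refs
```

For example, with three views the center view behaves like plain HEVC, while the side views add one inter-view reference per frame; varying choices like these is exactly the design space the paper explores.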
CNN-based Prediction of Partition Path for VVC Fast Inter Partitioning Using Motion Fields
The Versatile Video Coding (VVC) standard has been recently finalized by the
Joint Video Exploration Team (JVET). Compared to the High Efficiency Video
Coding (HEVC) standard, VVC offers about 50% compression efficiency gain, in
terms of Bjontegaard Delta-Rate (BD-rate), at the cost of a 10-fold increase in
encoding complexity. In this paper, we propose a method based on Convolutional
Neural Network (CNN) to speed up the inter partitioning process in VVC.
Firstly, a novel representation for the quadtree with nested multi-type tree
(QTMT) partition is introduced, derived from the partition path. Secondly, we
develop a U-Net-based CNN taking a multi-scale motion vector field as input at
the Coding Tree Unit (CTU) level. The purpose of CNN inference is to predict
the optimal partition path during the Rate-Distortion Optimization (RDO)
process. To achieve this, we divide the CTU into a grid and predict the
Quaternary Tree (QT) depth and Multi-type Tree (MT) split decisions for each
cell of the grid. Thirdly, an efficient partition pruning algorithm is introduced to employ
the CNN predictions at each partitioning level to skip RDO evaluations of
unnecessary partition paths. Finally, an adaptive threshold selection scheme is
designed, making the trade-off between complexity and efficiency scalable.
Experiments show that the proposed method can achieve acceleration ranging from
16.5% to 60.2% under the Random Access Group of Pictures 32 (RAGOP32)
configuration with a reasonable efficiency drop ranging from 0.44% to 4.59% in
terms of BD-rate, which surpasses other state-of-the-art solutions.
Additionally, our method stands out as one of the lightest approaches in the
field, which ensures its applicability to other encoders.
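The grid-based prediction target described above can be sketched as follows. This is not the paper's exact representation (which encodes full QTMT partition paths including MT splits); it only shows the simpler QT part, mapping a set of quad-split decisions to a per-cell depth map of the kind a CNN could regress. The CTU size, cell size, and recursion are assumptions for illustration.

```python
# Illustrative sketch: turn QT split decisions into a per-cell depth map
# over a CTU grid. `splits` holds (x, y, size) blocks that are quad-split
# further; every leaf block writes its depth into the cells it covers.
import numpy as np

def qt_depth_map(splits, ctu=64, cell=8):
    grid = np.zeros((ctu // cell, ctu // cell), dtype=int)

    def recurse(x, y, size, depth):
        if (x, y, size) in splits and size > cell:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    recurse(x + dx, y + dy, half, depth + 1)
        else:  # leaf block: fill its cells with the reached depth
            gx, gy = x // cell, y // cell
            n = size // cell
            grid[gy:gy + n, gx:gx + n] = depth

    recurse(0, 0, ctu, 0)
    return grid
```

Splitting the whole 64x64 CTU once, and its top-left 32x32 quadrant once more, yields depth 2 in the top-left quarter of the grid and depth 1 elsewhere; a dense map like this is a convenient supervision target for a U-Net operating at CTU level.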
NN-VVC: Versatile Video Coding boosted by self-supervisedly learned image coding for machines
The recent progress in artificial intelligence has led to an ever-increasing
usage of images and videos by machine analysis algorithms, mainly neural
networks. Nonetheless, compression, storage and transmission of media have
traditionally been designed considering human beings as the viewers of the
content. Recent research on image and video coding for machine analysis has
progressed mainly in two almost orthogonal directions. The first is represented
by end-to-end (E2E) learned codecs which, while offering high performance on
image coding, are not yet on par with state-of-the-art conventional video
codecs and lack interoperability. The second direction considers using the
Versatile Video Coding (VVC) standard or any other conventional video codec
(CVC) together with pre- and post-processing operations targeting machine
analysis. While the CVC-based methods benefit from interoperability and broad
hardware and software support, the machine task performance is often lower than
the desired level, particularly at low bitrates. This paper proposes a hybrid
codec for machines called NN-VVC, which combines the advantages of an
E2E-learned image codec and a CVC to achieve high performance in both image and
video coding for machines. Our experiments show that the proposed system
achieved up to -43.20% and -26.8% Bjøntegaard Delta rate reduction over VVC
for image and video data, respectively, when evaluated on multiple different
datasets and machine vision tasks. To the best of our knowledge, this is the
first research paper showing a hybrid video codec that outperforms VVC on
multiple datasets and multiple machine vision tasks.
Comment: ISM 2023 Best paper award winner version
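Both of the results above are reported as Bjøntegaard Delta rate, which summarizes the average bitrate difference between two rate-distortion curves. A common way to compute it (a sketch of the widely used polynomial-fit variant; interpolation details differ between implementations, and this is not the evaluation code of either paper) is:

```python
# Sketch of the classic Bjontegaard Delta rate computation: fit cubic
# polynomials of log-rate vs. PSNR for anchor and test codecs, then
# compare their average log-rate over the overlapping PSNR interval.
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average rate difference of test vs. anchor, in percent
    (negative means the test codec needs less bitrate)."""
    fit_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    fit_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))   # overlap interval
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(fit_a), np.polyint(fit_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1) * 100
```

As a sanity check, a test codec that reaches the same PSNR points at exactly half the anchor's bitrate yields a BD-rate of -50%.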