Exploring Long- and Short-Range Temporal Information for Learned Video Compression
Learned video compression methods have attracted considerable interest in the
video coding community since they have matched or even exceeded the
rate-distortion (RD) performance of traditional video codecs. However, many
current learning-based methods are dedicated to utilizing short-range temporal
information, thus limiting their performance. In this paper, we focus on
exploiting the unique characteristics of video content and further exploring
temporal information to enhance compression performance. Specifically, for
long-range temporal information exploitation, we propose a temporal prior that
is updated continuously within the group of pictures (GOP) during inference, so
that it carries temporal information from all previously decoded images in the
current GOP. As for short-range temporal information, we
propose a progressive guided motion compensation scheme to achieve robust and
effective compensation. In detail, we design a hierarchical structure to
achieve multi-scale compensation. More importantly, we use optical flow
guidance to generate pixel offsets between feature maps at each scale, and the
compensation results at each scale will be used to guide the following scale's
compensation. Extensive experimental results demonstrate that our method can
obtain better RD performance than state-of-the-art video compression
approaches. The code is publicly available on:
https://github.com/Huairui/LSTVC.
Comment: arXiv admin note: text overlap with arXiv:2207.0458
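The abstract above describes a temporal prior that accumulates information from every decoded image in the current GOP. The control flow might be sketched as follows. This is only an illustration under stated assumptions: the function names `update_prior` and `priors_for_sequence`, the `momentum` parameter, and the exponential blend are all hypothetical stand-ins, since the paper's prior is produced by a learned network whose details the abstract does not give.

```python
from typing import List, Optional

Features = List[float]  # toy stand-in for a decoded frame's feature map


def update_prior(prior: Optional[Features], decoded: Features,
                 momentum: float = 0.5) -> Features:
    """Fold the newly decoded frame into the running prior.

    Assumption: a simple exponential blend replaces the learned update.
    """
    if prior is None:  # first frame of a GOP: nothing accumulated yet
        return list(decoded)
    return [momentum * p + (1.0 - momentum) * d
            for p, d in zip(prior, decoded)]


def priors_for_sequence(frames: List[Features], gop_size: int,
                        momentum: float = 0.5) -> List[Optional[Features]]:
    """Return the prior available when each frame is coded.

    The prior is reset at every GOP boundary, so it only ever summarizes
    frames decoded earlier in the same GOP.
    """
    prior: Optional[Features] = None
    available: List[Optional[Features]] = []
    for index, frame in enumerate(frames):
        if index % gop_size == 0:
            prior = None  # new GOP: discard long-range state
        available.append(prior)
        prior = update_prior(prior, frame, momentum)
    return available
```

With a longer GOP, later frames see a prior that mixes all earlier frames in that GOP rather than only the immediately preceding one, which is the long-range behaviour the abstract refers to.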
Learned Video Compression via Heterogeneous Deformable Compensation Network
Learned video compression has recently emerged as an essential research topic
in developing advanced video compression technologies, where motion
compensation is considered one of the most challenging issues. In this paper,
we propose a learned video compression framework based on a heterogeneous deformable
compensation strategy (HDCVC) to tackle the unstable compression
performance caused by single-size deformable kernels in the downsampled feature
domain. More specifically, instead of utilizing optical flow warping or
single-size-kernel deformable alignment, the proposed algorithm extracts
features from the two adjacent frames to estimate content-adaptive
heterogeneous deformable (HetDeform) kernel offsets. Then we transform the
reference features with the HetDeform convolution to accomplish motion
compensation. Moreover, we design a Spatial-Neighborhood-Conditioned Divisive
Normalization (SNCDN) that, combined with Generalized Divisive Normalization,
achieves more effective data Gaussianization. Furthermore, we propose a
multi-frame enhanced reconstruction module for exploiting context and temporal
information for final quality enhancement. Experimental results indicate that
HDCVC outperforms recent state-of-the-art learned
video compression approaches.
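For readers unfamiliar with the Generalized Divisive Normalization (GDN) mentioned above, a minimal sketch of its commonly used form, y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2), is given below. The function name `gdn` and the plain-list representation are assumptions made for illustration. In a codec, `beta` and `gamma` are learned parameters and the operation is applied across channels at each spatial position. The SNCDN variant proposed in the paper is not sketched, since the abstract does not specify it.

```python
import math
from typing import List


def gdn(x: List[float], beta: List[float],
        gamma: List[List[float]]) -> List[float]:
    """Apply y_i = x_i / sqrt(beta_i + sum_j gamma[i][j] * x_j**2).

    x     : channel responses at one spatial position
    beta  : per-channel offsets (must keep the denominator positive)
    gamma : channel-coupling weights, gamma[i][j] >= 0
    """
    squares = [value * value for value in x]
    output = []
    for i, value in enumerate(x):
        pooled = beta[i] + sum(g * s for g, s in zip(gamma[i], squares))
        output.append(value / math.sqrt(pooled))
    return output
```

With `gamma` set to zero and `beta` set to one, the transform reduces to the identity, which is a convenient sanity check.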