Learned B-frame video compression aims to adopt bi-directional motion
estimation and motion compensation (MEMC) coding for middle frame
reconstruction. However, previous learned approaches often extend neural
P-frame codecs directly to B-frame coding, relying on bi-directional
optical-flow estimation or video frame interpolation. These approaches suffer
from inaccurate quantized motion and inefficient motion compensation. To address these issues, we
propose a simple yet effective structure called Interpolation-driven B-frame
Video Compression (IBVC). Our approach only involves two major operations:
video frame interpolation and artifact reduction compression. IBVC introduces a
bit-rate-free MEMC based on interpolation, which avoids optical-flow
quantization and additional compression distortions. Then, to reduce redundant
bit-rate consumption and focus on unaligned artifacts, a residual-guided
masking encoder adaptively selects meaningful contexts using
interpolated multi-scale dependencies. In addition, a conditional
spatio-temporal decoder is proposed to eliminate location errors and artifacts,
replacing the MEMC coding used in other methods. The experimental results on
B-frame coding demonstrate that IBVC achieves significant improvements over
the relevant state-of-the-art methods. Our approach also saves bit rate
compared with the random access (RA) configuration of H.266 (VTM). The
code will be available at https://github.com/ruhig6/IBVC.

Comment: Submitted to IEEE TCSV
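
The two operations named above (interpolation, then compression focused on the remaining artifacts) can be illustrated with a toy sketch. It is not the IBVC implementation. A plain average of the two reference frames stands in for the learned interpolation network, and a fixed threshold on the absolute residual stands in for the learned residual-guided masking encoder. The names `interpolate_midframe`, `residual_guided_mask`, `apply_masked_residual`, and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
# Toy illustration only: frames are 2-D lists of floats in [0, 1].
# The averaging "interpolator" and the thresholded mask are assumptions
# standing in for the learned components described in the abstract.

def interpolate_midframe(prev_frame, next_frame):
    """Predict the middle frame from its two references without sending motion bits."""
    return [[0.5 * (p + n) for p, n in zip(prev_row, next_row)]
            for prev_row, next_row in zip(prev_frame, next_frame)]

def residual_guided_mask(target, interpolated, threshold=0.1):
    """Mark positions where the interpolated frame deviates noticeably from the target."""
    return [[abs(t - i) > threshold for t, i in zip(target_row, interp_row)]
            for target_row, interp_row in zip(target, interpolated)]

def apply_masked_residual(target, interpolated, mask):
    """Correct only the masked positions; keep the interpolated value elsewhere."""
    return [[i + (t - i) if m else i
             for t, i, m in zip(target_row, interp_row, mask_row)]
            for target_row, interp_row, mask_row in zip(target, interpolated, mask)]

# Two identical reference frames and a target that differs at one pixel.
prev_frame = [[0.0] * 4 for _ in range(4)]
next_frame = [[0.0] * 4 for _ in range(4)]
target = [[0.0] * 4 for _ in range(4)]
target[1][2] = 1.0

interpolated = interpolate_midframe(prev_frame, next_frame)
mask = residual_guided_mask(target, interpolated)
reconstruction = apply_masked_residual(target, interpolated, mask)
masked_positions = sum(sum(row) for row in mask)
```

In this toy case only one of the sixteen positions is selected for correction, which mirrors the idea of spending bits only where interpolation leaves visible errors. The actual encoder operates on learned multi-scale features rather than raw pixels.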