The inability to adapt the motion compensation model to video content
is an important limitation of current end-to-end learned video compression
models. This paper advances the state of the art by proposing an adaptive
motion-compensation model for end-to-end rate-distortion optimized hierarchical
bi-directional video compression. In particular, we propose two novelties: (i) a
multi-scale deformable alignment scheme at the feature level combined with
multi-scale conditional coding, and (ii) motion-content adaptive inference. In
addition, we employ a gain unit, which enables a single model to operate at
multiple rate-distortion operating points. We also exploit the gain unit to
control bit allocation between intra-coded and bi-directionally coded frames by
fine-tuning the corresponding models for truly flexible-rate learned video coding.
Experimental results demonstrate state-of-the-art rate-distortion performance
exceeding that of all prior art in learned video coding.

Comment: Accepted for publication in IEEE International Conference on Image
Processing (ICIP) 202
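
The gain unit mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a gain unit is a per-channel scaling applied to the latent before rounding, paired with an inverse scaling after decoding, and that intermediate rates are obtained by geometrically interpolating between two learned gain vectors. The function names and all numeric values are hypothetical, chosen only to show how one set of latents can yield several rate-distortion operating points.

```python
# Illustrative sketch only: function names, gain values, and the
# interpolation rule are assumptions, not taken from the paper.

def apply_gain(latent, gain):
    """Scale each latent channel by its gain, then quantize by rounding.

    Larger gains spread values over more integer levels
    (higher rate, lower distortion)."""
    return [round(y * g) for y, g in zip(latent, gain)]


def apply_inverse_gain(quantized, inverse_gain):
    """Undo the scaling on the decoder side."""
    return [q * g for q, g in zip(quantized, inverse_gain)]


def interpolate_gain(gain_a, gain_b, t):
    """Geometric interpolation between two gain vectors, with t in [0, 1].

    t = 0 returns gain_a, t = 1 returns gain_b."""
    return [(a ** (1 - t)) * (b ** t) for a, b in zip(gain_a, gain_b)]


if __name__ == "__main__":
    latent = [0.4, -1.3, 2.6]       # hypothetical latent channels
    low_gain = [1.0, 1.0, 1.0]      # hypothetical low-rate gain vector
    high_gain = [4.0, 4.0, 4.0]     # hypothetical high-rate gain vector

    for name, gain in [("low", low_gain), ("high", high_gain)]:
        q = apply_gain(latent, gain)
        recon = apply_inverse_gain(q, [1.0 / g for g in gain])
        err = max(abs(a - b) for a, b in zip(latent, recon))
        print(name, q, recon, err)

    # An intermediate operating point between the two gain vectors.
    print(interpolate_gain(low_gain, high_gain, 0.5))
```

With the higher gain, the quantized symbols take larger integer values (costing more bits under an entropy coder) while the reconstruction error shrinks, which is the trade-off a single model can navigate by swapping or interpolating gain vectors.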