Motion compensation is a fundamental technology in video coding for removing
the temporal redundancy between video frames. To further improve coding
efficiency, sub-pel motion compensation has been utilized, which requires
interpolation of fractional samples. Video coding standards usually adopt
fixed interpolation filters that are derived from signal processing theory.
However, as video signals are not stationary, fixed interpolation filters may
prove less efficient. Inspired by the great success of convolutional neural
networks (CNNs) in computer vision, we propose a CNN-based
interpolation filter (CNNIF) for video coding. Unlike in previous studies,
training CNNIF is complicated by the lack of ground truth, since the
fractional samples are not actually available. Our solution to this problem is
to derive the "ground truth" of fractional samples by smoothing high-resolution
images, an approach that our experiments verify to be effective.
Compared to the fixed half-pel interpolation filter for luma in High Efficiency
Video Coding (HEVC), our proposed CNNIF achieves a BD-rate reduction of up to
3.2%, and 0.9% on average, under the low-delay P configuration.

Comment: International Symposium on Circuits and Systems (ISCAS) 201
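
To make the two ideas above concrete, the following is a minimal sketch, not the authors' implementation. The first function applies the fixed 8-tap HEVC luma half-pel filter along a 1-D row. The second illustrates one way "ground truth" could be derived by smoothing a high-resolution image. The 3x3 smoothing kernel, the split of the smoothed image into even and odd sampling phases, and the names hevc_half_pel_row and make_training_pair are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

# Fixed HEVC luma half-pel filter taps; they sum to 64, so dividing normalizes to 1.
HEVC_HALF_PEL = np.array([-1, 4, -11, 40, 40, -11, 4, -1], dtype=np.float64) / 64.0


def hevc_half_pel_row(row):
    """Return out[i], the half-pel sample between row[i] and row[i + 1].

    Borders are handled by edge replication (an assumption of this sketch).
    """
    padded = np.pad(np.asarray(row, dtype=np.float64), (3, 4), mode="edge")
    # out[i] is a weighted sum of row[i - 3] .. row[i + 4].
    return np.correlate(padded, HEVC_HALF_PEL, mode="valid")


def make_training_pair(hi_res, kernel=None):
    """Hypothetical ground-truth derivation from a high-resolution image.

    Assumptions (not taken from the abstract): a separable 3x3 low-pass
    kernel, and a phase split in which even positions of the smoothed image
    act as integer samples and odd positions act as half-pel samples.
    """
    if kernel is None:
        k1 = np.array([1.0, 2.0, 1.0]) / 4.0
        kernel = np.outer(k1, k1)  # assumed square kernel with odd size
    hi_res = np.asarray(hi_res, dtype=np.float64)
    h, w = hi_res.shape
    pad = kernel.shape[0] // 2
    padded = np.pad(hi_res, pad, mode="edge")

    # Direct 2-D filtering written as a sum of shifted, weighted copies.
    smooth = np.zeros((h, w))
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            smooth += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]

    integer = smooth[0::2, 0::2]           # network input: integer-position samples
    targets = {
        "h": smooth[0::2, 1::2],           # horizontal half-pel targets
        "v": smooth[1::2, 0::2],           # vertical half-pel targets
        "d": smooth[1::2, 1::2],           # diagonal half-pel targets
    }
    return integer, targets
```

Under these assumptions, a network would be trained to map the integer-position samples to each half-pel target, and its output could then be compared against what the fixed filter produces from the same integer samples.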