Escaping the complexity-bitrate-quality barriers of video encoders via deep perceptual optimization

Abstract

We extend the concept of learnable video precoding (rate-aware neural-network processing prior to encoding) to deep perceptual optimization (DPO). Our framework comprises a pixel-to-pixel convolutional neural network that is trained by virtualizing the core encoding blocks (block transform, quantization, block-based prediction) and by combining multiple loss functions that represent the rate, distortion and visual quality of the virtual encoder. We evaluate our proposal with AVC/H.264 and AV1 under per-clip rate-quality optimization. The results show that DPO offers, on average, 14.2% bitrate reduction over AVC/H.264 and 12.5% bitrate reduction over AV1. Our framework is shown to improve both distortion- and perception-oriented metrics in a consistent manner, exhibiting only 3% outliers, which correspond to content with peculiar characteristics. Thus, DPO is shown to offer complexity-bitrate-quality tradeoffs that go beyond what conventional video encoders can offer.
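To make the idea of a virtualized encoder concrete, the following is a minimal, forward-only sketch of two of the blocks named above (block transform and quantization) together with a combined rate-distortion loss. It is not the authors' implementation: the 8x8 block size, the DCT-II transform, the log-magnitude rate proxy, and all function names (`virtual_encode`, `rate_proxy`, `rd_loss`) are assumptions made for illustration, and block-based prediction and the perceptual loss term are omitted. Training a network through such a pipeline would also require a differentiable stand-in for the rounding step, which this sketch does not provide.

```python
import math

N = 8  # block size; an assumption, the abstract does not specify one


def dct_matrix(n=N):
    """Return the orthonormal DCT-II basis as an n x n list of lists."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def virtual_encode(block, step):
    """Transform, uniformly quantize, and reconstruct one square block."""
    c = dct_matrix(len(block))
    coeffs = matmul(matmul(c, block), transpose(c))            # 2D DCT
    levels = [[round(v / step) for v in row] for row in coeffs]  # quantize
    dequant = [[q * step for q in row] for row in levels]
    recon = matmul(matmul(transpose(c), dequant), c)           # inverse DCT
    return levels, recon


def rate_proxy(levels):
    """Hypothetical rate estimate: sum of log2(1 + |level|) over the block."""
    return sum(math.log2(1 + abs(q)) for row in levels for q in row)


def mse(a, b):
    """Mean squared error between two equally sized blocks."""
    count = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b)
               for x, y in zip(ra, rb)) / count


def rd_loss(block, step, lam):
    """Distortion plus lam times the rate proxy for one block."""
    levels, recon = virtual_encode(block, step)
    return mse(block, recon) + lam * rate_proxy(levels)
```

For example, `rd_loss(block, step=8, lam=0.1)` returns a single scalar for one block. In a precoding setup, a loss of this general shape would be evaluated on the output of the preprocessing network rather than on the raw input, so that the network learns to produce frames that are cheaper to encode at a given quality.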
