We extend the concept of learnable video precoding (rate-aware neural-network processing prior to encoding)
to deep perceptual optimization (DPO). Our framework comprises a pixel-to-pixel convolutional neural network
that is trained based on the virtualization of core encoding blocks (block transform, quantization, block-based
prediction) and multiple loss functions representing rate, distortion and visual quality of the virtual encoder.
We evaluate our proposal with AVC/H.264 and AV1 under per-clip rate-quality optimization. The results show
that DPO offers, on average, 14.2% bitrate reduction over AVC/H.264 and 12.5% bitrate reduction over AV1.
Our framework is shown to improve both distortion- and perception-oriented metrics in a consistent manner,
exhibiting only 3% outliers, which correspond to content with atypical characteristics. Thus, DPO is shown to
offer complexity-bitrate-quality tradeoffs beyond what conventional video encoders can achieve.