Learning Frequency-Specific Quantization Scaling in VVC for Standard-Compliant Task-driven Image Coding
Today, visual data is often analyzed by a neural network without any human being involved, which demands specialized codecs. For standard-compliant codec adaptations towards certain information sinks, HEVC and VVC provide the possibility of frequency-specific quantization with scaling lists. This is a well-known method for the human visual system, where scaling lists are derived from psycho-visual models. In this work, we employ scaling lists when performing VVC intra coding for neural networks as the information sink. To this end, we propose a novel data-driven method to obtain optimal scaling lists for arbitrary neural networks. Experiments with Mask R-CNN as the information sink reveal that coding the Cityscapes dataset with the proposed scaling lists results in peak bitrate savings of 8.9 % over VVC with constant quantization. Thereby, our approach also outperforms scaling lists optimized for the human visual system. The generated scaling lists can be found at https://github.com/FAU-LMS/VCM_scaling_lists.
Comment: Originally submitted at IEEE ICIP 202
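The core mechanism the abstract relies on can be sketched in a few lines: a scaling list weights the quantizer step size per coefficient position, so different spatial frequencies are quantized with different granularity. The following is a minimal, hypothetical illustration (the function name, the toy 4x4 scaling list, and the step size are assumptions for demonstration, not the paper's learned lists or the exact HEVC/VVC arithmetic):

```python
# Hypothetical sketch of frequency-specific quantization with a scaling list.
# In HEVC/VVC, a flat scaling list consists of the value 16, so the effective
# step size per coefficient is roughly qstep * scaling_list[i][j] / 16.

def quantize_block(coeffs, scaling_list, qstep):
    """Quantize a square block of transform coefficients with a scaling list."""
    out = []
    for i, row in enumerate(coeffs):
        out_row = []
        for j, c in enumerate(row):
            # Larger scaling-list entries -> coarser quantization at (i, j)
            eff_step = qstep * scaling_list[i][j] / 16.0
            out_row.append(round(c / eff_step))
        out.append(out_row)
    return out

# Toy 4x4 scaling list: high frequencies (bottom-right) quantized more coarsely
sl = [
    [16, 16, 20, 24],
    [16, 20, 24, 28],
    [20, 24, 28, 32],
    [24, 28, 32, 40],
]
block = [
    [100, 40, 8, 2],
    [40, 16, 4, 1],
    [8, 4, 2, 0],
    [2, 1, 0, 0],
]
q = quantize_block(block, sl, qstep=10.0)
```

The paper's contribution is then, in effect, learning the entries of `sl` so that the information a downstream network (e.g. Mask R-CNN) needs survives quantization, rather than deriving them from a psycho-visual model.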
Increasing Video Perceptual Quality with GANs and Semantic Coding
We have seen a rise in video-based user communication in the last year, unfortunately fueled by the spread of COVID-19. Efficient low-latency transmission of video is a challenging problem which must also deal with the segmented nature of network infrastructure, which does not always allow high throughput. Lossy video compression is a basic requirement to enable such technology widely. While this may compromise the quality of the streamed video, there are recent deep-learning-based solutions to restore the quality of a lossy compressed video. Considering the very nature of video conferencing, bitrate allocation in video streaming can be driven semantically, differentiating quality between the talking subjects and the background. To date, there has not been any work studying the restoration of semantically coded video using deep learning. In this work, we show how such videos can be efficiently generated by shifting bitrate with masks derived via computer vision, and how a deep generative adversarial network can be trained to restore video quality. Our study shows that the combination of semantic coding and learning-based video restoration can provide superior results.
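The semantic bitrate shifting described above can be sketched as a per-block QP assignment driven by a segmentation mask: blocks covering the talking subject get a lower QP (finer quantization, more bits), background blocks a higher one. This is a minimal illustration under assumed names and offsets (`BASE_QP`, the ±6 offsets, and the 0.5 coverage threshold are illustrative choices, not values from the paper):

```python
# Hypothetical sketch: semantically driven bitrate allocation.
# A per-pixel binary mask (e.g. from a face/person segmentation network)
# marks the foreground; each coding block is assigned a QP offset based on
# how much of it is foreground.

BASE_QP = 32
FG_QP_OFFSET = -6   # spend more bits on the talking subject
BG_QP_OFFSET = +6   # save bits on the background

def block_qps(mask, block_size=16):
    """Assign one QP per block_size x block_size block from a 0/1 pixel mask."""
    h, w = len(mask), len(mask[0])
    qps = []
    for by in range(0, h, block_size):
        row = []
        for bx in range(0, w, block_size):
            y_end = min(by + block_size, h)
            x_end = min(bx + block_size, w)
            # Fraction of foreground pixels inside this block
            fg = sum(mask[y][x]
                     for y in range(by, y_end)
                     for x in range(bx, x_end))
            area = (y_end - by) * (x_end - bx)
            offset = FG_QP_OFFSET if fg / area > 0.5 else BG_QP_OFFSET
            row.append(BASE_QP + offset)
        qps.append(row)
    return qps
```

A restoration GAN, as studied in the paper, would then be trained to recover quality primarily in the coarsely quantized background regions.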