The design of a neural image compression network is governed by how well its
entropy model matches the true distribution of the latent code. Beyond model
capacity, this match depends indirectly on how closely the relaxed quantization
used during training approximates the actual hard quantization applied at
inference. The optimization of the parameters of a rate-distortion variational
autoencoder (R-D VAE) is therefore driven by this approximate quantization scheme.
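Training here relies on the standard noise-based relaxation of scalar
quantization used throughout the R-D VAE literature: uniform noise stands in
for rounding during optimization so the rate term stays differentiable, and
hard rounding is applied at inference. A minimal PyTorch sketch (the function
name is ours):

```python
import torch

def quantize(y: torch.Tensor, training: bool) -> torch.Tensor:
    """Relaxed scalar quantization: additive uniform noise U(-0.5, 0.5)
    stands in for hard rounding during training, letting gradients flow
    through the rate term; at inference the latent is actually rounded."""
    if training:
        return y + torch.rand_like(y) - 0.5  # differentiable surrogate
    return torch.round(y)                    # hard quantization
```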
In this paper, we propose a feature-level frequency disentanglement that helps
the relaxed scalar quantization achieve lower bit rates by guiding the
high-entropy latent features to carry most of the image's low-frequency texture.
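The abstract does not spell out the splitting mechanism, so the sketch below
illustrates one generic feature-level frequency decomposition, a
low-pass/residual split via average pooling; it should be read as an
assumption, not the paper's exact design:

```python
import torch
import torch.nn.functional as F

def frequency_split(feat: torch.Tensor, k: int = 4):
    """Illustrative low/high frequency decomposition of a feature map
    (B, C, H, W): low frequencies via average-pool-then-upsample, high
    frequencies as the residual. A generic decomposition, not necessarily
    the paper's mechanism."""
    low = F.interpolate(F.avg_pool2d(feat, k), scale_factor=k,
                        mode="bilinear", align_corners=False)
    high = feat - low
    return low, high
```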
In addition, to strengthen the decorrelating power of the transformer-based
analysis/synthesis transforms, an augmented self-attention score based on the
Hadamard product is used during both encoding and decoding.
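How the Hadamard product enters the score is not specified here; one plausible
reading, sketched below under that assumption, adds a learned element-wise
query-key interaction to the usual scaled dot product (the per-channel weight
`w` is hypothetical):

```python
import torch
import torch.nn.functional as F

def augmented_attention(q, k, v, w):
    """Self-attention whose score augments the scaled dot product with a
    learned Hadamard (element-wise) query-key interaction. q, k, v have
    shape (B, N, d); w has shape (d,). The paper's exact formulation may
    differ."""
    d = q.size(-1)
    dot = torch.einsum("bqd,bkd->bqk", q, k) / d ** 0.5  # standard term
    had = torch.einsum("bqd,bkd,d->bqk", q, k, w)        # Hadamard term
    attn = F.softmax(dot + had, dim=-1)
    return attn @ v
```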
Channel-wise autoregressive entropy modeling benefits from the proposed
frequency separation, as it inherently routes the highly informative
low-frequency channels to the first chunks and conditions the later chunks on
them.
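Channel-wise autoregressive entropy modeling, in the style of Minnen and Singh
(2020), splits the latent's channels into chunks and predicts each chunk's
Gaussian parameters from the hyperprior features together with all previously
decoded chunks. A minimal sketch with illustrative layer sizes (not the
paper's):

```python
import torch
import torch.nn as nn

class ChannelARM(nn.Module):
    """Channel-wise autoregressive entropy model: the latent's channels are
    split into chunks, and the Gaussian parameters of chunk i are predicted
    from the hyperprior features plus all previously decoded chunks.
    Layer shapes are illustrative."""
    def __init__(self, channels: int, chunks: int, hyper_ch: int):
        super().__init__()
        self.chunk = channels // chunks
        self.nets = nn.ModuleList(
            nn.Conv2d(hyper_ch + i * self.chunk, 2 * self.chunk, 1)
            for i in range(chunks)
        )

    def forward(self, y_hat: torch.Tensor, hyper: torch.Tensor):
        params, decoded = [], []
        for i, net in enumerate(self.nets):
            ctx = torch.cat([hyper] + decoded, dim=1)
            mu, sigma = net(ctx).chunk(2, dim=1)  # per-chunk Gaussian params
            params.append((mu, sigma))
            decoded.append(y_hat[:, i * self.chunk:(i + 1) * self.chunk])
        return params
```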
The proposed network outperforms not only hand-engineered codecs but also
neural-network-based codecs built on computation-heavy spatially
autoregressive entropy models.

Comment: Accepted to the 30th IEEE International Conference on Image Processing (ICIP 2023).