This paper proposes a novel Non-Local Attention optimization and Improved
Context modeling-based image compression (NLAIC) algorithm, which is built on
top of the deep neural network (DNN)-based variational auto-encoder (VAE)
structure. Our NLAIC 1) embeds non-local network operations as non-linear
transforms in the encoders and decoders for both the image and the latent
representation probability information (known as hyperprior) to capture both
local and global correlations, 2) applies an attention mechanism to generate masks
that are used to weigh the features, implicitly adapting bit allocation for
feature elements based on their importance, and 3) implements improved
conditional entropy modeling of latent features using joint 3D convolutional
neural network (CNN)-based autoregressive contexts and hyperpriors. Toward
practical application, additional enhancements are also introduced to speed up
processing (e.g., parallel 3D CNN-based context prediction), reduce memory
consumption (e.g., sparse non-local processing), and alleviate
implementation complexity (e.g., a unified model for variable rates without
re-training). The proposed model outperforms existing learned and conventional
(e.g., BPG, JPEG2000, JPEG) image compression methods on the Kodak and CLIC
datasets, achieving state-of-the-art compression efficiency for both PSNR and
MS-SSIM distortion metrics.

Comment: arXiv admin note: substantial text overlap with arXiv:1904.0975
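
To make points 1) and 2) of the abstract concrete, the following is a minimal sketch, not the NLAIC implementation. It assumes a generic dot-product non-local operation with identity embeddings (each position aggregates all positions, weighted by a softmax over pairwise similarity) and a sigmoid mask applied element-wise to the features. The function names `non_local` and `attention_weigh`, the flat list-of-positions layout, and the choice of similarity function are illustrative assumptions; the abstract does not specify them.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(features):
    """Assumed generic non-local operation.

    features: list of N positions, each a list of C channel values.
    Each output position is a similarity-weighted average over ALL
    positions, so it mixes local and global information.
    """
    out = []
    for xi in features:
        # Dot-product similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        weights = softmax(scores)
        agg = [
            sum(w * xj[c] for w, xj in zip(weights, features))
            for c in range(len(xi))
        ]
        out.append(agg)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def attention_weigh(features):
    """Weigh features by a mask in (0, 1) derived from non-local context.

    Hypothetical simplification: the mask is sigmoid(non_local(features)),
    with no learned convolutions in between.
    """
    context = non_local(features)
    masks = [[sigmoid(v) for v in row] for row in context]
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(features, masks)
    ]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in attention_weigh(feats):
        print(row)
```

Because every mask value lies strictly between 0 and 1, the weighting only attenuates feature magnitudes; elements given a smaller mask value contribute less to the latent representation, which is one way to read the abstract's remark about implicitly adapting bit allocation.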