Multi-view image compression (MIC) plays a critical role in 3D-related
applications. Existing methods adopt a predictive coding architecture, which
requires joint encoding to compress the corresponding disparity as well as
residual information. This demands collaboration among cameras and enforces the
epipolar geometric constraint between different views, which makes it
challenging to deploy these methods in distributed camera systems with randomly
overlapping fields of view. Meanwhile, distributed source coding theory
indicates that efficient data compression of correlated sources can be achieved
by independent encoding and joint decoding, which motivates us to design a
learning-based distributed multi-view image coding (LDMIC) framework. With
independent encoders, LDMIC introduces a simple yet effective joint context
transfer module, based on the cross-attention mechanism, at the decoder to
capture global inter-view correlations; this design is insensitive to the
geometric relationships between images. Experimental results show that
LDMIC significantly outperforms both traditional and learning-based MIC methods
while enjoying fast encoding speed. Code will be released at
https://github.com/Xinjie-Q/LDMIC.

Comment: Accepted by ICLR 202
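The abstract describes the joint context transfer module only at a high level. As a rough illustration of the underlying idea (not the paper's actual architecture; the function names, feature shapes, and residual fusion below are all assumptions), a geometry-agnostic cross-attention step that lets one view's decoded features attend to another view's features could be sketched as:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats):
    """Attend from one view's latent tokens to another view's tokens.

    Both inputs are (num_tokens, channels). No epipolar or disparity
    structure is assumed: every query token can attend to every
    context token, which is what makes the operation insensitive to
    the geometric relationship between the two views.
    """
    d_k = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d_k)  # (N, N) similarities
    weights = softmax(scores, axis=-1)                     # rows sum to 1
    return weights @ context_feats                         # context aggregated per query

# Toy latent feature maps for two independently encoded views,
# flattened to 64 spatial tokens with 32 channels each.
rng = np.random.default_rng(0)
view_a = rng.standard_normal((64, 32))
view_b = rng.standard_normal((64, 32))

# Residual fusion of view A with context transferred from view B
# (the residual connection here is an illustrative choice).
fused = view_a + cross_attention(view_a, view_b)
print(fused.shape)  # (64, 32)
```

Because the attention weights are computed from feature similarity over all token pairs, overlapping content between views is matched wherever it appears in the image plane, with no need for a calibrated epipolar constraint.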