Popular Transformer networks have been successfully applied to remote sensing
(RS) image change detection (CD) and achieve better results than most
convolutional neural networks (CNNs), but they still suffer from two main
problems.
First, the computational complexity of the Transformer grows quadratically
with image spatial resolution, which is unfavorable for very high-resolution
(VHR) RS images.
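For context, the scaling behind this claim is the standard cost of token-based
self-attention and is not specific to this Letter: with $N = HW$ spatial
tokens of feature dimension $d$,
$$
\mathcal{O}\!\left(N^{2}d\right)=\mathcal{O}\!\left((HW)^{2}d\right)
\;\;\text{(vanilla self-attention)}
\qquad\text{vs.}\qquad
\mathcal{O}\!\left(Nd^{2}\right)
\;\;\text{(linear-complexity attention)},
$$
so doubling the spatial resolution quadruples $N$ and increases the vanilla
attention cost roughly sixteenfold.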
Second, these popular Transformer networks tend to ignore the importance of
fine-grained features, which results in poor edge integrity and internal
tightness for large changed objects and leads to the loss of small changed
objects. To address the above
issues, this Letter proposes a Lightweight Structure-aware Transformer (LSAT)
network for RS image CD. The proposed LSAT has two advantages. First, a
Cross-dimension Interactive Self-attention (CISA) module with linear
complexity is designed to replace the vanilla self-attention in the vision
Transformer, which effectively reduces the computational complexity while
improving the feature representation ability of the proposed LSAT.
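Below is a minimal sketch of one common way to obtain self-attention whose
cost is linear in the number of tokens (aggregating keys and values before
querying them, in the style of efficient/kernelized attention). The abstract
does not specify CISA's internals, so the class, its layers, and all parameter
names are illustrative assumptions rather than the Letter's actual design.

```python
# Hypothetical linear-complexity self-attention sketch -- NOT the Letter's
# CISA module, whose internal structure is not given in the abstract.
import torch
import torch.nn as nn


class LinearSelfAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) with N = H*W flattened spatial tokens
        b, n, c = x.shape
        h, d = self.heads, c // self.heads
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, n, h, d).transpose(1, 2) for t in (q, k, v))
        # normalise Q over channels and K over tokens (efficient-attention style)
        q = q.softmax(dim=-1)
        k = k.softmax(dim=-2)
        # aggregate keys/values into a small (d x d) context per head first,
        # so the overall cost grows with N rather than N^2
        context = k.transpose(-2, -1) @ v      # (B, h, d, d)
        out = q @ context                      # (B, h, N, d)
        out = out.transpose(1, 2).reshape(b, n, c)
        return self.proj(out)
```

Because the (d x d) context is independent of the token count, compute and
memory grow with HW instead of (HW)^2, which is the property the abstract
attributes to CISA.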
Second, a Structure-aware Enhancement Module (SAEM) is designed to enhance
difference features and edge detail information; it achieves double
enhancement through difference refinement and detail aggregation so as to
obtain fine-grained features of bi-temporal RS images.
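The abstract names the two operations, difference refinement and detail
aggregation, without describing how they are implemented; the sketch below is
one hypothetical way such a block could look, and every layer and name in it
is an assumption made for illustration only.

```python
# Hypothetical "difference refinement + detail aggregation" sketch -- NOT the
# Letter's SAEM, whose internal structure is not given in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StructureAwareEnhancement(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # difference refinement: learn a gate that re-weights the raw difference
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        # detail aggregation: project a high-frequency (edge-like) residual
        self.detail = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        # f1, f2: (B, C, H, W) bi-temporal feature maps
        diff = torch.abs(f1 - f2)                  # raw difference features
        refined = diff * self.refine(diff)         # difference refinement
        # crude edge cue: feature minus its local average (high-pass residual)
        edge = diff - F.avg_pool2d(diff, 3, stride=1, padding=1)
        return refined + self.detail(edge)         # detail aggregation
```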
Experimental results show that the proposed LSAT achieves a significant
improvement in detection accuracy and offers a better tradeoff between
accuracy and computational cost than most state-of-the-art CD methods for VHR
RS images.