Image fusion for the novel rotating synthetic aperture system based on vision transformer

Abstract

Rotating synthetic aperture (RSA) technology offers a promising solution for achieving large-aperture, lightweight designs in optical remote-sensing systems. It employs a rectangular primary mirror, resulting in a point-spread function without circular spatial symmetry that changes over time as the mirror rotates. Consequently, an appropriate image-fusion method is needed to merge the high-resolution information captured intermittently from different directions across the image sequence as the mirror rotates. However, existing image-fusion methods struggle to address the unique imaging mechanism of this system and the characteristics of the geostationary orbit in which it operates. To address this challenge, we model the imaging process of a noncircular rotating pupil and analyse its on-orbit imaging characteristics. Based on this analysis, we propose an image-fusion network built on a vision transformer. The network incorporates inter-frame mutual attention and intra-frame self-attention mechanisms, enabling more effective extraction of temporal and spatial information from the image sequence. Specifically, mutual attention models the correlation between pixels that are close in the spatial and temporal dimensions, whereas intra-frame self-attention in the rotated variable-size attention block captures long-range spatial dependencies. The fusion of spatiotemporal information is then further enhanced with video swin transformer blocks. Extensive digital simulations and semi-physical imaging experiments on remote-sensing images from the WorldView-3 satellite demonstrate that our method outperforms both image-fusion methods designed for the RSA system and state-of-the-art general deep learning-based methods.
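To make the imaging model concrete: for an incoherent system, the point-spread function is the squared magnitude of the Fourier transform of the pupil, so a rectangular (strip) pupil yields a PSF that is sharp along the mirror's long axis and blurred along the short one, and this anisotropy rotates with the mirror. The sketch below is a minimal NumPy illustration of that time-varying PSF, not the authors' simulation code; the grid size, aperture fractions, and frame count are all assumed for illustration.

```python
import numpy as np

def rect_pupil(n, width_frac, height_frac, theta):
    """Binary rectangular pupil (fractions of the grid), rotated by theta radians."""
    coords = np.linspace(-0.5, 0.5, n)
    x, y = np.meshgrid(coords, coords)
    # Rotate the sample coordinates by -theta so the aperture appears rotated by +theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return ((np.abs(xr) <= width_frac / 2) & (np.abs(yr) <= height_frac / 2)).astype(float)

def psf_from_pupil(pupil):
    """Incoherent PSF = |FT(pupil)|^2, normalised to unit energy."""
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()

# Each frame k of the sequence is blurred by the PSF at a new rotation angle;
# a strip aperture repeats every 180 degrees, hence angles span [0, pi).
n_frames, n = 8, 256                      # assumed values for illustration
angles = np.linspace(0, np.pi, n_frames, endpoint=False)
psfs = [psf_from_pupil(rect_pupil(n, 0.9, 0.15, a)) for a in angles]
```

This is why fusion is needed: each frame carries high-resolution detail only along the current orientation of the long axis, and merging the sequence recovers resolution in all directions.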
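The inter-frame mutual attention described above can be understood as cross-attention between patch tokens of adjacent frames: queries come from the current frame while keys and values come from a neighbouring frame, so each patch aggregates temporally adjacent, spatially nearby information. The PyTorch sketch below is an assumption-laden illustration of that idea only; the module name, shapes, and the use of torch.nn.MultiheadAttention are ours and do not reproduce the paper's actual architecture.

```python
import torch
import torch.nn as nn

class InterFrameMutualAttention(nn.Module):
    """Illustrative inter-frame mutual attention between two frames' patch tokens."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cur, ref):
        # cur, ref: (batch, tokens, dim) patch embeddings of two adjacent frames.
        # Queries from the current frame attend to keys/values of the reference
        # frame, modelling correlation across the temporal dimension.
        fused, _ = self.attn(query=cur, key=ref, value=ref)
        return self.norm(cur + fused)  # residual connection, then normalisation

frames = torch.randn(2, 5, 196, 64)      # assumed (batch, time, tokens, dim)
block = InterFrameMutualAttention(dim=64)
out = block(frames[:, 2], frames[:, 1])  # fuse frame 2 with its predecessor
```

Intra-frame self-attention is the special case where cur and ref are the same frame; in the paper this is performed inside rotated variable-size windows to capture long-range spatial dependencies before the video swin transformer blocks fuse the sequence.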
