Owing to its strong ability to capture global information, the Transformer
has become an alternative architecture to CNNs for hyperspectral image
classification. However, a general Transformer mainly considers global
spectral information while ignoring the multiscale spatial information of the
hyperspectral image. In this paper, we propose a multiscale spectral-spatial
convolutional Transformer (MultiscaleFormer) for hyperspectral image
classification. First, the proposed method uses multiscale spatial patches
as tokens to formulate the spatial Transformer and generates a multiscale
spatial representation of each band at each pixel. Second, the spatial
representations of all the bands at a given pixel are used as tokens to
formulate the spectral Transformer and generate the multiscale spectral-spatial
representation of each pixel. In addition, a modified spectral-spatial CAF
module is constructed in the MultiscaleFormer to fuse cross-layer spectral and
spatial information. The proposed MultiscaleFormer can therefore capture
multiscale spectral-spatial information and provide better performance than
most other architectures for hyperspectral image classification. Experiments
are conducted
on commonly used real-world datasets, and the comparison results show the
superiority of the proposed method.

Comment: submitted to IEEE GRS
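
The two-stage token layout described in the abstract (multiscale spatial
patches as tokens per band, then one token per band for the spectral stage) can
be illustrated with a minimal sketch. This is not the authors' implementation:
the function names, the patch scales (1, 3, 5), the border clamping, and the
`cube[band][row][col]` layout are all assumptions made for illustration, and a
plain mean stands in for the learned spatial Transformer, so no attention is
computed here.

```python
# Hypothetical sketch of the two-stage token layout; not the authors' code.
# Mean pooling is a placeholder for the learned spatial Transformer.

def extract_patch(band, row, col, size):
    """Return a size x size patch centered on (row, col), clamping at borders."""
    half = size // 2
    height, width = len(band), len(band[0])
    patch = []
    for dr in range(-half, half + 1):
        r = min(max(row + dr, 0), height - 1)
        patch_row = []
        for dc in range(-half, half + 1):
            c = min(max(col + dc, 0), width - 1)
            patch_row.append(band[r][c])
        patch.append(patch_row)
    return patch


def spatial_tokens(band, row, col, scales=(1, 3, 5)):
    """Spatial stage input: one patch token per scale for one band at one pixel."""
    return [extract_patch(band, row, col, s) for s in scales]


def summarize(patch):
    """Placeholder for the spatial Transformer: the mean of the patch."""
    values = [v for patch_row in patch for v in patch_row]
    return sum(values) / len(values)


def spectral_tokens(cube, row, col, scales=(1, 3, 5)):
    """Spectral stage input: one token per band, holding that band's
    multiscale spatial representation at the given pixel."""
    return [
        [summarize(p) for p in spatial_tokens(band, row, col, scales)]
        for band in cube
    ]


# Toy cube (assumed layout cube[band][row][col]): 4 bands, 6 x 6 pixels,
# each band filled with a constant equal to its index.
cube = [[[float(b)] * 6 for _ in range(6)] for b in range(4)]
tokens = spectral_tokens(cube, 2, 3)
print(len(tokens), len(tokens[0]))  # number of bands, number of scales
```

In a real model each placeholder mean would be replaced by a Transformer
encoder over the patch tokens, and the resulting per-band vectors would be fed
to a second encoder along the spectral dimension.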