High-dimensional images, known for their rich semantic information, are
widely applied in remote sensing and other fields. The spatial information in
these images reflects the texture features of objects, while the spectral
information reveals latent spectral representations across different bands.
bands. Currently, the understanding of high-dimensional images remains limited
to a single-domain perspective with performance degradation. Motivated by the
masking texture effect observed in the human visual system, we present a
multi-domain diffusion-driven feature learning network (MDFL) , a scheme to
redefine the effective information domain that the model really focuses on.
This method employs diffusion-based posterior sampling to explicitly consider
joint information interactions between the high-dimensional manifold structures
in the spectral, spatial, and frequency domains, thereby eliminating the
influence of masking texture effects in visual models. Additionally, we
introduce a feature reuse mechanism to aggregate both the deep and raw features of
high-dimensional data. We demonstrate that MDFL significantly improves the
feature extraction performance of high-dimensional data, thereby providing a
powerful aid for revealing the intrinsic patterns and structures of such data.
The experimental results on three multi-modal remote sensing datasets show that
MDFL reaches an average overall accuracy of 98.25%, outperforming various
state-of-the-art baseline schemes. The code will be released to benefit the
computer vision community.