6 research outputs found
X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data
This paper addresses the problem of semi-supervised transfer learning with
limited cross-modality data in remote sensing. A large amount of multi-modal
earth observation images, such as multispectral imagery (MSI) or synthetic
aperture radar (SAR) data, are openly available on a global scale, enabling
parsing global urban scenes through remote sensing imagery. However, their
ability to identify materials (pixel-wise classification) remains limited
because of the noisy collection environment, poor discriminative information,
and the limited number of well-annotated training images. To this end, we
propose a novel cross-modal deep-learning framework, called X-ModalNet, with
three well-designed modules: a self-adversarial module, an interactive
learning module, and a label propagation module. The framework learns to
transfer more discriminative information from a small-scale hyperspectral
image (HSI) into the classification task that uses large-scale MSI or SAR
data. Significantly, X-ModalNet generalizes well because it propagates labels
on an updatable graph constructed from high-level features at the top of the
network, yielding semi-supervised cross-modality learning. We evaluate
X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR)
and achieve a significant improvement in comparison with several
state-of-the-art methods.
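The label propagation idea mentioned in this abstract can be illustrated with a generic sketch. The code below is not the X-ModalNet implementation: it assumes a fixed, symmetric weight matrix (the paper's graph is updatable and built from learned features), and the function name `propagate_labels` and its parameters are hypothetical. It shows only the basic mechanism of spreading a few known labels to unlabeled nodes through graph edges.

```python
def propagate_labels(weights, labels, num_classes, num_iters=50):
    """Generic iterative label propagation on a weighted graph.

    weights: symmetric n x n nested list of non-negative edge weights.
    labels:  list of length n; a class index for labeled nodes, None otherwise.
    Returns a list with one predicted class index per node.
    """
    n = len(weights)

    # Row-normalise the weights so each row sums to 1 (or stays all zero).
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total > 0 else 0.0 for w in row])

    def initial(i):
        # One-hot scores for labeled nodes, uniform scores for unlabeled ones.
        if labels[i] is None:
            return [1.0 / num_classes] * num_classes
        return [1.0 if c == labels[i] else 0.0 for c in range(num_classes)]

    scores = [initial(i) for i in range(n)]
    for _ in range(num_iters):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp labeled nodes to their known label.
                new_scores.append(initial(i))
            else:
                # Unlabeled nodes take the weighted average of their neighbours.
                new_scores.append([
                    sum(trans[i][j] * scores[j][c] for j in range(n))
                    for c in range(num_classes)
                ])
        scores = new_scores

    # Pick the highest-scoring class for every node.
    return [max(range(num_classes), key=lambda c: s[c]) for s in scores]


if __name__ == "__main__":
    # A chain of four nodes with a weak link in the middle; only the two
    # end nodes carry labels.
    chain = [
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.1, 0.0],
        [0.0, 0.1, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    print(propagate_labels(chain, [0, None, None, 1], num_classes=2))
```

In this toy chain, each unlabeled node is pulled toward the labeled node it is most strongly connected to, which is the behavior a semi-supervised classifier relies on when annotated pixels are scarce.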
Learning transformer-based heterogeneously salient graph representation for multimodal fusion classification of hyperspectral image and LiDAR data
Data collected by different modalities can provide a wealth of complementary
information: hyperspectral images (HSI) offer rich spectral-spatial
properties, synthetic aperture radar (SAR) provides structural information
about the Earth's surface, and light detection and ranging (LiDAR) captures
altitude information about ground elevation. Therefore, a natural idea is to
combine multimodal images for refined and accurate land-cover interpretation.
Although many efforts have been made toward multi-source remote sensing image
classification, three issues remain: 1) indiscriminate feature representation
that does not sufficiently consider modal heterogeneity, 2) abundant features
and complex computations associated with modeling long-range dependencies,
and 3) overfitting caused by sparsely labeled samples. To overcome these
barriers, a transformer-based heterogeneously salient graph representation
(THSGR) approach is proposed in this paper. First, a multimodal heterogeneous
graph encoder is presented to encode distinctively non-Euclidean structural
features from heterogeneous data. Then, a self-attention-free
multi-convolutional modulator is designed for effective and efficient
long-range dependency modeling. Finally, a mean forward is introduced to avoid
overfitting. Based on these structures, the proposed model is able to bridge
modal gaps and obtain a differentiated graph representation with competitive
time cost, even with a small fraction of training samples. Experiments and
analyses on three benchmark datasets, with comparisons against various
state-of-the-art (SOTA) methods, demonstrate the performance of the proposed
approach.
Hyperspectral Point Cloud Projection for the Semantic Segmentation of Multimodal Hyperspectral and Lidar Data with Point Convolution-Based Deep Fusion Neural Networks
The fusion of dissimilar data modalities in neural networks presents a significant challenge, particularly in the case of multimodal hyperspectral and lidar data. Hyperspectral data, typically represented as images with potentially hundreds of bands, provide a wealth of spectral information, while lidar data, commonly represented as point clouds with millions of unordered points in 3D space, offer structural information. Although these data types are complementary, fusing them is difficult because their fundamentally different representations require distinct processing methods. In this work, we introduce an alternative hyperspectral data representation in the form of a hyperspectral point cloud (HSPC), which enables ingestion and exploitation with point cloud processing neural network methods. Additionally, we present a composite fusion-style, point convolution-based neural network architecture for the semantic segmentation of HSPC and lidar point cloud data. We investigate the effects of the proposed HSPC representation for both unimodal and multimodal networks ingesting a variety of hyperspectral and lidar data representations. Finally, we compare the performance of these networks against each other and against previous approaches. This study paves the way for innovative approaches to multimodal remote sensing data fusion, unlocking new possibilities for enhanced data analysis and interpretation.
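The abstract does not say how the HSPC is constructed, so the following is only one plausible reading, given as an illustration. It assumes each pixel of a hyperspectral cube becomes a single point located at its pixel centre, with the spectrum attached as per-point features. The function name `image_to_point_cloud` and the `pixel_size` parameter are hypothetical and do not come from the paper.

```python
def image_to_point_cloud(cube, pixel_size=1.0):
    """Turn an H x W x B hyperspectral cube into a list of points.

    cube: nested lists indexed as cube[row][col][band].
    pixel_size: assumed ground distance covered by one pixel.
    Returns a list of (x, y, spectrum) tuples, one per pixel, where
    (x, y) is the pixel centre and spectrum is a tuple of B band values.
    """
    points = []
    for r, row in enumerate(cube):
        for c, spectrum in enumerate(row):
            x = (c + 0.5) * pixel_size  # column index -> x coordinate
            y = (r + 0.5) * pixel_size  # row index -> y coordinate
            points.append((x, y, tuple(spectrum)))
    return points


if __name__ == "__main__":
    # A toy 2 x 2 image with 3 bands per pixel.
    toy_cube = [
        [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
        [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
    ]
    for point in image_to_point_cloud(toy_cube, pixel_size=2.0):
        print(point)
```

Once the image is expressed as an unordered list of points with features, it has the same form as a lidar point cloud with per-point attributes, which is what allows both inputs to be fed to point-based networks.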
Coupled Convolutional Neural Network with Adaptive Response Function Learning for Unsupervised Hyperspectral Super-Resolution
Due to the limitations of hyperspectral imaging systems, hyperspectral
imagery (HSI) often suffers from poor spatial resolution, thus hampering many
applications of the imagery. Hyperspectral super-resolution refers to fusing
HSI and MSI to generate an image with both high spatial and high spectral
resolutions. Recently, several new methods have been proposed to solve this
fusion problem, and most of these methods assume that prior information about
the Point Spread Function (PSF) and Spectral Response Function (SRF) is known.
However, in practice, this information is often limited or unavailable. In this
work, we propose HyCoNet, an unsupervised deep learning-based fusion method
that solves the HSI-MSI fusion problem without prior PSF and SRF information.
HyCoNet consists of three coupled autoencoder nets in which the
HSI and MSI are unmixed into endmembers and abundances based on the linear
unmixing model. Two special convolutional layers are designed to act as a
bridge that coordinates with the three autoencoder nets, and the PSF and SRF
parameters are learned adaptively in these two convolutional layers during the
training process. Furthermore, driven by a joint loss function, the proposed
method is straightforward and easily implemented in an end-to-end training
manner. The experiments performed in the study demonstrate that the proposed
method performs well and produces robust results for different datasets and
arbitrary PSFs and SRFs.
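For readers unfamiliar with the setting, the standard observation model behind HSI-MSI fusion can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: X is the unknown image with N pixels and L bands, A and E are the abundances and endmembers of the linear unmixing model, B is a spatial degradation operator that stands in for the PSF and downsampling, and R is a matrix that stands in for the SRF.

```latex
\begin{align}
  X   &= A E,            & A &\in \mathbb{R}^{N \times p},\; E \in \mathbb{R}^{p \times L}, \\
  Y_h &= B X = (B A)\,E, & B &\in \mathbb{R}^{n \times N},\; n < N, \\
  Y_m &= X R = A\,(E R), & R &\in \mathbb{R}^{L \times l},\; l < L.
\end{align}
```

Under these assumptions, the observed HSI Y_h shares the endmembers E with X but has spatially degraded abundances BA, while the observed MSI Y_m shares the abundances A but has spectrally degraded endmembers ER. When B and R are unknown, they have to be estimated together with A and E, which corresponds to the case the abstract describes of learning the PSF and SRF parameters during training.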
Learning Shared Cross-modality Representation Using Multispectral-LiDAR and Hyperspectral Data
Due to the ever-growing diversity of data sources, multi-modality feature learning has attracted more and more attention. However, most existing methods jointly learn feature representations from multiple modalities that are present in both the training and test sets, and the case in which a modality is absent in the test phase has been less investigated. To this end, in this letter, we propose to learn a shared feature space across modalities in the training process. In this way, out-of-sample data from any of the modalities can be directly projected onto the learned space for a more effective cross-modality representation. More significantly, the shared space is regarded as a latent subspace in our proposed method, which connects the original multi-modal samples with label information to further improve feature discrimination. Experiments are conducted on the multispectral-LiDAR and hyperspectral dataset provided by the 2018 IEEE GRSS Data Fusion Contest to demonstrate the effectiveness and superiority of the proposed method in comparison with several popular baselines.