Joint & Progressive Learning from High-Dimensional Data for Multi-Label Classification
Although nonlinear subspace learning techniques (e.g., manifold learning) have been successfully applied to data representation, there is still room for improvement in explainability (explicit mapping), generalization (out-of-sample extension), and cost-effectiveness (linearization). To this end, a novel linearized subspace learning technique is developed in a joint and progressive way, called the joint and progressive learning strategy (J-Play), with its application to multi-label classification. J-Play learns high-level, semantically meaningful feature representations from high-dimensional data by 1) jointly performing multiple subspace learning and classification to find a latent subspace where samples are expected to be better classified; 2) progressively learning multi-coupled projections to linearly approach the optimal mapping that bridges the original space with the most discriminative subspace; and 3) locally embedding the manifold structure in each learnable latent subspace. Extensive experiments demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.
Comment: accepted at ECCV 2018.
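For intuition, the following NumPy sketch mimics the progressive, multi-coupled linear projections with a jointly learned linear classifier; the manifold (graph Laplacian) regularizer of item 3 is omitted for brevity, and all dimensions, stage targets, and regularization weights are illustrative assumptions rather than the authors' actual optimization.

```python
# Minimal sketch of progressive, multi-coupled linear projections (assumed
# setup, not the J-Play objective): each stage fits a ridge projection toward
# a label-informed target, and a linear classifier is learned on the final
# subspace.
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 200, 100, 5                    # samples, input dim, classes
X = rng.standard_normal((d, n))          # columns are samples
Y = np.eye(c)[rng.integers(0, c, n)].T   # one-hot labels, shape (c, n)

def ridge_map(T, Z, lam=0.1):
    """Least-squares projection theta such that theta @ Z approximates T."""
    return T @ Z.T @ np.linalg.inv(Z @ Z.T + lam * np.eye(Z.shape[0]))

# Progressively learn coupled projections d -> 60 -> 40 -> 20; each stage is
# fit toward a lifted label matrix so intermediate subspaces stay
# discriminative.
dims, Z, thetas = [60, 40, 20], X, []
for d_out in dims:
    R = rng.standard_normal((d_out, c))  # hypothetical random lift of labels
    theta = ridge_map(R @ Y, Z)
    Z = np.maximum(theta @ Z, 0)         # stage features (ReLU is illustrative)
    thetas.append(theta)

P = ridge_map(Y, Z)                      # jointly learned linear classifier
pred = (P @ Z).argmax(axis=0)
print("train accuracy:", (pred == Y.argmax(axis=0)).mean())
```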
Spatial-Spectral Manifold Embedding of Hyperspectral Data
In recent years, hyperspectral imaging, also known as imaging spectroscopy, has attracted increasing interest in the geoscience and remote sensing community. Hyperspectral imagery is characterized by very rich spectral information, which enables us to recognize the materials of interest lying on the surface of the Earth more easily. However, the high spectral dimensionality inevitably brings drawbacks, such as expensive data storage and transmission, information redundancy, etc. Therefore, to reduce the spectral dimensionality effectively and learn a more discriminative low-dimensional spectral embedding, in this paper we propose a novel hyperspectral embedding approach that simultaneously considers spatial and spectral information, called spatial-spectral manifold embedding (SSME). Going beyond pixel-wise spectral embedding approaches, SSME models the spatial and spectral information jointly in a patch-based fashion. SSME not only learns the spectral embedding by using the adjacency matrix obtained by similarity measurement between spectral signatures, but also models the spatial neighbours of a target pixel in the hyperspectral scene by sharing the same weights (or edges) in the process of learning the embedding. Classification is explored as a potential application for quantitatively evaluating the performance of the learned embedding representations. Extensive experiments conducted on widely used hyperspectral datasets demonstrate the superiority and effectiveness of the proposed SSME compared to several state-of-the-art embedding methods.
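To make the patch-based idea concrete, here is a minimal NumPy/SciPy sketch of a spatial-spectral embedding in the spirit of SSME: spectral similarities define the graph edges, spatial neighbours of a pixel share those edges, and a Laplacian-eigenmaps-style eigenproblem yields the embedding. The toy cube, 4-connected neighbourhood, and kernel bandwidth are assumptions, not the authors' implementation.

```python
# Spatial-spectral manifold embedding sketch (assumed setup).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
H, W, B, dim = 16, 16, 30, 3                 # height, width, bands, embedding dim
cube = rng.random((H, W, B))                 # toy hyperspectral cube
X = cube.reshape(-1, B)                      # pixels as rows
n = X.shape[0]

# Spectral affinities: Gaussian kernel on spectral signatures.
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-D2 / D2.mean())

# Share each pixel's edges with its 4-connected spatial neighbours.
idx = np.arange(n).reshape(H, W)
for di, dj in [(0, 1), (1, 0)]:
    p = idx[:H - di, :W - dj].ravel()
    q = idx[di:, dj:].ravel()
    A[p] = np.maximum(A[p], A[q])            # neighbour inherits edge weights
    A[q] = np.maximum(A[q], A[p])
A = (A + A.T) / 2

# Laplacian-eigenmaps-style embedding: smallest nontrivial eigenvectors of
# the generalized problem L v = lambda D v.
D = np.diag(A.sum(1))
vals, vecs = eigh(D - A, D)
embedding = vecs[:, 1:dim + 1]               # drop the constant eigenvector
print(embedding.shape)                       # (256, 3)
```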
X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data
This paper addresses the problem of semi-supervised transfer learning with limited cross-modality data in remote sensing. Large amounts of multi-modal Earth observation images, such as multispectral imagery (MSI) or synthetic aperture radar (SAR) data, are openly available on a global scale, enabling the parsing of global urban scenes through remote sensing imagery. However, their ability to identify materials (pixel-wise classification) remains limited, due to noisy collection environments, poor discriminative information, and the limited number of well-annotated training images. To this end, we propose a novel cross-modal deep-learning framework, called X-ModalNet, with three well-designed modules: a self-adversarial module, an interactive learning module, and a label propagation module, which learns to transfer more discriminative information from a small-scale hyperspectral image (HSI) into the classification task using large-scale MSI or SAR data. Significantly, X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed from high-level features on top of the network, yielding semi-supervised cross-modality learning. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement over several state-of-the-art methods.
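The label propagation module builds on standard graph-based propagation; a minimal sketch of that mechanism (in the style of Zhou et al.'s normalized propagation) is shown below. The random "high-level features", k, and alpha are assumed placeholders; X-ModalNet rebuilds the graph from its top-layer features as training proceeds.

```python
# Graph-based label propagation sketch: F = (I - alpha * S)^(-1) Y.
import numpy as np

rng = np.random.default_rng(0)
n, d, c, k, alpha = 100, 32, 4, 8, 0.9
feats = rng.standard_normal((n, d))          # stand-in for deep features
Y = np.zeros((n, c))
labeled = rng.choice(n, 20, replace=False)
Y[labeled, rng.integers(0, c, 20)] = 1.0     # a few labeled nodes

# k-NN affinity graph with Gaussian weights.
D2 = ((feats[:, None] - feats[None]) ** 2).sum(-1)
W = np.exp(-D2 / D2.mean())
keep = np.argsort(D2, axis=1)[:, 1:k + 1]
mask = np.zeros_like(W, dtype=bool)
np.put_along_axis(mask, keep, True, axis=1)
W = np.where(mask | mask.T, W, 0.0)

# Symmetrically normalized propagation over the graph.
Dinv = 1.0 / np.sqrt(W.sum(1) + 1e-12)
S = Dinv[:, None] * W * Dinv[None, :]
F = np.linalg.solve(np.eye(n) - alpha * S, Y)
pseudo_labels = F.argmax(1)                  # propagated labels for all nodes
```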
Spectral Superresolution of Multispectral Imagery with Joint Sparse and Low-Rank Learning
Extensive attention has been paid to enhancing the spatial resolution of hyperspectral (HS) images with the aid of multispectral (MS) images in remote sensing. However, HS-MS fusion performance remains to be improved, particularly in large-scale scenes, due to the limited acquisition of HS images. Alternatively, we super-resolve MS images in the spectral domain by means of partially overlapped HS images, yielding a novel and promising topic: spectral superresolution (SSR) of MS imagery. This is a challenging and less-investigated task due to its high ill-posedness as an inverse imaging problem. To this end, we develop a simple but effective method, called joint sparse and low-rank learning (J-SLoL), to spectrally enhance MS images by jointly learning low-rank HS-MS dictionary pairs from overlapped regions. J-SLoL infers and recovers the unknown hyperspectral signals over a larger coverage by sparse coding on the learned dictionary pair. We validate the SSR performance on three HS-MS datasets (two for classification and one for unmixing) in terms of reconstruction, classification, and unmixing by comparing with several existing state-of-the-art baselines, showing the effectiveness and superiority of the proposed J-SLoL algorithm. Furthermore, the codes and datasets will be available at https://github.com/danfenghong/IEEE_TGRS_J-SLoL, contributing to the RS community.
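As a rough illustration of the recovery step, the sketch below sparse-codes each MS pixel against an (assumed, randomly initialized) MS dictionary via ISTA and reconstructs the HS counterpart with the coupled HS dictionary by sharing the sparse codes; the solver, dictionary sizes, and lambda are illustrative, not the paper's exact joint sparse and low-rank optimization.

```python
# Coupled-dictionary recovery sketch: code on D_ms, reconstruct with D_hs.
import numpy as np

rng = np.random.default_rng(0)
b_ms, b_hs, n_atoms, n_pix = 4, 100, 32, 500
D_ms = rng.standard_normal((b_ms, n_atoms))   # MS dictionary (assumed learned)
D_hs = rng.standard_normal((b_hs, n_atoms))   # HS dictionary (assumed learned)
X_ms = rng.standard_normal((b_ms, n_pix))     # MS pixels to super-resolve

def ista(D, X, lam=0.05, n_iter=200):
    """Solve min_A 0.5 * ||X - D A||^2 + lam * ||A||_1 by soft thresholding."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2    # 1 / Lipschitz constant
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        G = A - step * D.T @ (D @ A - X)      # gradient step
        A = np.sign(G) * np.maximum(np.abs(G) - step * lam, 0.0)
    return A

A = ista(D_ms, X_ms)        # sparse codes from the MS side
X_hs = D_hs @ A             # spectrally super-resolved pixels, (b_hs, n_pix)
print(X_hs.shape)
```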
More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification
Classification and identification of the materials lying over or beneath the Earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS), and have garnered growing interest owing to recent advances in deep learning. Although deep networks have been successfully applied in single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that need to be finely classified, due to limited information diversity. In this work, we provide a baseline solution to this difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multi-modality learning (MML), cross-modality learning (CML), which exists widely in RS image classification applications. By focusing on "what", "where", and "how" to fuse, we present different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced and developed, and further unified in our MDL framework. More significantly, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments on the MML and CML settings are conducted on two different multimodal RS datasets. Furthermore, the codes and datasets will be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing to the RS community.
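For concreteness, the following PyTorch sketch shows one point in this "what/where/how" design space: a two-branch network with middle (feature-level) fusion by concatenation. Branch widths, depths, and input dimensions are assumptions; the five architectures in the paper differ precisely in where and how such fusion happens.

```python
# Two-branch middle-fusion network sketch (assumed dimensions).
import torch
import torch.nn as nn

class MiddleFusionNet(nn.Module):
    def __init__(self, dim_a, dim_b, hidden, n_classes):
        super().__init__()
        self.branch_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.ReLU())
        self.branch_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.ReLU())
        # Fusion point: concatenate modality features, then classify jointly.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x_a, x_b):
        f = torch.cat([self.branch_a(x_a), self.branch_b(x_b)], dim=1)
        return self.head(f)

net = MiddleFusionNet(dim_a=144, dim_b=21, hidden=64, n_classes=15)
logits = net(torch.randn(8, 144), torch.randn(8, 21))  # e.g. two modalities
print(logits.shape)                                    # torch.Size([8, 15])
```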
Coupled Convolutional Neural Network with Adaptive Response Function Learning for Unsupervised Hyperspectral Super-Resolution
Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, which hampers many of its applications. Hyperspectral super-resolution refers to fusing an HSI with a multispectral image (MSI) to generate an image with both high spatial and high spectral resolution. Recently, several new methods have been proposed to solve this fusion problem, and most of them assume that prior information about the Point Spread Function (PSF) and Spectral Response Function (SRF) is known. However, in practice, this information is often limited or unavailable. In this work, we propose an unsupervised deep learning-based fusion method, HyCoNet, that solves the HSI-MSI fusion problem without prior PSF and SRF information. HyCoNet consists of three coupled autoencoder nets in which the HSI and MSI are unmixed into endmembers and abundances based on the linear unmixing model. Two special convolutional layers are designed to act as a bridge that coordinates the three autoencoder nets, and the PSF and SRF parameters are learned adaptively in these two convolutional layers during training. Furthermore, driven by the joint loss function, the proposed method is straightforward and easily implemented in an end-to-end training manner. The experiments performed in this study demonstrate that the proposed method performs well and produces robust results for different datasets and arbitrary PSFs and SRFs.
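A minimal PyTorch sketch of the central trick, learning the PSF and SRF as convolutional layers, is given below: the PSF becomes a shared depthwise blur-and-downsample convolution and the SRF a 1x1 convolution that integrates hyperspectral bands into multispectral ones. Kernel size, stride, and band counts are illustrative assumptions, not HyCoNet's exact architecture.

```python
# PSF/SRF as learnable convolutions (assumed configuration).
import torch
import torch.nn as nn

b_hs, b_ms, ratio = 100, 4, 4            # HS bands, MS bands, resolution ratio

# PSF: one learnable spatial kernel applied to every band (depthwise, shared).
psf = nn.Conv2d(b_hs, b_hs, kernel_size=7, stride=ratio, padding=3,
                groups=b_hs, bias=False)
with torch.no_grad():                    # share a single kernel across bands
    psf.weight.copy_(psf.weight[:1].repeat(b_hs, 1, 1, 1))

# SRF: per-pixel spectral integration, i.e. a 1x1 convolution.
srf = nn.Conv2d(b_hs, b_ms, kernel_size=1, bias=False)

X_hs = torch.randn(1, b_hs, 64, 64)      # latent high-resolution HS estimate
lr_hsi = psf(X_hs)                       # simulated low-res HSI, (1, 100, 16, 16)
hr_msi = srf(X_hs)                       # simulated high-res MSI, (1, 4, 64, 64)
# Training would compare (lr_hsi, hr_msi) with the observed inputs via the
# joint loss, so the PSF and SRF kernels are learned adaptively.
print(lr_hsi.shape, hr_msi.shape)
```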
Tensor Singular Spectral Analysis for 3D Feature Extraction in Hyperspectral Images
Due to the cubic structure of a hyperspectral image (HSI), characterizing its spectral and spatial properties in three dimensions is challenging. Conventional spectral-spatial methods usually extract spectral and spatial information separately, ignoring their intrinsic correlations. Recently, some 3D feature extraction methods have been developed to extract spectral and spatial features simultaneously, but they rely on local spatial-spectral regions and thus ignore global spectral similarity and spatial consistency. Meanwhile, some of these methods involve huge numbers of model parameters, which requires a large number of training samples. In this paper, a novel tensor singular spectral analysis (TensorSSA) method is proposed to extract global and low-rank features of an HSI. In TensorSSA, an adaptive embedding operation is first proposed to construct a trajectory tensor corresponding to the entire HSI, which takes full advantage of spatial similarity and enables an adequate representation of the global low-rank properties of the HSI. Moreover, the obtained trajectory tensor, which contains the global and local spatial and spectral information of the HSI, is decomposed by the tensor singular value decomposition (t-SVD) to explore its low-rank intrinsic features. Finally, the efficacy of the extracted features is evaluated using the accuracy of image classification with a support vector machine (SVM) classifier. Experimental results on three publicly available datasets fully demonstrate the superiority of the proposed TensorSSA over several state-of-the-art 2D/3D feature extraction and deep learning algorithms, even with a limited number of training samples.
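The t-SVD step can be sketched compactly: transform the tensor along its third mode with the FFT, take a truncated matrix SVD of each frontal slice in the Fourier domain, and invert the transform. The toy tensor and truncation rank below are assumptions; TensorSSA applies this decomposition to the trajectory tensor it constructs.

```python
# Truncated t-SVD sketch via the FFT along the third mode.
import numpy as np

def truncated_tsvd(T, rank):
    """Rank-truncated t-SVD reconstruction of a 3-way tensor T."""
    Tf = np.fft.fft(T, axis=2)                  # transform tube fibers
    Rf = np.empty_like(Tf)
    for k in range(T.shape[2]):                 # SVD of each frontal slice
        U, s, Vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        Rf[:, :, k] = (U[:, :rank] * s[:rank]) @ Vh[:rank]
    return np.real(np.fft.ifft(Rf, axis=2))    # back to the original domain

rng = np.random.default_rng(0)
T = rng.random((40, 40, 20))                    # toy stand-in for a trajectory tensor
low_rank = truncated_tsvd(T, rank=5)            # global low-rank approximation
print(np.linalg.norm(T - low_rank) / np.linalg.norm(T))
```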
Graph Relation Network: Modeling Relations between Scenes for Multi-Label Remote Sensing Image Classification and Retrieval
Due to the proliferation of large-scale remote sensing (RS) archives with multiple annotations, multilabel RS scene classification and retrieval are becoming increasingly popular. Although some recent deep learning-based methods are able to achieve promising results in this context, the lack of research on how to learn embedding spaces under the multilabel assumption often makes these models unable to preserve the complex semantic relations pervading aerial scenes, which is an important limitation in RS applications. To fill this gap, we propose a new graph relation network (GRN) for multilabel RS scene categorization. Our GRN is able to model the relations between samples (or scenes) by making use of a graph structure which is fed into network learning. For this purpose, we define a new loss function, called scalable neighbor discriminative loss with binary cross entropy (SNDL-BCE), that is able to embed the graph structures through the networks more effectively. The proposed approach can guide deep learning techniques (such as convolutional neural networks) to a more discriminative metric space, where semantically similar RS scenes are closely embedded and dissimilar images are separated, from a novel multilabel viewpoint. To achieve this goal, our GRN jointly maximizes a weighted leave-one-out K-nearest neighbors (KNN) score over the training set, where the weight matrix describes the contribution of the nearest neighbors associated with each RS image to its class decision, and the likelihood of class discrimination in the multilabel scenario. An extensive experimental comparison, conducted on three multilabel RS scene data archives, validates the effectiveness of the proposed GRN in terms of KNN classification and image retrieval. The codes of this article will be made publicly available for reproducible research in the community.
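A minimal differentiable sketch of a weighted leave-one-out KNN score of the kind GRN maximizes is given below: each embedded scene is softly classified by a temperature-weighted vote over all other samples, with multilabel agreement as the target. The embedding stand-in, temperature, and agreement measure are assumptions and do not reproduce the SNDL-BCE loss itself.

```python
# Differentiable weighted leave-one-out KNN score sketch (assumed setup).
import torch

def loo_knn_score(Z, Y, tau=0.1):
    """Z: (n, d) embeddings; Y: (n, c) binary multilabel matrix."""
    D = torch.cdist(Z, Z)                       # pairwise distances
    D = D + torch.eye(len(Z)) * 1e9             # leave-one-out: mask self-matches
    W = torch.softmax(-D / tau, dim=1)          # soft neighbor weight matrix
    Yn = Y / Y.sum(1, keepdim=True).clamp(min=1)
    agree = (Yn @ Y.t()).clamp(max=1.0)         # multilabel agreement in [0, 1]
    return (W * agree).sum(1).mean()            # average weighted LOO score

n, d, c = 64, 16, 5
Z = torch.randn(n, d, requires_grad=True)       # stand-in for network embeddings
Y = (torch.rand(n, c) < 0.3).float()            # toy multilabel annotations
score = loo_knn_score(Z, Y)
(-score).backward()                             # maximize the score during training
print(float(score))
```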