28,119 research outputs found
Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning
High-resolution remote sensing images are now available with the progress of remote sensing technology. With respect to popular remote sensing tasks, such as scene classification, image captioning provides comprehensible information about such images by summarizing the image content in human-readable text. Most existing remote sensing image captioning methods are based on deep learning-based encoder–decoder frameworks, using convolutional neural network or recurrent neural network as the backbone of such frameworks. Such frameworks show a limited capability to analyze sequential data and cope with the lack of captioned remote sensing training images. Recently introduced Transformer architecture exploits self-attention to obtain superior performance for sequence-analysis tasks. Inspired by this, in this work, we employ a Transformer as an encoder–decoder for remote sensing image captioning. Moreover, to deal with the limited training data, an auxiliary decoder is used that further helps the encoder in the training process. The auxiliary decoder is trained for multilabel scene classification due to its conceptual similarity to image captioning and capability of highlighting semantic classes. To the best of our knowledge, this is the first work exploiting multilabel classification to improve remote sensing image captioning. Experimental results on the University of California (UC)-Merced caption dataset show the efficacy of the proposed method. The implementation details can be found in https://gitlab.lrz.de/ai4eo/captioningMultilabel
Aggregated Deep Local Features for Remote Sensing Image Retrieval
Remote Sensing Image Retrieval remains a challenging topic due to the special
nature of Remote Sensing Imagery. Such images contain various different
semantic objects, which clearly complicates the retrieval task. In this paper,
we present an image retrieval pipeline that uses attentive, local convolutional
features and aggregates them using the Vector of Locally Aggregated Descriptors
(VLAD) to produce a global descriptor. We study various system parameters such
as the multiplicative and additive attention mechanisms and descriptor
dimensionality. We propose a query expansion method that requires no external
inputs. Experiments demonstrate that even without training, the local
convolutional features and global representation outperform other systems.
After system tuning, we can achieve state-of-the-art or competitive results.
Furthermore, we observe that our query expansion method increases overall
system performance by about 3%, using only the top-three retrieved images.
Finally, we show how dimensionality reduction produces compact descriptors with
increased retrieval performance and fast retrieval computation times, e.g. 50%
faster than the current systems.Comment: Published in Remote Sensing. The first two authors have equal
contributio
Bidirectional-Convolutional LSTM Based Spectral-Spatial Feature Learning for Hyperspectral Image Classification
This paper proposes a novel deep learning framework named
bidirectional-convolutional long short term memory (Bi-CLSTM) network to
automatically learn the spectral-spatial feature from hyperspectral images
(HSIs). In the network, the issue of spectral feature extraction is considered
as a sequence learning problem, and a recurrent connection operator across the
spectral domain is used to address it. Meanwhile, inspired from the widely used
convolutional neural network (CNN), a convolution operator across the spatial
domain is incorporated into the network to extract the spatial feature.
Besides, to sufficiently capture the spectral information, a bidirectional
recurrent connection is proposed. In the classification phase, the learned
features are concatenated into a vector and fed to a softmax classifier via a
fully-connected operator. To validate the effectiveness of the proposed
Bi-CLSTM framework, we compare it with several state-of-the-art methods,
including the CNN framework, on three widely used HSIs. The obtained results
show that Bi-CLSTM can improve the classification performance as compared to
other methods
- …