Zero-Shot Scene Classification for High Spatial Resolution Remote Sensing Images
Due to the rapid technological development of various sensors, a huge volume of high spatial resolution (HSR) image data can now be acquired. How to efficiently recognize the scenes from such HSR image data has become a critical task. Conventional approaches to remote sensing scene classification only utilize information from HSR images. Therefore, they always need a large amount of labeled data and cannot recognize images from an unseen scene class without any visual sample in the labeled data. To overcome this drawback, we propose a novel approach for recognizing images from unseen scene classes, i.e., zero-shot scene classification (ZSSC). In this approach, we first use the well-known natural language processing model word2vec to map names of seen/unseen scene classes to semantic vectors. A semantic-directed graph is then constructed over the semantic vectors to describe the relationships between unseen classes and seen classes. To transfer knowledge from the images in seen classes to those in unseen classes, we make an initial label prediction on test images with an unsupervised domain adaptation model. With the semantic-directed graph and initial prediction, a label-propagation algorithm is then developed for ZSSC. By leveraging the visual similarity among images from the same scene class, a label refinement approach based on sparse learning is used to suppress the noise in the zero-shot classification results. Experimental results show that the proposed approach significantly outperforms the state-of-the-art approaches in ZSSC.
Funding: National Natural Science Foundation of China [61573363, 61573026]; 973 Program of China [2014CB340403, 2015CB352502]; Fundamental Research Funds for the Central Universities; Research Funds of Renmin University of China [15XNLQ01]; European Research Council FP7 Project SUNNY [313243]
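The graph-based propagation step described above can be sketched in a few lines. This is a minimal illustration only, not the paper's actual method: the class vectors below are random stand-ins for real word2vec embeddings, the graph is built from pairwise cosine similarity, and the class counts and damping factor are arbitrary choices.

```python
# Sketch: zero-shot label propagation over a semantic class graph.
# Random vectors stand in for word2vec embeddings of class names.
import numpy as np

rng = np.random.default_rng(0)
n_seen, n_unseen, dim = 4, 2, 8
class_vecs = rng.normal(size=(n_seen + n_unseen, dim))

# Build a semantic graph: edge weights from cosine similarity between classes.
norm = class_vecs / np.linalg.norm(class_vecs, axis=1, keepdims=True)
W = np.abs(norm @ norm.T)
np.fill_diagonal(W, 0)
P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix

# Initial label scores: seen classes are fixed, unseen classes start at zero.
Y = np.zeros((n_seen + n_unseen, n_seen))
Y[:n_seen] = np.eye(n_seen)

# Propagate scores through the graph, re-clamping the seen rows each step.
F = Y.copy()
for _ in range(50):
    F = 0.9 * (P @ F) + 0.1 * Y
    F[:n_seen] = Y[:n_seen]

print(F[n_seen:].round(3))  # soft assignments of unseen classes to seen ones
```

After convergence, each unseen class carries a soft score distribution over the seen classes, which is the kind of knowledge transfer the semantic-directed graph enables.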
Self-supervised embedding for generalized zero-shot learning in remote sensing scene classification
Generalized zero-shot learning (GZSL) is the most widely used setting for zero-shot learning (ZSL), involving both seen and unseen classes in the classification process. Many of the existing GZSL approaches for scene classification in remote sensing images use word embeddings that do not effectively describe unseen categories. We explore word embedding to describe the classes of remote sensing scenes and improve the classification accuracy of unseen categories. The proposed method uses a data2vec embedding based on self-supervised learning to obtain a continuous and contextualized latent representation. This representation leverages two advantages of the standard transformer architecture. First, targets are not predefined as visual tokens. Second, latent representations preserve contextual information. We conducted experiments on three benchmark scene classification datasets of remote sensing images. The proposed approach demonstrates its efficacy over the existing GZSL approaches.
RSVQA: Visual Question Answering for Remote Sensing Data
This paper introduces the task of visual question answering for remote sensing data (RSVQA). Remote sensing images contain a wealth of information which can be useful for a wide range of tasks, including land cover classification, object counting, or detection. However, most of the available methodologies are task-specific, thus inhibiting generic and easy access to the information contained in remote sensing data. As a consequence, accurate remote sensing product generation still requires expert knowledge. With RSVQA, we propose a system to extract information from remote sensing data that is accessible to every user: we use questions formulated in natural language and use them to interact with the images. With the system, images can be queried to obtain high-level information specific to the image content or relational dependencies between objects visible in the images. Using an automatic method introduced in this article, we built two datasets (using low- and high-resolution data) of image/question/answer triplets. The information required to build the questions and answers is queried from OpenStreetMap (OSM). The datasets can be used to train (when using supervised methods) and evaluate models to solve the RSVQA task. We report the results obtained by applying a model based on Convolutional Neural Networks (CNNs) for the visual part and a Recurrent Neural Network (RNN) for the natural language part to this task. The model is trained on the two datasets, yielding promising results in both cases.
Comment: 12 pages. Published in IEEE Transactions on Geoscience and Remote Sensing. Added one experiment and authors' biographies.
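The two-branch wiring described in the abstract (a CNN for the image, an RNN for the question, fused into an answer classifier) can be illustrated with a toy forward pass. This is not the paper's actual architecture or trained weights: all dimensions and matrices below are arbitrary stand-ins, and the fusion by elementwise product is one common VQA choice.

```python
# Toy sketch of a two-branch VQA forward pass: a visual feature vector (CNN
# output) and a question feature vector (RNN output) are projected to a shared
# space, fused elementwise, and mapped to a distribution over fixed answers.
import numpy as np

rng = np.random.default_rng(2)
d_img, d_q, d_fuse, n_answers = 64, 32, 16, 9  # arbitrary toy sizes

W_img = rng.normal(scale=0.1, size=(d_fuse, d_img))
W_q = rng.normal(scale=0.1, size=(d_fuse, d_q))
W_out = rng.normal(scale=0.1, size=(n_answers, d_fuse))

def answer_probs(img_feat, q_feat):
    """Fuse both modalities and return a softmax over the answer vocabulary."""
    fused = np.tanh(W_img @ img_feat) * np.tanh(W_q @ q_feat)
    logits = W_out @ fused
    exp = np.exp(logits - logits.max())  # stable softmax
    return exp / exp.sum()

probs = answer_probs(rng.normal(size=d_img), rng.normal(size=d_q))
print(probs.argmax(), float(probs.sum()))
```

Because the answer set is a fixed vocabulary, the VQA task reduces to classification over answers, which is what makes supervised training on image/question/answer triplets straightforward.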
A Review of Landcover Classification with Very-High Resolution Remotely Sensed Optical Images—Analysis Unit, Model Scalability and Transferability
As an important application in remote sensing, landcover classification remains one of the most challenging tasks in very-high-resolution (VHR) image analysis. As a rapidly increasing number of Deep Learning (DL) based landcover methods and training strategies are claimed to be the state-of-the-art, the already fragmented technical landscape of landcover mapping methods has been further complicated. Although there exists a plethora of literature review work attempting to guide researchers in making an informed choice of landcover mapping methods, the articles either focus on the review of applications in a specific area or revolve around general deep learning models, and thus lack a systematic view of the ever-advancing landcover mapping methods. In addition, issues related to training samples and model transferability have become more critical than ever in an era dominated by data-driven approaches, but these issues were addressed to a lesser extent in previous review articles on remote sensing classification. Therefore, in this paper, we present a systematic overview of existing methods, starting from learning methods and varying basic analysis units for landcover mapping tasks, and proceeding to challenges and solutions on three aspects of scalability and transferability with a remote sensing classification focus: (1) sparsity and imbalance of data; (2) domain gaps across different geographical regions; and (3) multi-source and multi-view fusion. We discuss each of these categories of methods in detail, draw concluding remarks on these developments, and recommend potential directions for continued work.