Zero-Shot Scene Classification for High Spatial Resolution Remote Sensing Images
Due to the rapid technological development of various sensors, a huge volume of high spatial resolution (HSR) image data can now be acquired. How to efficiently recognize the scenes from such HSR image data has become a critical task. Conventional approaches to remote sensing scene classification utilize only information from HSR images. Therefore, they always need a large amount of labeled data and cannot recognize images from an unseen scene class that has no visual sample in the labeled data. To overcome this drawback, we propose a novel approach for recognizing images from unseen scene classes, i.e., zero-shot scene classification (ZSSC). In this approach, we first use the well-known natural language processing model word2vec to map names of seen/unseen scene classes to semantic vectors. A semantic-directed graph is then constructed over the semantic vectors to describe the relationships between unseen classes and seen classes. To transfer knowledge from the images in seen classes to those in unseen classes, we make an initial label prediction on test images using an unsupervised domain adaptation model. With the semantic-directed graph and the initial prediction, a label-propagation algorithm is then developed for ZSSC. By leveraging the visual similarity among images from the same scene class, a label refinement approach based on sparse learning is used to suppress the noise in the zero-shot classification results. Experimental results show that the proposed approach significantly outperforms the state-of-the-art approaches in ZSSC.
Funding: National Natural Science Foundation of China [61573363, 61573026]; 973 Program of China [2014CB340403, 2015CB352502]; Fundamental Research Funds for the Central Universities; Research Funds of Renmin University of China [15XNLQ01]; European Research Council FP7 Project SUNNY [313243]
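The label-propagation step described above can be sketched as follows. This is a minimal, generic graph label-propagation routine; the affinity matrix, seed labels, and damping factor `alpha` below are hypothetical toy values, not the paper's actual semantic-directed graph or the output of its domain-adaptation model.

```python
import numpy as np

def label_propagation(W, Y0, alpha=0.85, iters=50):
    """Propagate initial label scores Y0 over a graph with affinity matrix W.
    Each iteration mixes row-normalized neighbor scores with the seeds Y0."""
    d = W.sum(axis=1, keepdims=True)
    d[d == 0] = 1.0                 # avoid division by zero for isolated nodes
    S = W / d                       # row-normalized transition matrix
    Y = Y0.copy()
    for _ in range(iters):
        Y = alpha * (S @ Y) + (1 - alpha) * Y0
    return Y

# toy graph: 4 images (nodes 0-1 seeded from seen classes, 2-3 unlabeled), 3 classes
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
Y0 = np.array([[1, 0, 0],
               [0, 1, 0],
               [0, 0, 0],
               [0, 0, 0]], dtype=float)
Y = label_propagation(W, Y0)
pred = Y.argmax(axis=1)             # unlabeled nodes inherit their neighbors' labels
```

Nodes 2 and 3 receive higher scores for the classes of their direct seeded neighbors, which is the mechanism by which seen-class predictions spread to unseen-class images.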
Fine-Grained Object Recognition and Zero-Shot Learning in Remote Sensing Imagery
Fine-grained object recognition that aims to identify the type of an object
among a large number of subcategories is an emerging application with the
increasing resolution that exposes new details in image data. Traditional fully
supervised algorithms fail to handle this problem where there is low
between-class variance and high within-class variance for the classes of
interest with small sample sizes. We study an even more extreme scenario named
zero-shot learning (ZSL) in which no training example exists for some of the
classes. ZSL aims to build a recognition model for new unseen categories by
relating them to seen classes that were previously learned. We establish this
relation by learning a compatibility function between image features extracted
via a convolutional neural network and auxiliary information that describes the
semantics of the classes of interest by using training samples from the seen
classes. Then, we show how knowledge transfer can be performed for the unseen
classes by maximizing this function during inference. We introduce a new data
set that contains 40 different types of street trees in 1-ft spatial resolution
aerial data, and evaluate the performance of this model with manually annotated
attributes, a natural language model, and a scientific taxonomy as auxiliary
information. The experiments show that the proposed model achieves 14.3%
recognition accuracy for the classes with no training examples, which is
significantly better than the 6.3% accuracy of a random guess over the 16 test
classes, and also better than three other ZSL algorithms.
Comment: G. Sumbul, R. G. Cinbis, S. Aksoy, "Fine-Grained Object Recognition and Zero-Shot Learning in Remote Sensing Imagery", IEEE Transactions on Geoscience and Remote Sensing (TGRS), in press, 201
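The compatibility function described above can be illustrated with a minimal sketch. A bilinear form f(x, c) = xᵀ W a_c is one common choice for such compatibility functions; the dimensions, random matrices, and class embeddings here are made-up placeholders, not the paper's trained model or its tree-species attributes.

```python
import numpy as np

def compatibility(x, A, W):
    """Bilinear compatibility f(x, c) = x^T W a_c between an image feature x
    and each class embedding a_c (rows of A); returns one score per class."""
    return (x @ W) @ A.T

rng = np.random.default_rng(0)
d_img, d_attr, n_cls = 8, 5, 4
W = rng.normal(size=(d_img, d_attr))   # learned on seen classes in practice
A = rng.normal(size=(n_cls, d_attr))   # class embeddings: attributes / word vectors / taxonomy
x = rng.normal(size=d_img)             # CNN feature of a test image

scores = compatibility(x, A, W)
pred = int(scores.argmax())            # inference: the most compatible (possibly unseen) class
```

Because A can include rows for classes with no training images, maximizing this score at inference transfers knowledge from seen to unseen categories.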
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Recent advancements in Large Vision-Language Models (VLMs) have shown great
promise in natural image domains, allowing users to hold a dialogue about given
visual content. However, such general-domain VLMs perform poorly for Remote
Sensing (RS) scenarios, leading to inaccurate or fabricated information when
presented with RS domain-specific queries. Such a behavior emerges due to the
unique challenges introduced by RS imagery. For example, to handle
high-resolution RS imagery with diverse scale changes across categories and
many small objects, region-level reasoning is necessary alongside holistic
scene interpretation. Furthermore, the lack of domain-specific multimodal
instruction following data as well as strong backbone models for RS make it
hard for the models to align their behavior with user queries. To address these
limitations, we propose GeoChat - the first versatile remote sensing VLM that
offers multitask conversational capabilities with high-resolution RS images.
Specifically, GeoChat can not only answer image-level queries but also accepts
region inputs to hold region-specific dialogue. Furthermore, it can visually
ground objects in its responses by referring to their spatial coordinates. To
address the lack of domain-specific datasets, we generate a novel RS multimodal
instruction-following dataset by extending image-text pairs from existing
diverse RS datasets. We establish a comprehensive benchmark for RS multitask
conversations and compare with a number of baseline methods. GeoChat
demonstrates robust zero-shot performance on various RS tasks, e.g., image and
region captioning, visual question answering, scene classification, visually
grounded conversations and referring detection. Our code is available at
https://github.com/mbzuai-oryx/geochat.
Comment: 10 pages, 4 figures
Modeling, Simulation, and Analysis of Optical Remote Sensing Systems
Remote sensing of the Earth's resources from space-based sensors has evolved in the past twenty years from a scientific experiment to a commonly used technological tool. The scientific applications and engineering aspects of remote sensing systems have been studied extensively. However, most of these studies have been aimed at understanding individual aspects of the remote sensing process, while relatively few have studied their interrelations. A motivation for studying these interrelationships has arisen with the advent of highly sophisticated configurable sensors as part of the Earth Observing System (EOS) proposed by NASA for the 1990s. These instruments represent a tremendous advance in sensor technology, with data gathered in nearly 200 spectral bands and with the ability for scientists to specify many observational parameters. It will be increasingly necessary for users of remote sensing systems to understand the tradeoffs and interrelationships of system parameters. In this report, two approaches to investigating remote sensing systems are developed. In one approach, detailed models of the scene, the sensor, and the processing aspects of the system are implemented in a discrete simulation. This approach is useful in creating simulated images with desired characteristics for use in sensor or processing algorithm development. A less complete, but computationally simpler, method based on a parametric model of the system is also developed. In this analytical model, the various informational classes are parameterized by their spectral mean vector and covariance matrix. These class statistics are modified by models for the atmosphere, the sensor, and processing algorithms, and an estimate is made of the resulting classification accuracy among the informational classes.
Application of these models is made to the study of the proposed High Resolution Imaging Spectrometer (HIRIS). The interrelationships among observational conditions, sensor effects, and processing choices are investigated, with several interesting results. Reduced classification accuracy in hazy atmospheres is seen to be due not only to sensor noise, but also to the increased path radiance scattered from the surface. The effect of the atmosphere is also seen in its relationship to view angle. In clear atmospheres, increasing the zenith view angle is seen to result in an increase in classification accuracy due to the reduced scene variation as the ground size of image pixels is increased. However, in hazy atmospheres the reduced transmittance and increased path radiance counter this effect and result in decreased accuracy with increasing view angle. The relationship between the Signal-to-Noise Ratio (SNR) and classification accuracy is seen to depend in a complex manner on spatial parameters and feature selection. Higher SNR values are seen to not always result in higher accuracies, and even in cases of low SNR, feature sets chosen appropriately can lead to high accuracies.
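The analytical model's logic can be roughly illustrated as follows: Gaussian class statistics (mean vector and covariance matrix) are degraded by a toy atmosphere/sensor model, and class separability is re-estimated with the Bhattacharyya distance, a standard proxy for classification accuracy between Gaussian classes. The specific gain, path-radiance, and noise values are invented for illustration and are not the report's HIRIS parameters.

```python
import numpy as np

def bhattacharyya(m1, C1, m2, C2):
    """Bhattacharyya distance between two Gaussian classes; larger means
    more separable (0.5 * exp(-B) bounds the pairwise Bayes error)."""
    C = 0.5 * (C1 + C2)
    dm = m2 - m1
    term1 = 0.125 * dm @ np.linalg.solve(C, dm)
    term2 = 0.5 * np.log(np.linalg.det(C) /
                         np.sqrt(np.linalg.det(C1) * np.linalg.det(C2)))
    return term1 + term2

def degrade(mean, cov, gain, path_radiance, noise_var):
    """Toy atmosphere/sensor model: scale radiance by a transmittance-like
    gain, add a path-radiance offset to the mean, and sensor noise to the
    covariance."""
    n = len(mean)
    return gain * mean + path_radiance, gain ** 2 * cov + noise_var * np.eye(n)

# two hypothetical informational classes in a 2-band feature space
m1, m2 = np.array([10.0, 20.0]), np.array([14.0, 26.0])
C1 = C2 = np.eye(2)

clear = bhattacharyya(m1, C1, m2, C2)
dm1, dC1 = degrade(m1, C1, gain=0.6, path_radiance=5.0, noise_var=2.0)
dm2, dC2 = degrade(m2, C2, gain=0.6, path_radiance=5.0, noise_var=2.0)
hazy = bhattacharyya(dm1, dC1, dm2, dC2)
```

In this toy setting the haze-degraded separability (`hazy`) falls below the clear-atmosphere value (`clear`), mirroring the qualitative finding that reduced transmittance and added noise lower classification accuracy.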
Unlocking the capabilities of explainable few-shot learning in remote sensing
Recent advancements have significantly improved the efficiency and effectiveness of deep learning methods for image-based remote sensing tasks. However, the requirement for large amounts of labeled data can limit the applicability of deep neural networks to existing remote sensing datasets. To overcome this challenge, few-shot learning has emerged as a valuable approach for enabling learning with limited data. While previous research has evaluated the effectiveness of few-shot learning methods on satellite-based datasets, little attention has been paid to exploring the applications of these methods to datasets obtained from UAVs, which are increasingly used in remote sensing studies. In this review, we provide an up-to-date overview of both existing and newly proposed few-shot classification techniques, along with appropriate datasets that are used for both satellite-based and UAV-based data. Our systematic approach demonstrates that few-shot learning can effectively adapt to the broader and more diverse perspectives that UAV-based platforms can provide. We also evaluate some state-of-the-art few-shot approaches on a UAV disaster scene classification dataset, yielding promising results. We emphasize the importance of integrating explainable AI (XAI) techniques, such as attention maps and prototype analysis, to increase the transparency, accountability, and trustworthiness of few-shot models for remote sensing. Key challenges and future research directions are identified, including tailored few-shot methods for UAVs, extension to unseen tasks such as segmentation, and the development of optimized XAI techniques suited to few-shot remote sensing problems. This review aims to provide researchers and practitioners with an improved understanding of few-shot learning's capabilities and limitations in remote sensing, while highlighting open problems to guide future progress in efficient, reliable, and interpretable few-shot methods.
Comment: Under review; once the paper is accepted, the copyright will be transferred to the corresponding journal
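The prototype analysis mentioned above can be sketched as a minimal prototypical-networks-style episode: class prototypes are the mean embeddings of each class's few support examples, and queries are classified by their nearest prototype. The 2-D embeddings below are hand-made points for illustration, not features from any of the reviewed models.

```python
import numpy as np

def prototypes(support, labels, n_way):
    """Class prototypes: the mean embedding of each class's support examples."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])

def classify(queries, protos):
    """Nearest-prototype classification by squared Euclidean distance."""
    d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# toy 2-way, 2-shot episode in a 2-D embedding space
support = np.array([[0.0, 0.0], [0.2, 0.1],    # class 0 support examples
                    [1.0, 1.0], [0.9, 1.1]])   # class 1 support examples
labels = np.array([0, 0, 1, 1])

protos = prototypes(support, labels, n_way=2)
pred = classify(np.array([[0.1, 0.0], [1.0, 0.9]]), protos)
```

Inspecting the prototypes and each query's distance to them is exactly the kind of prototype analysis that makes few-shot predictions more interpretable.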
A Review of Landcover Classification with Very-High Resolution Remotely Sensed Optical Images—Analysis Unit, Model Scalability and Transferability
As an important application in remote sensing, landcover classification remains one of the most challenging tasks in very-high-resolution (VHR) image analysis. As a rapidly increasing number of Deep Learning (DL) based landcover methods and training strategies are claimed to be state-of-the-art, the already fragmented technical landscape of landcover mapping methods has become further complicated. Although there exists a plethora of literature reviews attempting to guide researchers in making an informed choice of landcover mapping methods, these articles either focus on applications in a specific area or revolve around general deep learning models, and thus lack a systematic view of the ever-advancing landcover mapping methods. In addition, issues related to training samples and model transferability have become more critical than ever in an era dominated by data-driven approaches, but these issues were addressed to a lesser extent in previous review articles on remote sensing classification. Therefore, in this paper, we present a systematic overview of existing methods, starting from learning methods and the varying basic analysis units for landcover mapping tasks, and moving to challenges and solutions on three aspects of scalability and transferability with a remote sensing classification focus: (1) sparsity and imbalance of data; (2) domain gaps across different geographical regions; and (3) multi-source and multi-view fusion. We discuss each of these categories of methods in detail, draw concluding remarks on these developments, and recommend potential directions for the continued endeavor.
Low-Shot Learning for the Semantic Segmentation of Remote Sensing Imagery
Deep-learning frameworks have made remarkable progress thanks to the creation of large annotated datasets such as ImageNet, which has over one million training images. Although this works well for color (RGB) imagery, labeled datasets for other sensor modalities (e.g., multispectral and hyperspectral) are minuscule in comparison. This is because annotated datasets are expensive and labor-intensive to complete, and since this would be impractical to accomplish for each type of sensor, current state-of-the-art approaches in computer vision are not ideal for remote sensing problems. The shortage of annotated remote sensing imagery beyond the visual spectrum has forced researchers to embrace unsupervised feature-extracting frameworks. These features are learned on a per-image basis, so they tend not to generalize well across other datasets. In this dissertation, we propose three new strategies for learning feature-extracting frameworks with only a small quantity of annotated image data: 1) self-taught feature learning, 2) domain adaptation with synthetic imagery, and 3) semi-supervised classification. "Self-taught" feature-learning frameworks are trained with large quantities of unlabeled imagery, and these networks then extract spatial-spectral features from annotated data for supervised classification. Synthetic remote sensing imagery can be used to bootstrap a deep convolutional neural network, which we can then fine-tune with real imagery. Semi-supervised classifiers prevent overfitting by jointly optimizing the supervised classification task alongside one or more unsupervised learning tasks (i.e., reconstruction). Although obtaining large quantities of annotated image data would be ideal, our work shows that we can make do with less cost-prohibitive methods which are more practical for the end-user.
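The semi-supervised idea of jointly optimizing classification and reconstruction can be sketched as a combined loss. The weighting `lam` and the toy data below are assumptions for illustration; the dissertation's actual networks and loss balance are not specified here.

```python
import numpy as np

def joint_loss(logits, targets, recon, inputs, lam=0.5):
    """Semi-supervised objective: supervised cross-entropy on labeled samples
    plus lam times an unsupervised reconstruction error on the inputs."""
    z = logits - logits.max(axis=1, keepdims=True)          # stable log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_p[np.arange(len(targets)), targets].mean()    # classification term
    mse = ((recon - inputs) ** 2).mean()                    # reconstruction term
    return ce + lam * mse

rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 6))       # e.g. spectral vectors of 4 pixels
logits = rng.normal(size=(4, 3))       # classifier-head output for 3 classes
targets = np.array([0, 1, 2, 1])

loss_bad = joint_loss(logits, targets, np.zeros_like(inputs), inputs)
loss_good = joint_loss(logits, targets, inputs, inputs)  # perfect reconstruction
```

The reconstruction term acts as a regularizer: with a perfect reconstruction it vanishes, and in training it penalizes representations that discard information about the unlabeled inputs, which is how overfitting on the few labeled samples is reduced.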