Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery
Bridge detection in remote sensing images (RSIs) plays a crucial role in
various applications, but it poses unique challenges compared to the detection
of other objects. In RSIs, bridges exhibit considerable variations in terms of
their spatial scales and aspect ratios. Therefore, to ensure the visibility and
integrity of bridges, it is essential to perform holistic bridge detection in
large-size very-high-resolution (VHR) RSIs. However, the lack of datasets with
large-size VHR RSIs limits the deep learning algorithms' performance on bridge
detection. Due to the limitation of GPU memory in tackling large-size images,
deep learning-based object detection methods commonly adopt the cropping
strategy, which inevitably results in label fragmentation and discontinuous
prediction. To ameliorate the scarcity of datasets, this paper proposes a
large-scale dataset named GLH-Bridge comprising 6,000 VHR RSIs sampled from
diverse geographic locations across the globe. These images encompass a wide
range of sizes, varying from 2,048*2,048 to 16,384*16,384 pixels, and
collectively feature 59,737 bridges. Furthermore, we present an efficient
network for holistic bridge detection (HBD-Net) in large-size RSIs. The HBD-Net
presents a separate detector-based feature fusion (SDFF) architecture and is
optimized via a shape-sensitive sample re-weighting (SSRW) strategy. Based on
the proposed GLH-Bridge dataset, we establish a bridge detection benchmark
including the OBB and HBB tasks, and validate the effectiveness of the proposed
HBD-Net. Additionally, cross-dataset generalization experiments on two publicly
available datasets illustrate the strong generalization capability of the
GLH-Bridge dataset.
Comment: 16 pages, 11 figures, 6 tables; due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF file.
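To make the label-fragmentation problem concrete, here is a minimal sketch of the cropping strategy the abstract criticizes; the tile size, stride, and axis-aligned label format are illustrative assumptions, not details from the paper. A single elongated bridge is sliced into many partial labels:

```python
def tile_image(h, w, tile=1024, stride=896):
    """Yield (y0, x0, y1, x1) tile windows covering an h-by-w image."""
    for y0 in range(0, max(h - tile, 0) + 1, stride):
        for x0 in range(0, max(w - tile, 0) + 1, stride):
            yield y0, x0, min(y0 + tile, h), min(x0 + tile, w)

def clip_box(box, window):
    """Clip an axis-aligned box (x0, y0, x1, y1) to a window; None if empty."""
    wy0, wx0, wy1, wx1 = window
    x0, y0 = max(box[0], wx0), max(box[1], wy0)
    x1, y1 = min(box[2], wx1), min(box[3], wy1)
    return (x0, y0, x1, y1) if x1 > x0 and y1 > y0 else None

# A 9,000-px-long bridge in a 16,384 x 16,384 image becomes many partial labels:
bridge = (3000, 5000, 12000, 5200)  # (x0, y0, x1, y1): extreme aspect ratio
fragments = [f for w in tile_image(16384, 16384) if (f := clip_box(bridge, w))]
print(f"one bridge -> {len(fragments)} fragmented labels across tiles")
```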
Learning to Evaluate Performance of Multi-modal Semantic Localization
Semantic localization (SeLo) refers to the task of obtaining the most
relevant locations in large-scale remote sensing (RS) images using semantic
information such as text. As an emerging task based on cross-modal retrieval,
SeLo achieves semantic-level retrieval with only caption-level annotation,
which demonstrates its great potential in unifying downstream tasks. Although
several successive works have carried out SeLo, no work has yet systematically
explored and analyzed this pressing direction. In this paper, we
thoroughly study this field and provide a complete benchmark in terms of
metrics and test data to advance the SeLo task. Firstly, based on the
characteristics of this task, we propose multiple discriminative evaluation
metrics to quantify the performance of the SeLo task. The devised significant
area proportion, attention shift distance, and discrete attention distance are
utilized to evaluate the generated SeLo map at both the pixel and region levels.
Next, to provide standard evaluation data for the SeLo task, we contribute a
diverse, multi-semantic, multi-objective Semantic Localization Testset
(AIR-SLT). AIR-SLT consists of 22 large-scale RS images and 59 test cases with
different semantics, aiming to provide a comprehensive evaluation of
retrieval models. Finally, we analyze the SeLo performance of RS cross-modal
retrieval models in detail, explore the impact of different variables on this
task, and provide a complete benchmark for the SeLo task. We have also
established a new paradigm for RS referring expression comprehension, and
demonstrated the great advantage of SeLo in semantics by combining it with
tasks such as detection and road extraction. The proposed evaluation metrics,
semantic localization test sets, and corresponding scripts are openly
accessible at github.com/xiaoyuan1996/SemanticLocalizationMetrics.
Comment: 19 pages, 11 figures
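The abstract names the three metrics without defining them; as a hedged illustration, the sketch below implements one plausible reading of the significant area proportion, where the thresholding and normalization choices are my assumptions rather than the paper's formulas:

```python
import numpy as np

def significant_area_proportion(selo_map, gt_mask, thresh=0.5):
    """Fraction of above-threshold attention lying inside the ground-truth region.

    selo_map: 2D float array in [0, 1], the generated SeLo probability map.
    gt_mask:  2D bool array, True where the queried semantics are located.
    """
    significant = selo_map >= thresh          # pixels the model deems relevant
    if significant.sum() == 0:
        return 0.0
    return float((significant & gt_mask).sum() / significant.sum())

# Toy example: a 100x100 map with attention centered on a 30x30 true region.
rng = np.random.default_rng(0)
selo_map = rng.random((100, 100)) * 0.4       # background noise below threshold
selo_map[20:50, 20:50] = 0.9                  # strong response on the target
gt_mask = np.zeros((100, 100), dtype=bool)
gt_mask[20:50, 20:50] = True
print(significant_area_proportion(selo_map, gt_mask))  # -> 1.0 on this toy map
```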
Uncovering archaeological sites in airborne LiDAR data with data-centric artificial intelligence
Mapping potential archaeological sites using remote sensing and artificial intelligence can be an efficient tool to assist archaeologists during project planning and fieldwork. This paper explores the use of airborne LiDAR data and data-centric artificial intelligence for identifying potential burial mounds. The challenge of exploring the landscape and mapping new archaeological sites, coupled with the difficulty of identifying them through visual analysis of remote sensing data, results in the recurring issue of insufficient annotations. Additionally, the top-down nature of LiDAR data hinders artificial intelligence in its search, as the morphology of archaeological sites blends with that of natural and artificial shapes, leading to frequent false positives. To address this problem, a novel data-centric artificial intelligence approach is proposed, exploring the available data and tools. The LiDAR data is pre-processed into a dataset of 2D digital elevation images, and the known burial mounds are annotated. This dataset is augmented with copy-paste object embedding based on Location-Based Ranking. This technique uses the Land-Use and Occupation Charter to segment the regions of interest where burial mounds can be pasted. YOLOv5 is trained on the resulting dataset to propose new burial mounds. These proposals go through a post-processing step that directly uses the 3D data acquired by the LiDAR to verify whether their 3D shape is similar to that of the annotated sites. This approach drastically reduced false positives, attaining a 72.53% positive rate, which is relevant for the ground-truthing phase in which archaeologists visit the coordinates of proposed burial mounds to confirm their existence.
This work was supported by the Project Odyssey: Platform for Automated Sensing in Archaeology, co-financed by COMPETE 2020 and the Regional Operational Program Lisboa 2020 through Portugal 2020 and FEDER under Grant ALG-01-0247-FEDER-070150.
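For intuition, a minimal sketch of copy-paste object embedding restricted by a land-use mask follows; the mask semantics, additive patch blending, and function names are illustrative assumptions rather than the authors' exact pipeline:

```python
import numpy as np

def paste_mound(dem, mound_patch, landuse_mask, rng):
    """Paste a burial-mound elevation patch at a random mask-allowed location.

    dem:          2D float array, digital elevation image.
    mound_patch:  2D float array, elevation offsets of an annotated mound.
    landuse_mask: 2D bool array, True where the Land-Use and Occupation
                  Charter marks terrain where mounds can plausibly occur.
    """
    ph, pw = mound_patch.shape
    # Candidate top-left corners whose patch stays inside allowed terrain.
    ys, xs = np.where(landuse_mask[: dem.shape[0] - ph, : dem.shape[1] - pw])
    if len(ys) == 0:
        return dem  # no allowed region large enough; return unchanged
    i = rng.integers(len(ys))
    y, x = ys[i], xs[i]
    out = dem.copy()
    out[y : y + ph, x : x + pw] += mound_patch  # add mound relief onto terrain
    return out

rng = np.random.default_rng(42)
dem = np.zeros((256, 256))
mask = np.zeros((256, 256), dtype=bool)
mask[64:192, 64:192] = True                   # only this region is pasteable
mound = np.ones((16, 16)) * 0.5               # crude 0.5 m mound stand-in
augmented = paste_mound(dem, mound, mask, rng)
print(augmented.max())                        # 0.5 where the mound was pasted
```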
DOTA: A Large-scale Dataset for Object Detection in Aerial Images
Object detection is an important and challenging problem in computer vision.
Although the past decade has witnessed major advances in object detection in
natural scenes, such successes have been slow to transfer to aerial imagery,
not only
because of the huge variation in the scale, orientation and shape of the object
instances on the earth's surface, but also due to the scarcity of
well-annotated datasets of objects in aerial scenes. To advance object
detection research in Earth Vision, also known as Earth Observation and Remote
Sensing, we introduce a large-scale Dataset for Object deTection in Aerial
images (DOTA). To this end, we collect 2,806 aerial images from different
sensors and platforms. Each image is about 4,000-by-4,000 pixels in size and
contains objects exhibiting a wide variety of scales, orientations, and shapes.
These DOTA images are then annotated by experts in aerial image interpretation
using 15 common object categories. The fully annotated DOTA images contain
188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.)
quadrilateral. To build a baseline for object detection in Earth Vision, we
evaluate state-of-the-art object detection algorithms on DOTA. Experiments
demonstrate that DOTA well represents real Earth Vision applications and is
quite challenging.
Comment: Accepted to CVPR 2018
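DOTA's 8-d.o.f. labels store the four corner points of each oriented quadrilateral; the small sketch below shows how such an annotation reduces to the axis-aligned box used by horizontal detectors (the tuple layout follows DOTA's published x1 y1 ... x4 y4 convention; the helper name is mine):

```python
def quad_to_hbb(quad):
    """Convert an 8-d.o.f. quadrilateral (x1, y1, ..., x4, y4) to an HBB."""
    xs, ys = quad[0::2], quad[1::2]            # split corner coordinates
    return min(xs), min(ys), max(xs), max(ys)  # (xmin, ymin, xmax, ymax)

# A tilted instance annotated by its four corners, clockwise from top-left:
quad = (100.0, 60.0, 220.0, 100.0, 200.0, 160.0, 80.0, 120.0)
print(quad_to_hbb(quad))  # -> (80.0, 60.0, 220.0, 160.0)
```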
Zero-Shot Aerial Object Detection with Visual Description Regularization
Existing object detection models are mainly trained on large-scale labeled
datasets. However, annotating data for novel aerial object classes is expensive
since it is time-consuming and may require expert knowledge. Thus, it is
desirable to study label-efficient object detection methods on aerial images.
In this work, we propose a zero-shot method for aerial object detection named
visual Description Regularization, or DescReg. Concretely, we identify the weak
semantic-visual correlation of the aerial objects and aim to address the
challenge with prior descriptions of their visual appearance. Instead of
directly encoding the descriptions into class embedding space which suffers
from the representation gap problem, we propose to infuse the prior inter-class
visual similarity conveyed in the descriptions into the embedding learning. The
infusion process is accomplished with a newly designed similarity-aware triplet
loss which incorporates structured regularization on the representation space.
We conduct extensive experiments with three challenging aerial object detection
datasets, including DIOR, xView, and DOTA. The results demonstrate that DescReg
significantly outperforms the state-of-the-art ZSD methods with complex
projection designs and generative frameworks; e.g., DescReg outperforms the
best reported ZSD method on DIOR by 4.5 mAP on unseen classes and by 8.1 in
HM. We
further show the generalizability of DescReg by integrating it into generative
ZSD methods as well as varying the detection architecture.
Comment: 13 pages, 3 figures
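The abstract gives only the name of the similarity-aware triplet loss; the sketch below shows one way a triplet loss over class embeddings could be modulated by prior inter-class visual similarity, with the margin schedule and the exhaustive triplet enumeration being my assumptions rather than the paper's formulation:

```python
import torch

def similarity_aware_triplet(emb, sim, base_margin=0.5):
    """Triplet loss over class embeddings with similarity-scaled margins.

    emb: (C, D) learnable class embeddings.
    sim: (C, C) prior visual similarity in [0, 1] derived from descriptions;
         a larger similarity gap between positive and negative enforces a
         larger margin in the embedding space.
    """
    dist = torch.cdist(emb, emb)                  # (C, C) pairwise distances
    C = emb.shape[0]
    losses = []
    for a in range(C):
        for p in range(C):
            for n in range(C):
                if len({a, p, n}) < 3:
                    continue                      # need three distinct classes
                if sim[a, p] <= sim[a, n]:
                    continue                      # p must be the more similar class
                margin = base_margin * (sim[a, p] - sim[a, n])
                losses.append(torch.relu(dist[a, p] - dist[a, n] + margin))
    return torch.stack(losses).mean()

emb = torch.randn(4, 16, requires_grad=True)      # 4 toy classes, 16-dim space
sim = torch.tensor([[1.0, 0.8, 0.2, 0.1],
                    [0.8, 1.0, 0.3, 0.2],
                    [0.2, 0.3, 1.0, 0.7],
                    [0.1, 0.2, 0.7, 1.0]])
loss = similarity_aware_triplet(emb, sim)
loss.backward()                                   # gradients flow to embeddings
print(float(loss))
```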
Automated High-resolution Earth Observation Image Interpretation: Outcome of the 2020 Gaofen Challenge
In this article, we introduce the 2020 Gaofen Challenge and its relevant scientific outcomes. The 2020 Gaofen Challenge is an international competition, organized by the China High-Resolution Earth Observation Conference Committee and the Aerospace Information Research Institute, Chinese Academy of Sciences, and technically cosponsored by the IEEE Geoscience and Remote Sensing Society and the International Society for Photogrammetry and Remote Sensing. It aims to promote the academic development of automated high-resolution Earth observation image interpretation. Six independent tracks were organized in this challenge, covering challenging problems in the fields of object detection and semantic segmentation. With the development of convolutional neural networks, deep-learning-based methods have achieved good performance on image interpretation. In this article, we report the details and the best-performing methods presented so far within the scope of this challenge.