61 research outputs found
PolyR-CNN: R-CNN for end-to-end polygonal building outline extraction
Polygonal building outline extraction has been a research focus in recent years. Most existing methods address this challenging task by decomposing it into several subtasks and employing carefully designed architectures. Despite their accuracy, such pipelines often introduce inefficiencies during training and inference. This paper presents an end-to-end framework, denoted PolyR-CNN, which offers an efficient and fully integrated approach to predict vectorized building polygons and bounding boxes directly from remotely sensed images. Notably, PolyR-CNN relies solely on Region of Interest (RoI) features for the prediction, removing the need for complex designs. Furthermore, we propose a novel scheme, termed the vertex proposal feature, that extracts detailed outline information from polygon vertex coordinates to guide the RoI features toward predicting more regular buildings. PolyR-CNN can also handle buildings with holes through a simple post-processing method on the Inria dataset. Comprehensive experiments on the CrowdAI dataset show that PolyR-CNN achieves competitive accuracy compared to state-of-the-art methods while significantly improving computational efficiency: it reaches 79.2 Average Precision (AP), a 15.9 AP gain over the well-established end-to-end method PolyWorld, while running 2.5 times faster with a four times lighter model. With a simple ResNet-50 backbone, PolyR-CNN maintains 71.1 AP while running four times faster than PolyWorld. The code is available at: https://github.com/HeinzJiao/PolyR-CNN.
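As a rough illustration of how such a vertex proposal feature could condition RoI features, the PyTorch sketch below embeds flattened polygon vertex coordinates and adds them to the pooled RoI representation. All names and dimensions are illustrative assumptions, not taken from the PolyR-CNN codebase.

```python
# Minimal sketch of a "vertex proposal feature": polygon vertex
# coordinates are embedded and used to condition RoI features.
# Names and dimensions are illustrative, not PolyR-CNN's actual code.
import torch
import torch.nn as nn

class VertexProposalFeature(nn.Module):
    def __init__(self, num_vertices=96, d_model=256):
        super().__init__()
        # Embed the flattened (x, y) vertex coordinates of each proposal.
        self.mlp = nn.Sequential(
            nn.Linear(num_vertices * 2, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, roi_feats, vertices):
        # roi_feats: (N, d_model) pooled RoI features
        # vertices:  (N, num_vertices, 2) normalised vertex coordinates
        vertex_feats = self.mlp(vertices.flatten(1))
        # Guide the RoI features with explicit outline information.
        return roi_feats + vertex_feats
```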
Recurrent Multiresolution Convolutional Networks for VHR Image Classification
Classification of very high resolution (VHR) satellite images poses three major challenges: 1) inherently low intra-class and high inter-class spectral similarities, 2) mismatching resolutions of the available bands, and 3) the need to regularize noisy classification maps. Conventional methods have addressed these challenges by adopting separate stages of image fusion, feature extraction, and post-classification map regularization. These processing stages, however, are not jointly optimized for the classification task at hand. In this study, we propose a single-stage framework that embeds the processing stages in a recurrent multiresolution convolutional network trained in an end-to-end manner. The feedforward version of the network, called FuseNet, matches the resolution of the panchromatic and multispectral bands in a VHR image using convolutional layers with corresponding downsampling and upsampling operations. Contextual label information is incorporated into FuseNet by means of a recurrent version called ReuseNet. We compared FuseNet and ReuseNet against separate processing steps for both image fusion, e.g., pansharpening and resampling through interpolation, and map regularization, e.g., conditional random fields. We carried out our experiments on a land cover classification task using a WorldView-3 image of Quezon City, Philippines, and the ISPRS 2D semantic labeling benchmark dataset of Vaihingen, Germany. FuseNet and ReuseNet surpass the baseline approaches in both quantitative and qualitative results.
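The resolution-matching idea behind FuseNet can be sketched as follows: the multispectral bands are upsampled with learned transposed convolutions to the panchromatic resolution before joint feature extraction. This is a minimal sketch under assumed channel counts and a 4x resolution ratio, not the published architecture.

```python
# Minimal sketch of pan/multispectral resolution matching: learned 4x
# upsampling of the MS bands, then fusion with the pan band.
# Channel counts and the 4x ratio are illustrative assumptions.
import torch
import torch.nn as nn

class FusionStem(nn.Module):
    def __init__(self, ms_bands=8, width=32):
        super().__init__()
        # Learned 4x upsampling of the multispectral bands (two 2x steps).
        self.up = nn.Sequential(
            nn.ConvTranspose2d(ms_bands, width, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(width, width, 4, stride=2, padding=1),
            nn.ReLU(),
        )
        self.pan = nn.Sequential(nn.Conv2d(1, width, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(2 * width, width, 3, padding=1)

    def forward(self, pan, ms):
        # pan: (N, 1, 4H, 4W), ms: (N, ms_bands, H, W)
        return self.fuse(torch.cat([self.pan(pan), self.up(ms)], dim=1))
```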
Vectorizing Planar Roof Structure From Very High Resolution Remote Sensing Images Using Transformers
Grasping the roof structure of a building is a key part of building reconstruction. Directly predicting the geometric structure of the roof from a raster image as a vectorized representation, however, remains challenging. This paper introduces an efficient and accurate parsing method based on a vision Transformer we dub Roof-Former. Our method consists of three steps: 1) image encoding and edge node initialization, 2) image feature fusion with an enhanced segmentation refinement branch, and 3) edge filtering and structural reasoning. The vertex and edge heat map F1-scores increase by 2.0% and 1.9% on the VWB dataset compared to HEAT. Additionally, qualitative evaluations suggest that our method is superior to the current state of the art, indicating its effectiveness in extracting global image information and maintaining the consistency and topological validity of the roof structure.
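The final vectorization step such a pipeline implies can be sketched as follows: vertex candidates come from peaks in a vertex heat map, and candidate edges between them are kept only if their score passes a threshold. The thresholds and scoring function are hypothetical placeholders, not Roof-Former's actual edge filtering.

```python
# Illustrative sketch of heat-map-to-graph vectorisation; thresholds
# and the edge scoring function are assumptions, not the paper's method.
import numpy as np
from itertools import combinations

def vectorise(vertex_map, edge_score, thr_v=0.5, thr_e=0.5):
    # vertex_map: (H, W) heat map; edge_score(p, q) -> confidence in [0, 1]
    ys, xs = np.where(vertex_map > thr_v)
    vertices = list(zip(ys.tolist(), xs.tolist()))
    # Keep only confidently supported edges between vertex candidates.
    edges = [(p, q) for p, q in combinations(vertices, 2)
             if edge_score(p, q) > thr_e]
    return vertices, edges
```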
Globally scalable glacier mapping by deep learning matches expert delineation accuracy
Accurate global glacier mapping is critical for understanding climate change impacts. Despite its importance, automated glacier mapping at a global scale remains largely unexplored. Here we address this gap and propose Glacier-VisionTransformer-U-Net (GlaViTU), a convolutional-transformer deep learning model, and five strategies for multitemporal global-scale glacier mapping using open satellite imagery. Assessing the spatial, temporal and cross-sensor generalisation shows that our best strategy achieves an intersection over union >0.85 on previously unobserved images in most cases, which drops to >0.75 for debris-rich areas such as High-Mountain Asia and increases to >0.90 for regions dominated by clean ice. A comparative validation against human expert uncertainties in terms of area and distance deviations underscores GlaViTU's performance, which approaches or matches expert-level delineation. Adding synthetic aperture radar data, namely backscatter and interferometric coherence, increases the accuracy in all regions where it is available. Calibrated confidence for the glacier extents is reported, making the predictions more reliable and interpretable. We also release a benchmark dataset that covers 9% of glaciers worldwide. Our results support efforts towards automated multitemporal and global glacier mapping.
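For reference, the intersection-over-union score used throughout this evaluation can be computed from binary glacier masks as in the minimal NumPy sketch below.

```python
# Minimal IoU between two binary masks (e.g., predicted vs. reference
# glacier extents); a standard metric, not code from the paper.
import numpy as np

def iou(pred, target):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0
```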
Mapping the Brazilian savanna’s natural vegetation: A SAR-optical uncertainty-aware deep learning approach
The Brazilian savanna (Cerrado) is considered a hotspot for conservation. Despite its environmental and social importance, the biome has undergone a rapid transformation process due to human activities. Mapping and monitoring the remaining vegetation is essential to guide public policies for biodiversity conservation. However, accurately mapping the Cerrado’s vegetation is still an open challenge. Its diverse but spectrally similar physiognomies are a source of confusion for state-of-the-art (SOTA) methods. This study proposes a deep learning model to map the natural vegetation of the Cerrado at the regional to biome level, fusing Synthetic Aperture Radar (SAR) and optical data. The proposed model is designed to deal with uncertainties caused by the different resolutions of the input Sentinel-1/2 images (10 m) and the reference data, derived from Landsat images (30 m). We designed a multi-resolution label-propagation (MRLP) module that infers maps at both resolutions and uses the class scores from the 30 m output as features for the 10 m classification layer. We train the model with the proposed calibrated dual focal loss function in a 2-stage hierarchical manner. Our results reached an overall accuracy of 70.37%, an increase of 15.64% over a SOTA random forest (RF) model. Moreover, we propose an uncertainty quantification method, which has proven useful not only for validating the model but also for highlighting areas of label noise in the reference. The developed code and dataset are available on GitHub.
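A minimal sketch of the multi-resolution label-propagation idea, assuming a 3x ratio between the 10 m and 30 m grids: a coarse head is predicted first, and its upsampled class scores are appended to the features of the fine head. Layer sizes and the pooling choice are illustrative, not the paper's implementation.

```python
# Sketch of an MRLP-style dual-resolution head; hyperparameters are
# assumptions, not the published module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MRLPHead(nn.Module):
    def __init__(self, in_ch=64, n_classes=12):
        super().__init__()
        self.head_30m = nn.Conv2d(in_ch, n_classes, 1)
        self.head_10m = nn.Conv2d(in_ch + n_classes, n_classes, 1)

    def forward(self, feats_10m):
        # feats_10m: (N, in_ch, H, W) decoder features on the 10 m grid
        feats_30m = F.avg_pool2d(feats_10m, kernel_size=3)  # ~30 m grid
        scores_30m = self.head_30m(feats_30m)               # coarse output
        scores_up = F.interpolate(scores_30m, size=feats_10m.shape[-2:],
                                  mode="bilinear", align_corners=False)
        # Coarse class scores act as extra features for the fine head.
        scores_10m = self.head_10m(torch.cat([feats_10m, scores_up], dim=1))
        return scores_30m, scores_10m  # supervised at 30 m and 10 m
```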
GlaViTU: A Hybrid CNN-Transformer for Multi-Regional Glacier Mapping from Multi-Source Data
Glacier mapping is essential for studying and monitoring the impacts of climate change. However, several challenges, such as debris-covered ice and highly variable landscapes across glacierized regions worldwide, complicate large-scale glacier mapping in a fully automated manner. This work presents a novel hybrid CNN-transformer model (GlaViTU) for multi-regional glacier mapping. Our model outperforms three baseline models (SETR-B/16, ResU-Net and TransU-Net), achieving a higher mean IoU of 0.875, and demonstrates better generalization ability. The proposed model is also parameter-efficient, with approximately 10 and 3 times fewer parameters than SETR-B/16 and ResU-Net, respectively. Our results provide a solid foundation for future studies on the application of deep learning methods to global glacier mapping. To facilitate reproducibility, we have shared our dataset, codebase and pretrained models on GitHub at https://github.com/konstantin-a-maslov/GlaViTU-IGARSS2023.
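The parameter-efficiency comparison reported above boils down to counting trainable parameters; for any PyTorch model this can be reproduced as follows.

```python
# Count the trainable parameters of a PyTorch model, as one would do
# to reproduce the parameter comparison against SETR-B/16 and ResU-Net.
def count_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```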
Special section guest editorial: Advanced spectral analysis techniques and remote sensing applications
Building polygon extraction from aerial images and digital surface models with a frame field learning framework
Deep learning-based models for building delineation from remotely sensed images face the challenge of producing precise and regular building outlines. This study investigates the combination of normalized digital surface models (nDSMs) with aerial images to optimize the extraction of building polygons using the frame field learning method. Results are evaluated at the pixel, object, and polygon levels. In addition, an analysis is performed to assess the statistical deviations in the number of vertices of the building polygons compared with the reference. The comparison of the number of vertices focuses on finding the output polygons that are easiest for human analysts to edit in operational applications, and it can serve as guidance for reducing the post-processing workload needed to obtain high-accuracy building footprints. Experiments conducted in Enschede, the Netherlands, demonstrate that introducing the nDSM reduces the number of false positives and prevents the method from missing real buildings on the ground. The positional accuracy and shape similarity were improved, resulting in better-aligned building polygons. The method achieved a mean intersection over union (IoU) of 0.80 with the fused data (RGB + nDSM), against an IoU of 0.57 with the baseline (RGB only) in the same area. A qualitative analysis of the results shows that the investigated model predicts more precise and regular polygons for large and complex structures.
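One simple way to realise the RGB + nDSM fusion described above is to stack the nDSM as a fourth input channel and widen the backbone's first convolution. The sketch below does this for a torchvision ResNet-50, which is an assumed stand-in, not necessarily the frame field learning backbone.

```python
# Widen a ResNet-50 stem from 3 to 4 input channels so an nDSM band can
# be stacked onto RGB; the backbone choice is an illustrative assumption.
import torch
import torch.nn as nn
from torchvision.models import resnet50

def make_rgbd_backbone():
    net = resnet50(weights=None)
    old = net.conv1
    net.conv1 = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                          stride=old.stride, padding=old.padding, bias=False)
    with torch.no_grad():
        # Reuse the RGB filters; initialise the nDSM channel as their mean.
        net.conv1.weight[:, :3] = old.weight
        net.conv1.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)
    return net

# Usage: x = torch.cat([rgb, ndsm], dim=1)  # (N, 4, H, W) fused input
```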
Investigating Sar-Optical Deep Learning Data Fusion to Map the Brazilian Cerrado Vegetation with Sentinel Data
Despite its environmental and societal importance, accurately mapping the Brazilian Cerrado's vegetation is still an open challenge. Its diverse but spectrally similar physiognomies are difficult to identify and map with state-of-the-art methods from medium- to high-resolution optical images alone. This work investigates the fusion of Synthetic Aperture Radar (SAR) and optical data in convolutional neural network architectures to map the Cerrado according to a 2-level class hierarchy. Additionally, the proposed model is designed to deal with uncertainties introduced by the difference in resolution between the input images (at 10 m) and the reference data (at 30 m). We tested four data fusion strategies and showed that the position of the data combination within the network is important for learning better features.
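Two of the fusion positions such a comparison typically covers, early fusion at the input and mid-level fusion of encoder features, can be sketched as follows; the encoder blocks and band counts are illustrative assumptions, not the tested architectures.

```python
# Contrasting early fusion (stack SAR and optical bands at the input)
# with mid-level fusion (separate encoders, concatenated features).
# Encoder definitions and channel counts are illustrative placeholders.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())

class EarlyFusion(nn.Module):
    def __init__(self, sar_ch=2, opt_ch=10, width=32):
        super().__init__()
        self.encoder = conv_block(sar_ch + opt_ch, width)

    def forward(self, sar, opt):
        # Bands are stacked before any feature extraction.
        return self.encoder(torch.cat([sar, opt], dim=1))

class MidFusion(nn.Module):
    def __init__(self, sar_ch=2, opt_ch=10, width=32):
        super().__init__()
        self.sar_enc = conv_block(sar_ch, width)
        self.opt_enc = conv_block(opt_ch, width)
        self.merge = conv_block(2 * width, width)

    def forward(self, sar, opt):
        # Each modality is encoded separately, then features are merged.
        return self.merge(torch.cat([self.sar_enc(sar),
                                     self.opt_enc(opt)], dim=1))
```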