159 research outputs found
BuildMapper: A Fully Learnable Framework for Vectorized Building Contour Extraction
Deep learning-based methods have significantly boosted the study of automatic building extraction from remote sensing images. However, delineating vectorized, regular building contours the way a human does remains very challenging, owing to the difficulty of the methodology, the diversity of building structures, and imperfect imaging conditions. In this paper, we propose the first end-to-end learnable building contour extraction framework, named BuildMapper, which can directly and efficiently delineate building polygons just as a human does. BuildMapper consists of two main components: 1) a contour initialization module that generates initial building contours; and 2) a contour evolution module that performs both contour vertex deformation and reduction, removing the need for the complex empirical post-processing used in existing methods. In both components we contribute new ideas: a learnable contour initialization method that replaces empirical ones, dynamic pairing of predicted and ground-truth vertices to address the static vertex correspondence problem, and a lightweight encoder for vertex information extraction and aggregation, all of which benefit general contour-based methods; as well as a well-designed vertex classification head for detecting building corner vertices, which casts light on direct structured building contour extraction. We also built a suitable large-scale building dataset, the WHU-Mix (vector) building dataset, to benefit the study of contour-based building extraction methods. Extensive experiments on the WHU-Mix (vector), WHU, and CrowdAI datasets verified that BuildMapper achieves state-of-the-art performance, with higher mask average precision (AP) and boundary AP than both segmentation-based and contour-based methods.
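The "dynamic predicted and ground truth vertex pairing" idea can be illustrated with a minimal numpy sketch: because a polygon is cyclic, the same contour can be described starting from any vertex, so a sensible pairing searches over cyclic shifts of the predicted vertices and keeps the alignment with the lowest total distance to the ground truth. This is a simplified stand-in for the paper's learned pairing, not its actual implementation; the function name, shapes, and equal vertex counts are assumptions for illustration.

```python
import numpy as np

def pair_vertices(pred, gt):
    """Pair predicted contour vertices with ground-truth vertices by
    trying every cyclic shift of the prediction and keeping the
    alignment with the smallest total L2 distance.

    pred, gt: (N, 2) arrays of polygon vertices. Equal length N is
    assumed here purely for illustration (real methods must also
    handle vertex reduction).
    """
    n = len(pred)
    best_shift, best_cost = 0, np.inf
    for s in range(n):
        rolled = np.roll(pred, s, axis=0)          # re-index the cycle
        cost = np.linalg.norm(rolled - gt, axis=1).sum()
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return np.roll(pred, best_shift, axis=0), best_cost
```

A static pairing (vertex 0 to vertex 0, 1 to 1, ...) would penalize a perfect contour that merely starts at a different corner; the shift search removes that artifact.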
Using Deep Neural Networks for Automatic Building Extraction with Boundary Regularization from Satellite Images
Building footprints extracted from satellite images play a significant role in a wide range of applications, and many of those applications demand footprints with regularized boundaries, which are challenging to acquire. Recently, deep learning has made remarkable accomplishments in the remote sensing community. In this study, we formulate the major problems as spatial learning, semantic learning, and geometric learning, and propose a deep learning-based framework for building footprint extraction with boundary regularization. Our first two models, Post-Shape and the Binary Space Partitioning Pooling Network (BSPPN), integrate a polygon shape prior into neural networks. The third, the Region-based Polygon GCN (R-PolyGCN), exploits graph convolutional networks to learn geometric polygon features. Extensive experiments show that our models achieve object localization, recognition, semantic labeling, and geometric shape extraction simultaneously. The model performances are competitive with the state-of-the-art baseline, Mask R-CNN. In particular, our R-PolyGCN consistently outperforms the others in all aspects.
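The core idea of applying a graph convolution to polygon geometry can be sketched as follows: a closed building contour is a cyclic graph in which each vertex has exactly two neighbors, so one message-passing step lets every vertex refine its features using its immediate contour context. This is a generic illustrative layer, not R-PolyGCN's actual architecture; the function name, mean aggregation, and ReLU choice are assumptions.

```python
import numpy as np

def polygon_gcn_layer(X, W):
    """One graph-convolution step over a closed polygon.

    X: (N, d) per-vertex feature matrix; W: (d, d_out) shared weights.
    Each vertex aggregates features from itself and its two contour
    neighbours (mean aggregation), then applies the shared linear map
    followed by a ReLU. Cyclic np.roll encodes the polygon's ring
    adjacency without building an explicit adjacency matrix.
    """
    prev_ = np.roll(X, 1, axis=0)    # predecessor along the contour
    next_ = np.roll(X, -1, axis=0)   # successor along the contour
    agg = (X + prev_ + next_) / 3.0
    return np.maximum(agg @ W, 0.0)
```

Stacking several such layers widens each vertex's receptive field along the contour, which is what lets a polygon GCN learn shape-level regularities such as right-angled corners.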
Superpixel-Based Attention Graph Neural Network for Semantic Segmentation in Aerial Images
Semantic segmentation is one of the key tasks in understanding aerial images with high spatial resolution. Recently, Graph Neural Networks (GNNs) and attention mechanisms have achieved excellent performance on semantic segmentation of general images and have been applied to aerial images. In this paper, we propose a novel Superpixel-based Attention Graph Neural Network (SAGNN) for semantic segmentation of high-spatial-resolution aerial images. Our network constructs a K-Nearest Neighbor (KNN) graph for each image, where each node corresponds to a superpixel and is associated with a hidden representation vector. The hidden representation vector is initialized with the appearance feature extracted from the image by a unary Convolutional Neural Network (CNN). Relying on the attention mechanism and recursive functions, each node then updates its hidden representation according to its current state and the incoming information from its neighbors. The final representation of each node is used to predict the semantic class of the corresponding superpixel. The attention mechanism enables graph nodes to aggregate neighbor information differentially, which extracts higher-quality features. Furthermore, the superpixels not only save computational resources but also preserve object boundaries, leading to more accurate predictions. The accuracy of our model on the public Potsdam and Vaihingen datasets exceeds all benchmark approaches, reaching 90.23% and 89.32%, respectively.
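The attention-weighted neighbor aggregation described above can be sketched in a few lines of numpy: each superpixel node scores its KNN neighbors against its own hidden vector, normalizes the scores with a softmax, and takes the weighted mean of the neighbors' features. The dot-product scoring here is a simplified stand-in for the paper's learned attention; the function name and data layout are assumptions for illustration.

```python
import numpy as np

def attention_aggregate(H, neighbors):
    """One attention-weighted aggregation step on a KNN superpixel graph.

    H: (N, d) hidden vectors, one per superpixel node.
    neighbors: list of index lists; neighbors[i] holds the KNN
    neighbours of node i.
    Each node's new representation is a softmax-weighted mean of its
    neighbours' vectors, so nodes aggregate neighbour information
    differentially rather than uniformly.
    """
    out = np.empty_like(H)
    for i, nbrs in enumerate(neighbors):
        scores = H[nbrs] @ H[i]                 # (k,) similarity scores
        w = np.exp(scores - scores.max())       # numerically stable softmax
        w /= w.sum()
        out[i] = w @ H[nbrs]                    # convex combination
    return out
```

Because the weights form a convex combination, each updated node stays inside the feature range spanned by its neighbors; relevant neighbors (high similarity) dominate the update, which is the "differential aggregation" the abstract refers to.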
- …