3,117 research outputs found

    Building extraction from high-resolution aerial imagery using a generative adversarial network with spatial and channel attention mechanisms.

    Segmentation of high-resolution remote sensing images is an important challenge with wide practical applications. Increasing spatial resolution provides fine details for image segmentation but also introduces segmentation ambiguities. In this paper, we propose a generative adversarial network with spatial and channel attention mechanisms (GAN-SCA) for the robust segmentation of buildings in remote sensing images. The segmentation network (generator) of the proposed framework is composed of a well-known semantic segmentation architecture (U-Net) and the spatial and channel attention mechanisms (SCA). The adoption of SCA enables the segmentation network to selectively enhance the more useful features in specific positions and channels, yielding results closer to the ground truth. The discriminator is an adversarial network with channel attention mechanisms that can properly discriminate between the outputs of the generator and the ground truth maps. The segmentation network and adversarial network are trained in an alternating fashion on the Inria aerial image labeling dataset and the Massachusetts buildings dataset. Experimental results show that the proposed GAN-SCA achieves higher scores (an overall accuracy of 96.61% and an intersection over union of 77.75% on the Inria aerial image labeling dataset, and an F1-measure of 96.36% on the Massachusetts buildings dataset) and outperforms several state-of-the-art approaches.
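The three scores quoted in this abstract (overall accuracy, intersection over union, and F1-measure) can all be derived from the confusion counts of a predicted binary building mask against its ground truth. The following is a minimal sketch using the standard definitions; the function names and the list-of-rows mask format are assumptions made for illustration and are not taken from the paper.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two equally sized binary masks (lists of rows)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building predicted, building present
            elif p and not t:
                fp += 1          # building predicted, background present
            elif not p and t:
                fn += 1          # building missed
            else:
                tn += 1          # background correctly predicted
    return tp, fp, fn, tn


def building_metrics(pred, truth):
    """Overall accuracy, IoU, and F1 for the building (positive) class."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    union = tp + fp + fn
    return {
        "overall_accuracy": (tp + tn) / total,
        # By convention here, two empty masks count as a perfect match.
        "iou": tp / union if union else 1.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
    }


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(building_metrics(pred, truth))
```

Because true negatives enter overall accuracy but not IoU, a dataset dominated by background pixels can show a high accuracy alongside a much lower IoU, which matches the gap between the two Inria figures above.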

    PiCoCo: Pixelwise Contrast and Consistency Learning for Semisupervised Building Footprint Segmentation

    Building footprint segmentation from high-resolution remote sensing (RS) images plays a vital role in urban planning, disaster response, and population density estimation. Convolutional neural networks (CNNs) have recently been used as a workhorse for effectively generating building footprints. However, to fully exploit the predictive power of CNNs, large-scale pixel-level annotations are required. Most state-of-the-art CNN-based methods focus on the design of network architectures for improving the predictions of building footprints with full annotations, while little work has been done on building footprint segmentation with limited annotations. In this article, we propose a novel semisupervised learning method for building footprint segmentation, which can effectively predict building footprints with a network trained on few annotations (e.g., only 0.0324 km² out of a 2.25 km² area is labeled). The proposed method is based on investigating the contrast between the building and background pixels in latent space and the consistency of predictions obtained from the CNN models when the input RS images are perturbed. Thus, we term the proposed semisupervised learning framework for building footprint segmentation PiCoCo, as it is based on the enforcement of Pixelwise Contrast and Consistency during the learning phase. Our experiments, conducted on two benchmark building segmentation datasets, validate the effectiveness of our proposed framework as compared to several state-of-the-art building footprint extraction and semisupervised semantic segmentation methods.
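To make the consistency idea concrete, the sketch below scores how much two per-pixel probability maps disagree, as one might do for predictions on an image and on a perturbed copy of it. The mean-squared-difference form and the name `consistency_loss` are assumptions chosen for illustration; the abstract does not state which consistency measure PiCoCo actually uses, and the contrastive term is not sketched here. The last lines simply work out the labeled fraction quoted in the abstract.

```python
def consistency_loss(probs_a, probs_b):
    """Mean squared difference between two per-pixel probability maps.

    Both arguments are lists of rows of building probabilities in [0, 1],
    e.g. predictions for an image and for a perturbed version of it.
    (Assumed form; the paper may use a different consistency measure.)
    """
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(probs_a, probs_b)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


# Labeled share of the area in the abstract's example: 0.0324 km² of 2.25 km².
labeled_fraction = 0.0324 / 2.25

if __name__ == "__main__":
    original = [[0.9, 0.1], [0.8, 0.2]]
    perturbed = [[0.7, 0.3], [0.8, 0.2]]
    print(consistency_loss(original, perturbed))
    print(f"labeled fraction: {labeled_fraction:.2%}")
```

A loss of this kind needs no labels, which is why it can be applied to the unlabeled remainder of the area, roughly 98.56% of it in the example above.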

    Deep Learning-Based Building Footprint Extraction With Missing Annotations

    Most state-of-the-art deep learning-based methods for extraction of building footprints are aimed at designing proper convolutional neural network (CNN) architectures or loss functions able to effectively predict building masks from remote sensing (RS) images. To properly train such CNN models, large-scale and pixel-level building annotations are required. One common approach to obtain scalable benchmark data sets for the segmentation of buildings is to register RS images with auxiliary geospatial information data, such as those available from OpenStreetMap (OSM). However, due to land-cover changes, urban construction, and delayed geospatial information updating, some building annotations may be missing in the corresponding ground-truth building mask layers. This will likely introduce confusion in the training of CNN models for discriminating between background and building pixels. To solve this important issue, we first formulate the problem as a long-tailed classification problem. Then, we introduce a new joint loss function based on three terms: 1) logit adjusted cross entropy (LACE) loss, aimed at discriminating between building and background pixels from a long-tailed label distribution; 2) weighted dice loss, aimed at increasing the F₁ scores of the predicted building masks; and 3) boundary (BD) alignment loss, which is optimized for preserving the fine-grained structure of building boundaries. Our experiments, conducted on two benchmark building segmentation data sets, validate the effectiveness of our newly proposed loss with respect to other state-of-the-art losses commonly used for extracting building footprints. The codes of this letter will be publicly available from https://github.com/jiankang1991/GRSL_BFE_MA
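The first two loss terms named in this abstract can be sketched per pixel as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which the abstract points to in the linked repository): the logit-adjusted cross entropy uses the common formulation that adds `tau * log(prior)` to each class logit before the softmax, the dice term is a plain soft dice loss without the paper's weighting, and the boundary alignment term is omitted. The function names, `tau`, and `eps` are hypothetical.

```python
import math


def logit_adjusted_ce(logits, label, priors, tau=1.0):
    """Cross entropy on logits shifted by tau * log(class prior).

    logits: per-class scores for one pixel, e.g. [background, building].
    priors: class frequencies summing to 1 (assumed to be estimated from
            the training labels).
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    peak = max(adjusted)  # subtract the max for a numerically stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return log_norm - adjusted[label]


def soft_dice_loss(probs, targets, eps=1e-6):
    """1 - soft Dice coefficient over flattened building probabilities."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


if __name__ == "__main__":
    priors = [0.9, 0.1]  # hypothetical: 90% background, 10% building pixels
    # With equal raw logits, an error on the rare class costs more.
    print(logit_adjusted_ce([0.0, 0.0], label=1, priors=priors))
    print(logit_adjusted_ce([0.0, 0.0], label=0, priors=priors))
    print(soft_dice_loss([1.0, 1.0, 0.0, 0.0], [1, 0, 0, 0]))
```

With equal raw logits, the adjusted loss for a building pixel is larger than for a background pixel, so the network has to produce a larger margin for the rare class, which is the intended effect under a long-tailed label distribution.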

    Building Extraction from Very High Resolution Aerial Imagery Using Joint Attention Deep Neural Network

    Automated methods to extract buildings from very high resolution (VHR) remote sensing data have many applications in a wide range of fields. Many convolutional neural network (CNN) based methods have been proposed and have achieved significant advances in the building extraction task. To refine predictions, many recent approaches fuse features from earlier layers of CNNs to introduce abundant spatial information, a strategy known as skip connections. However, reusing earlier features directly without processing could reduce the performance of the network. To address this problem, we propose a novel fully convolutional network (FCN) that adopts attention-based re-weighting to extract buildings from aerial imagery. Specifically, we consider the semantic gap between features from different stages and leverage the attention mechanism to bridge the gap prior to the fusion of features. The inferred attention weights along the spatial and channel-wise dimensions make the low-level feature maps adaptive to the high-level feature maps in a target-oriented manner. Experimental results on three publicly available aerial imagery datasets show that the proposed model (RFA-UNet) achieves comparable or improved performance relative to other state-of-the-art models for building extraction.
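The re-weighting idea described here, gating low-level skip features with attention weights inferred from high-level features along the channel and spatial dimensions, can be illustrated with a parameter-free toy version. This is only a sketch under strong assumptions: a real attention module such as the one in RFA-UNet would use learned layers, and the gate definitions below (sigmoid of the per-channel mean and of the per-pixel channel mean) and the name `reweight_skip` are invented for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reweight_skip(low, high):
    """Gate low-level features with weights derived from high-level features.

    low, high: feature maps of identical shape, indexed [channel][row][col].
    The high-level map is assumed to be already upsampled to match `low`.
    """
    channels, height, width = len(high), len(high[0]), len(high[0][0])

    # Channel gate: one weight per channel from its global average.
    channel_gate = [
        sigmoid(sum(sum(row) for row in high[c]) / (height * width))
        for c in range(channels)
    ]
    # Spatial gate: one weight per pixel from the mean over channels.
    spatial_gate = [
        [
            sigmoid(sum(high[c][i][j] for c in range(channels)) / channels)
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Apply both gates to the low-level features before fusion.
    return [
        [
            [low[c][i][j] * channel_gate[c] * spatial_gate[i][j] for j in range(width)]
            for i in range(height)
        ]
        for c in range(channels)
    ]


if __name__ == "__main__":
    low = [[[4.0, 8.0]]]   # 1 channel, 1 row, 2 columns
    high = [[[0.0, 0.0]]]  # neutral high-level features give gates of 0.5
    print(reweight_skip(low, high))
```

The gated output would then be fused with the high-level features in place of the raw skip features, so that positions and channels the deeper layers consider irrelevant contribute less.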

    Multi-task deep learning for large-scale building detail extraction from high-resolution satellite imagery

    Understanding urban dynamics and promoting sustainable development require comprehensive insights about buildings. While geospatial artificial intelligence has advanced the extraction of such details from Earth observational data, existing methods often suffer from computational inefficiencies and inconsistencies when compiling unified building-related datasets for practical applications. To bridge this gap, we introduce the Multi-task Building Refiner (MT-BR), an adaptable neural network tailored for simultaneous extraction of spatial and attributional building details from high-resolution satellite imagery, exemplified by building rooftops, urban functional types, and roof architectural types. Notably, MT-BR can be fine-tuned to incorporate additional building details, extending its applicability. For large-scale applications, we devise a novel spatial sampling scheme that strategically selects limited but representative image samples. This process optimizes both the spatial distribution of samples and the urban environmental characteristics they contain, thus enhancing extraction effectiveness while curtailing data preparation expenditures. We further enhance MT-BR's predictive performance and generalization capabilities through the integration of advanced augmentation techniques. Our quantitative results highlight the efficacy of the proposed methods. Specifically, networks trained with datasets curated via our sampling method demonstrate improved predictive accuracy relative to those using alternative sampling approaches, with no alterations to network architecture. Moreover, MT-BR consistently outperforms other state-of-the-art methods in extracting building details across various metrics. Its real-world practicality is also demonstrated in an application across Shanghai, generating a unified dataset that encompasses both the spatial and attributional details of buildings.