
    Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers

    Semantic segmentation necessitates approaches that learn high-level characteristics while dealing with enormous amounts of data. Convolutional neural networks (CNNs) can learn unique and adaptive features to achieve this aim. However, due to the large size and high spatial resolution of remote sensing images, these networks cannot analyze an entire scene efficiently. Recently, deep transformers have proven their capability to capture global interactions between different objects in the image. In this paper, we propose a new segmentation model that combines convolutional neural networks with transformers, and show that this mixture of local and global feature extraction techniques provides significant advantages in remote sensing segmentation. In addition, the proposed model includes two fusion layers that are designed to represent the network's multi-modal inputs and outputs efficiently. The input fusion layer extracts feature maps summarizing the relationship between image content and elevation maps (DSM). The output fusion layer uses a novel multi-task segmentation strategy where class labels are identified using class-specific feature extraction layers and loss functions. Finally, a fast-marching method is used to convert all unidentified class labels to their closest known neighbors. Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
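
    A rough sketch of that final label-completion step: on a uniform grid with no propagation cost, a fast-marching fill of unidentified pixels reduces to assigning each one the label of its nearest known pixel, which a Euclidean distance transform computes directly. The snippet below is a minimal approximation under that assumption; the function name and the `unknown` sentinel are illustrative, not from the paper.

```python
import numpy as np
from scipy import ndimage

def fill_unknown_labels(label_map: np.ndarray, unknown: int = -1) -> np.ndarray:
    """Assign each unidentified pixel the label of its nearest known pixel.

    The paper propagates labels with a fast-marching method; with a
    uniform cost this reduces to a nearest-known-neighbor assignment,
    which the Euclidean distance transform provides.
    """
    mask = label_map == unknown
    if not mask.any():
        return label_map
    # For each pixel, indices of the closest pixel that has a known label.
    _, (rows, cols) = ndimage.distance_transform_edt(mask, return_indices=True)
    filled = label_map.copy()
    filled[mask] = label_map[rows[mask], cols[mask]]
    return filled

# Example: the single unidentified pixel picks up its closest known label.
labels = np.array([[0, 0, 1],
                   [0, -1, 1],
                   [2, 2, 1]])
print(fill_unknown_labels(labels))
```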

    Deep domain adaptation by weighted entropy minimization for the classification of aerial images

    Fully convolutional neural networks (FCNs) are successfully used for the automated pixel-wise classification of aerial images and possibly additional data. However, they require many labelled training samples to perform well. One approach addressing this issue is semi-supervised domain adaptation (SSDA), in which labelled training samples from a source domain and unlabelled samples from a target domain are used jointly to obtain a target domain classifier, without requiring any labelled samples from the target domain. In this paper, a two-step approach for SSDA is proposed. The first step corresponds to supervised training on the source domain, making use of strong data augmentation to increase the initial performance on the target domain. Secondly, the model is adapted by entropy minimization using a novel weighting strategy. The approach is evaluated on the basis of five domains, corresponding to five cities. Several training variants and adaptation scenarios are tested, indicating that proper data augmentation can already improve the initial target domain performance significantly, resulting in an average overall accuracy of 77.5%. The weighted entropy minimization improves the overall accuracy on the target domains in 19 out of 20 scenarios, by 1.8% on average. In all experiments a novel FCN architecture is used that yields results comparable to those of the best-performing models on the ISPRS labelling challenge while having an order of magnitude fewer parameters than commonly used FCNs.
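
    For concreteness, a minimal PyTorch sketch of such an adaptation objective: per-pixel prediction entropy on unlabelled target images, scaled by a per-pixel weight. The confidence-based weight used here is a generic stand-in, not the paper's novel weighting strategy.

```python
import torch
import torch.nn.functional as F

def weighted_entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Weighted entropy minimization on unlabelled target-domain pixels.

    logits: (B, C, H, W) raw network outputs for a target-domain batch.
    The confidence-based per-pixel weight below is a generic stand-in;
    the paper's weighting strategy is its own contribution.
    """
    p = F.softmax(logits, dim=1)                     # per-pixel class probabilities
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)  # (B, H, W) prediction entropy
    weight = p.max(dim=1).values.detach()            # assumed weight: top-1 confidence
    return (weight * entropy).sum() / weight.sum()

# Adaptation step (hypothetical names): loss = weighted_entropy_loss(model(target_images))
```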

    Class-guided swin transformer for semantic segmentation of remote sensing imagery

    Semantic segmentation of remote sensing images plays a crucial role in a wide variety of practical applications, including land cover mapping, environmental protection, and economic assessment. In the last decade, convolutional neural networks (CNNs) have been the mainstream deep learning-based method for semantic segmentation. Compared with conventional methods, CNN-based methods learn semantic features automatically, thereby achieving strong representation capability. However, the local receptive field of the convolution operation limits CNN-based methods from capturing global information. In contrast, the Vision Transformer demonstrates great potential in global information modelling and obtains superior results in semantic segmentation. Inspired by this, in this Letter we propose a class-guided Swin Transformer (CG-Swin) for semantic segmentation of remote sensing images. Specifically, we adopt a Transformer-based encoder-decoder structure, which introduces the Swin Transformer backbone as the encoder and designs a class-guided Transformer block to construct the decoder. The experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate a significant improvement of the proposed method over ten benchmarks, outperforming both advanced CNN-based and recent Vision Transformer-based approaches.
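
    As an illustration of what "class-guided" decoding can look like, the sketch below pairs learnable class embeddings with cross-attention over the encoder's tokens. It is our own minimal rendering of the idea, not the published CG-Swin block.

```python
import torch
import torch.nn as nn

class ClassGuidedBlock(nn.Module):
    """Minimal sketch of a class-guided decoder block (assumption: the
    actual CG-Swin block differs in detail). Learnable class embeddings
    gather class-specific context from the encoder tokens, then guide
    the tokens' refinement."""

    def __init__(self, dim: int, num_classes: int, heads: int = 8):
        super().__init__()
        self.class_emb = nn.Parameter(torch.randn(num_classes, dim))
        self.cls_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.feat_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) flattened encoder (e.g. Swin) feature tokens.
        cls = self.class_emb.unsqueeze(0).expand(tokens.size(0), -1, -1)
        cls, _ = self.cls_attn(cls, tokens, tokens)   # classes attend to features
        out, _ = self.feat_attn(tokens, cls, cls)     # features refined by class context
        return self.norm(tokens + out)

# Usage: refine 32x32 feature maps of width 256 with 6 land-cover classes.
x = torch.randn(2, 32 * 32, 256)
print(ClassGuidedBlock(256, num_classes=6)(x).shape)  # torch.Size([2, 1024, 256])
```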

    Imbalance Knowledge-Driven Multi-modal Network for Land-Cover Semantic Segmentation Using Images and LiDAR Point Clouds

    Despite the good results that have been achieved in unimodal segmentation, the inherent limitations of individual data sources increase the difficulty of achieving breakthroughs in performance. For that reason, multi-modal learning is increasingly being explored within the field of remote sensing. Present multi-modal methods usually map high-dimensional features to low-dimensional spaces as a preprocessing step before feature extraction to address the non-negligible domain gap, which inevitably leads to information loss. To address this issue, in this paper we present our novel Imbalance Knowledge-Driven Multi-modal Network (IKD-Net), which extracts features from raw multi-modal heterogeneous data directly. IKD-Net is capable of mining imbalance information across modalities while using a stronger modality to drive the refinement of the weaker one's feature maps from global and categorical perspectives, by way of two sophisticated plug-and-play modules: the Global Knowledge-Guided (GKG) and Class Knowledge-Guided (CKG) gated modules. The whole network is then optimized using a holistic loss function. While developing IKD-Net, we also established a new dataset called the National Agriculture Imagery Program and 3D Elevation Program Combined dataset in California (N3C-California), which provides a particular benchmark for multi-modal joint segmentation tasks. In our experiments, IKD-Net outperformed the benchmarks and state-of-the-art methods on both the N3C-California and the small-scale ISPRS Vaihingen datasets. IKD-Net ranked first on the live leaderboard of the GRSS DFC 2018 challenge evaluation as of this paper's submission.
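
    A minimal sketch of the gating idea behind such modules: the stronger modality produces a per-pixel gate that controls how much guidance flows into the weaker modality's feature map. The module below is illustrative only; the published GKG/CKG designs are more elaborate.

```python
import torch
import torch.nn as nn

class GatedGuidance(nn.Module):
    """Illustrative cross-modal gated guidance (assumption: IKD-Net's
    GKG/CKG modules differ in detail). The stronger modality's features
    decide, per pixel, how strongly they refine the weaker modality."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, strong: torch.Tensor, weak: torch.Tensor) -> torch.Tensor:
        # strong, weak: (B, C, H, W) feature maps from the two modalities.
        g = self.gate(torch.cat([strong, weak], dim=1))  # per-pixel gate in [0, 1]
        return weak + g * self.project(strong)           # gated refinement of weak features

# Usage: image features guiding rasterized LiDAR features (hypothetical shapes).
img, lidar = torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128)
print(GatedGuidance(64)(img, lidar).shape)
```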

    Towards Open-Set Semantic Segmentation of Aerial Images

    Classical and, more recently, deep computer vision methods are optimized for visible spectrum images, commonly encoded in grayscale or RGB colorspaces and acquired from smartphones or cameras. A less common source of images exploited in the remote sensing field is satellite and aerial imagery. However, the development of pattern recognition approaches for these data is relatively recent, mainly due to the limited availability of this type of image, as until recently it was used exclusively for military purposes. Access to aerial imagery, including spectral information, has been increasing, mainly due to low-cost drones, cheaper imaging-satellite launches, and novel public datasets. Remote sensing applications usually employ computer vision techniques strictly modeled for classification tasks in closed set scenarios. However, real-world tasks rarely fit into closed set contexts, frequently presenting previously unknown classes that characterize them as open set scenarios. Focusing on this problem, this is the first paper to study and develop semantic segmentation techniques for open set scenarios applied to remote sensing images. The main contributions of this paper are: 1) a discussion of related works in open set semantic segmentation, showing evidence that these techniques can be adapted for open set remote sensing tasks; and 2) the development and evaluation of a novel approach for open set semantic segmentation. Our method yielded competitive results when compared to closed set methods on the same dataset.
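
    For intuition, a common baseline for turning a closed-set segmenter into an open-set one (the paper's own approach may differ): reject pixels whose top softmax probability falls below a threshold and mark them as unknown.

```python
import torch
import torch.nn.functional as F

def open_set_predict(logits: torch.Tensor, threshold: float = 0.7,
                     unknown: int = -1) -> torch.Tensor:
    """Reject low-confidence pixels as 'unknown'.

    logits: (B, C, H, W) outputs of a closed-set segmentation network.
    Thresholding top-1 softmax confidence is a standard open-set baseline,
    shown here only for intuition; the paper's method may differ.
    """
    conf, pred = F.softmax(logits, dim=1).max(dim=1)  # (B, H, W) confidence, labels
    pred[conf < threshold] = unknown                  # open-set rejection
    return pred

# Usage: 5 known classes; uncertain pixels receive the label -1.
print(open_set_predict(torch.randn(1, 5, 64, 64)).unique())
```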

    Paving the Way for Automatic Mapping of Rural Roads in the Amazon Rainforest

    Output Status: Forthcoming

    Semantic segmentation of tree-canopy in urban environment with pixel-wise deep learning

    Urban forests are an important part of any city, given that they provide several environmental benefits, such as improving urban drainage, climate regulation, public health, and biodiversity. However, tree detection in cities is challenging, given the irregular shape, size, occlusion, and complexity of urban areas. With the advance of environmental technologies, deep learning segmentation mapping methods can map urban forests accurately. We applied a region-based CNN object instance segmentation algorithm for the semantic segmentation of tree canopies in urban environments based on aerial RGB imagery. To the best of our knowledge, no previous study has investigated the performance of deep learning-based methods for segmentation tasks inside the Cerrado biome, specifically for urban tree segmentation. Five state-of-the-art architectures were evaluated, namely: Fully Convolutional Network, U-Net, SegNet, Dynamic Dilated Convolution Network, and DeepLabV3+. The experimental analysis showed the effectiveness of these methods, with results such as a pixel accuracy of 96.35%, an average accuracy of 91.25%, an F1-score of 91.40%, a Kappa of 82.80%, and an IoU of 73.89%. We also determined the inference time needed per area; once trained, the investigated deep learning methods proved suitable for this task, providing fast and effective solutions with inference times ranging from 0.042 to 0.153 minutes per hectare. We conclude that the semantic segmentation of trees inside urban environments is highly achievable with deep neural networks. This information can be of high importance for decision-making and may contribute to the management of urban systems. It is also worth mentioning that the dataset used in this work is available on our website.
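
    For reference, all of the reported scores can be derived from a single confusion matrix. The sketch below shows one way to compute them; macro-averaging of the per-class scores is our assumption, and the paper may average differently.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> dict:
    """Derive pixel accuracy, average (per-class) accuracy, F1, Cohen's
    kappa, and IoU from one confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)   # rows: ground truth, cols: prediction

    tp = np.diag(cm).astype(float)
    row, col = cm.sum(axis=1), cm.sum(axis=0)
    recall = tp / np.maximum(row, 1)               # per-class accuracy
    precision = tp / np.maximum(col, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-8)
    iou = tp / np.maximum(row + col - tp, 1)

    total = cm.sum()
    pixel_acc = tp.sum() / total
    expected = (row * col).sum() / total ** 2      # chance agreement for Cohen's kappa
    kappa = (pixel_acc - expected) / (1 - expected)
    return dict(pixel_acc=pixel_acc, avg_acc=recall.mean(),
                f1=f1.mean(), kappa=kappa, iou=iou.mean())

# Usage on dummy predictions for a 2-class (tree / non-tree) map.
gt, pred = np.random.randint(0, 2, (256, 256)), np.random.randint(0, 2, (256, 256))
print(segmentation_metrics(pred, gt, num_classes=2))
```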

    Prototypical Contrastive Network for Imbalanced Aerial Image Segmentation

    Binary segmentation is the main task underpinning several remote sensing applications, which are particularly interested in identifying and monitoring a specific category/object. Although extremely important, this task has several challenges, including huge intra-class variance in the background and data imbalance. Furthermore, most works tackling this task partially or completely ignore one or both of these challenges in their developments. In this paper, we propose a novel method to perform imbalanced binary segmentation of remote sensing images based on deep networks, prototypes, and contrastive loss. The proposed approach allows the model to focus on learning the foreground class while alleviating the class imbalance problem by concentrating on the most difficult background examples. The results demonstrate that the proposed method outperforms state-of-the-art techniques for imbalanced binary segmentation of remote sensing images while taking much less training time.
    Output Status: Forthcoming
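
    To make the mechanism described above concrete, here is a hedged sketch of a prototype-based contrastive loss with hard background mining: foreground embeddings are pulled toward their prototype, and only the background pixels most similar to the prototype (the hardest ones) act as negatives. Names and the exact formulation are illustrative, not the paper's.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(emb: torch.Tensor, mask: torch.Tensor,
                               tau: float = 0.1, hard_ratio: float = 0.1) -> torch.Tensor:
    """Illustrative prototype-contrastive loss for imbalanced binary
    segmentation (assumption: the paper's formulation differs in detail).

    emb:  (N, D) pixel embeddings; mask: (N,) binary labels, 1 = foreground.
    """
    emb = F.normalize(emb, dim=1)
    fg, bg = emb[mask == 1], emb[mask == 0]
    proto = F.normalize(fg.mean(dim=0), dim=0)     # foreground prototype

    sim_bg = bg @ proto / tau                      # background-prototype similarity
    k = max(1, int(hard_ratio * bg.size(0)))
    hard_bg = sim_bg.topk(k).values                # keep only the hardest negatives

    pos = (fg @ proto / tau).exp()                 # foreground pulled to the prototype
    neg = hard_bg.exp().sum()                      # hardest background pushes away
    return -torch.log(pos / (pos + neg)).mean()

# Usage on dummy data: 2000 background pixels, 50 foreground pixels.
emb = torch.randn(2050, 128)
mask = torch.cat([torch.zeros(2000), torch.ones(50)])
print(prototype_contrastive_loss(emb, mask))
```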