
    Hybrid U-Net: Semantic Segmentation of High-Resolution Satellite Images to Detect War Destruction

    Destruction caused by violent conflict plays an important role in understanding the dynamics and consequences of conflicts, which is now the focus of a large body of ongoing literature in economics and political science. However, existing data on conflict largely come from news or eyewitness reports, which makes them incomplete, potentially unreliable, and biased for ongoing conflicts. Using satellite images and deep learning techniques, we can automatically extract objective information on violent events. To automate this process, we created a dataset of high-resolution satellite images of Syria and manually annotated the destroyed areas pixel-wise. We then used this dataset to train and test semantic segmentation networks to detect building damage of various sizes. We specifically utilized a U-Net model for this task due to its promising performance on small and imbalanced datasets. However, the raw U-Net architecture does not fully exploit multi-scale feature maps, which are among the important factors for generating fine-grained segmentation maps, especially for high-resolution images. To address this deficiency, we propose a multi-scale feature fusion approach and design a multi-scale skip-connected Hybrid U-Net for segmenting high-resolution satellite images. In our experiments, U-Net and its variants demonstrated promising segmentation results for detecting various types of war-related building destruction. Moreover, the Hybrid U-Net yielded a significant improvement in segmentation performance over U-Net and other baselines: the mean intersection over union and mean Dice score improved by 7.05% and 8.09%, respectively, compared to the raw U-Net.
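
    The multi-scale fusion idea can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch decoder block that fuses skip features from several encoder scales at once, in the spirit of the multi-scale skip connections described above; it is not the authors' exact Hybrid U-Net design, and the channel sizes are assumptions.

```python
# Hypothetical sketch of a multi-scale skip fusion block (PyTorch).
# Assumes the incoming decoder feature already has `out_channels` channels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSkipBlock(nn.Module):
    def __init__(self, enc_channels, out_channels):
        super().__init__()
        # 1x1 convs project each encoder scale to a common width
        self.proj = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in enc_channels
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(out_channels * (len(enc_channels) + 1), out_channels,
                      kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, decoder_feat, encoder_feats):
        h, w = decoder_feat.shape[-2:]
        # Resize every encoder map to the decoder resolution before fusing
        skips = [
            F.interpolate(p(f), size=(h, w), mode="bilinear", align_corners=False)
            for p, f in zip(self.proj, encoder_feats)
        ]
        return self.fuse(torch.cat([decoder_feat] + skips, dim=1))
```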

    Multitemporal Relearning with Convolutional LSTM Models for Land Use Classification

    In this article, we present a novel hybrid framework that integrates spatial–temporal semantic segmentation with post-classification relearning for multitemporal land use and land cover (LULC) classification based on very high resolution (VHR) satellite imagery. To efficiently obtain optimal multitemporal LULC classification maps, the framework uses a spatial–temporal semantic segmentation model to harness temporal dependency for extracting high-level spatial–temporal features. In addition, the principle of post-classification relearning is adopted to efficiently optimize the model output: the initial outcome of the semantic segmentation model is provided to a subsequent model via an extended input space to guide the learning of discriminative feature representations in an end-to-end fashion. Finally, object-based voting is coupled with post-classification relearning to cope with high intraclass and low interclass variance. The framework was tested with two post-classification relearning strategies (pixel-based and object-based relearning) and three convolutional neural network models: UNet, a simple convolutional LSTM, and a UNet convolutional LSTM. The experiments were conducted on two datasets with LULC labels that contain rich semantic information and varied building morphologies (e.g., informal settlements); each dataset contains four time steps from WorldView-2 and QuickBird imagery. The experimental results demonstrate that the proposed framework is effective for classifying complex LULC maps with multitemporal VHR images.
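
    The relearning step can be pictured as feeding the first model's class probabilities back in through an extended input space. The sketch below assumes a stage-one model mapping a multitemporal stack to class logits and a stage-two model accepting the widened input; both are placeholders, not the paper's exact networks.

```python
# Sketch of post-classification relearning via an extended input space.
import torch

def relearn_step(stage1, stage2, x):
    """x: (B, T, C, H, W) multitemporal image stack (illustrative shapes)."""
    probs = stage1(x).softmax(dim=1)          # (B, K, H, W) initial class map
    b, t, c, h, w = x.shape
    # Broadcast the initial class map to every time step and concatenate,
    # so the second model sees the images plus the first model's beliefs
    probs_t = probs.unsqueeze(1).expand(b, t, -1, h, w)
    x_ext = torch.cat([x, probs_t], dim=2)    # (B, T, C + K, H, W)
    return stage2(x_ext)                      # refined prediction
```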

    Building Extraction from Very High Resolution Aerial Imagery Using Joint Attention Deep Neural Network

    Automated methods to extract buildings from very high resolution (VHR) remote sensing data have applications in a wide range of fields. Many convolutional neural network (CNN) based methods have been proposed and have achieved significant advances in the building extraction task. To refine predictions, many recent approaches fuse features from earlier layers of the CNN to introduce abundant spatial information, a strategy known as skip connection. However, reusing earlier features directly without further processing can reduce the performance of the network. To address this problem, we propose a novel fully convolutional network (FCN) that adopts attention-based re-weighting to extract buildings from aerial imagery. Specifically, we consider the semantic gap between features from different stages and leverage the attention mechanism to bridge the gap prior to feature fusion. The inferred attention weights along the spatial and channel dimensions make the low-level feature maps adaptive to the high-level feature maps in a target-oriented manner. Experimental results on three publicly available aerial imagery datasets show that the proposed model (RFA-UNet) achieves performance comparable or superior to other state-of-the-art models for building extraction.
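
    As a rough illustration of attention-based skip re-weighting, the sketch below gates a low-level skip feature with channel and spatial attention inferred from the high-level feature before fusion. It follows the general pattern the abstract describes rather than the exact RFA-UNet layers.

```python
# Hypothetical attention gate for a U-Net skip connection (PyTorch).
import torch.nn as nn
import torch.nn.functional as F

class AttentiveSkip(nn.Module):
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.channel_fc = nn.Sequential(nn.Linear(high_ch, low_ch), nn.Sigmoid())
        self.spatial_conv = nn.Sequential(nn.Conv2d(high_ch, 1, 1), nn.Sigmoid())

    def forward(self, low, high):
        # Channel attention from the global context of the high-level map
        ctx = F.adaptive_avg_pool2d(high, 1).flatten(1)      # (B, high_ch)
        ch_w = self.channel_fc(ctx)[..., None, None]         # (B, low_ch, 1, 1)
        # Spatial attention: where the high-level map is most informative
        sp_w = F.interpolate(self.spatial_conv(high), size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        return low * ch_w * sp_w                             # re-weighted skip
```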

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, including augmentation, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery.

    Comment: 145 pages with 32 figures.
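
    Two of the pre-processing steps the review covers, per-band normalization and chipping a large scene into fixed-size tiles, can be sketched in a few lines. The functions below are illustrative defaults, not prescriptions from the review.

```python
# Illustrative pre-processing helpers for Earth Observation imagery.
import numpy as np

def normalize_bands(img):
    """img: (H, W, C) array; scale each band to zero mean, unit variance."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8
    return (img - mean) / std

def chip(img, size=256, stride=256):
    """Yield size x size tiles; set stride < size for overlapping chips."""
    h, w = img.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield img[y:y + size, x:x + size]
```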

    Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation

    Mainstream CNN-based remote sensing (RS) image semantic segmentation approaches typically rely on massive labeled training data. Such a paradigm struggles with RS multi-view scene segmentation under limited labeled views because it does not consider the 3D information within the scene. In this paper, we propose the Implicit Ray-Transformer (IRT), based on Implicit Neural Representation (INR), for RS scene semantic segmentation with sparse labels (such as 4-6 labels per 100 images). We explore a new way of introducing multi-view 3D structure priors to the task for accurate and view-consistent semantic segmentation. The proposed method follows a two-stage learning process. In the first stage, we optimize a neural field to encode the color and 3D structure of the remote sensing scene based on multi-view images. In the second stage, we design a Ray Transformer that leverages the relations between the neural field's 3D features and 2D texture features to learn better semantic representations. Unlike previous methods that consider only the 3D prior or only 2D features, we incorporate both by broadcasting CNN features to the point features along each sampled ray. To verify the effectiveness of the proposed method, we construct a challenging dataset containing six synthetic sub-datasets collected from the Carla platform and three real sub-datasets from Google Maps. Experiments show that the proposed method outperforms CNN-based methods and state-of-the-art INR-based segmentation methods in both quantitative and qualitative metrics.
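
    The broadcasting step can be caricatured as adding each pixel's 2D texture feature to every 3D sample along that pixel's ray before the transformer relates them. The shapes and the fusion-by-addition below are assumptions for illustration, not the paper's exact formulation.

```python
# Conceptual sketch: broadcast 2D CNN features to ray samples (PyTorch).
import torch

def fuse_ray_features(point_feats, cnn_feat, ray_transformer):
    """
    point_feats:     (R, S, D) neural-field features, S samples per ray
    cnn_feat:        (R, D) 2D texture feature at each ray's pixel
    ray_transformer: any module mapping (R, S, D) -> (R, S, D)
    """
    r, s, d = point_feats.shape
    # Every sample on a ray receives that ray's 2D texture feature
    tokens = point_feats + cnn_feat.unsqueeze(1).expand(r, s, d)
    return ray_transformer(tokens)   # relation-aware per-sample features
```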

    HED-UNet: Combined Segmentation and Edge Detection for Monitoring the Antarctic Coastline

    Deep learning-based coastline detection algorithms have begun to outshine traditional statistical methods in recent years. However, they are usually trained as single-purpose models that either segment land and water or delineate the coastline. In contrast, a human annotator keeps a mental map of both segmentation and delineation when performing manual coastline detection. To account for this task duality, we devise a new deep learning model that unites the two approaches. Taking inspiration from the main building blocks of a semantic segmentation framework (UNet) and an edge detection framework (HED), both tasks are combined in a natural way. Training is made efficient by employing deep supervision on side predictions at multiple resolutions. Finally, a hierarchical attention mechanism is introduced to adaptively merge these multiscale predictions into the final model output. The advantages of this approach over other traditional and deep learning-based methods for coastline detection are demonstrated on a dataset of Sentinel-1 imagery covering parts of the Antarctic coast, where coastline detection is notoriously difficult. An implementation of our method is available at https://github.com/khdlr/HED-UNet.

    Comment: This work has been accepted by IEEE TGRS for publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
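
    Deep supervision of the kind described here amounts to scoring every side output against the same label map. A minimal sketch, assuming the side outputs are a list of logits at several resolutions (the hierarchical attention merge is omitted):

```python
# Sketch of deep supervision over multi-resolution side outputs (PyTorch).
import torch.nn.functional as F

def deep_supervision_loss(side_outputs, target):
    """side_outputs: list of (B, K, h_i, w_i) logits; target: (B, H, W)."""
    loss = 0.0
    for logits in side_outputs:
        # Upsample each side prediction to label resolution before scoring
        up = F.interpolate(logits, size=target.shape[-2:],
                           mode="bilinear", align_corners=False)
        loss = loss + F.cross_entropy(up, target)
    return loss / len(side_outputs)
```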

    Efficient Semantic Segmentation on Edge Devices

    Semantic segmentation is a computer vision task that assigns each pixel of an image to a class, and it should be performed with both accuracy and efficiency. Most existing deep FCNs require heavy computation and are power hungry, making them unsuitable for real-time applications on portable devices. This project analyzes current semantic segmentation models to explore the feasibility of applying them for emergency response during catastrophic events. We compare the performance of real-time semantic segmentation models against their non-real-time counterparts on aerial images under resource-constrained settings. Furthermore, we train several models on the FloodNet dataset, containing UAV images captured after Hurricane Harvey, and benchmark their performance on specific classes such as flooded vs. non-flooded buildings and flooded vs. non-flooded roads. In this project, we developed a real-time UNet-based model and deployed the network on a Jetson AGX Xavier module.
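
    Before deploying to a module like the Jetson AGX Xavier, per-frame latency is usually measured along these lines; the input shape and iteration counts below are arbitrary assumptions, and the CUDA synchronization calls are what keep the timings honest.

```python
# Minimal latency benchmark for a segmentation model (PyTorch).
import time
import torch

@torch.no_grad()
def benchmark(model, input_shape=(1, 3, 512, 512), warmup=10, iters=100):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    for _ in range(warmup):                 # warm up kernels and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()            # wait for queued GPU work
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    ms = (time.perf_counter() - start) / iters * 1000
    print(f"{ms:.1f} ms/frame ({1000 / ms:.1f} FPS)")
```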

    HFRAS: design of a high-density feature representation model for effective augmentation of satellite images

    Efficiently extracting features from satellite images is crucial for classification and post-processing activities, and many feature representation models have been created for this purpose. However, most of them either increase computational complexity or decrease classification efficiency. The model proposed in this paper first collects a set of available satellite images and represents them via a hybrid of long short-term memory (LSTM) and gated recurrent unit (GRU) features. These features are processed via an iterative genetic algorithm that identifies optimal augmentation methods for the extracted feature sets. To analyse the efficiency of this optimization process, we model an iterative fitness function that assists in incrementally improving the classification process. The fitness function uses an accuracy- and precision-based feedback mechanism, which helps tune the hyperparameters of the proposed LSTM and GRU feature extraction process. The model was evaluated on 100k images, with 60% allocated for training and 20% each for validation and testing. The proposed model increases classification precision by 16.1% and accuracy by 17.1% compared to conventional augmentation strategies, and it showed incremental accuracy gains as the number of training images increased.
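
    The genetic search over augmentation choices can be sketched as evolving a bit-vector that switches augmentations on or off, scored by a blend of accuracy and precision. Everything below, including the candidate augmentation list and the `evaluate` callback (train and score a model under a given augmentation set), is a hypothetical stand-in for the paper's pipeline.

```python
# Illustrative genetic search over augmentation subsets.
import random

AUGMENTATIONS = ["hflip", "vflip", "rotate90", "jitter", "blur", "crop"]

def fitness(genome, evaluate):
    chosen = [a for a, bit in zip(AUGMENTATIONS, genome) if bit]
    acc, prec = evaluate(chosen)     # placeholder: returns (accuracy, precision)
    return 0.5 * acc + 0.5 * prec    # accuracy- and precision-based feedback

def genetic_search(evaluate, pop_size=12, generations=20, p_mut=0.1):
    pop = [[random.randint(0, 1) for _ in AUGMENTATIONS] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: fitness(g, evaluate), reverse=True)
        parents = scored[:pop_size // 2]               # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(AUGMENTATIONS))
            child = a[:cut] + b[cut:]                  # one-point crossover
            child = [bit ^ (random.random() < p_mut) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda g: fitness(g, evaluate))
```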

    Graph Information Bottleneck for Remote Sensing Segmentation

    Remote sensing segmentation has a wide range of applications in environmental protection, urban change detection, and related fields. Despite the success of deep learning-based remote sensing segmentation methods (e.g., CNNs and Transformers), they are not flexible enough to model irregular objects. In addition, existing graph contrastive learning methods usually maximize mutual information to keep node representations consistent between different graph views, which may cause the model to learn task-independent redundant information. To tackle these problems, this paper treats images as graph structures and introduces a simple contrastive vision GNN (SC-ViG) architecture for remote sensing segmentation. Specifically, we construct node-masked and edge-masked graph views to obtain an optimal graph structure representation, adaptively learning whether to mask each node and edge. Furthermore, the paper introduces information bottleneck theory into graph contrastive learning to maximize task-related information while minimizing task-independent redundant information. Finally, we replace the convolutional module in UNet with the SC-ViG module to perform segmentation and classification of remote sensing images. Extensive experiments on publicly available real datasets demonstrate that our method outperforms state-of-the-art remote sensing image segmentation methods.

    Comment: 13 pages, 6 figures.
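
    The combination of masked views and an information bottleneck objective can be caricatured as a contrastive agreement term plus a compression penalty on how much of the graph is kept. The loss below is a generic instance of that idea with assumed shapes and weightings, not the paper's exact SC-ViG objective.

```python
# Generic masked-view contrastive loss with an IB-style penalty (PyTorch).
import torch
import torch.nn.functional as F

def ib_contrastive_loss(z1, z2, mask_logits, beta=0.01, tau=0.2):
    """z1, z2: (N, D) node embeddings from two masked graph views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    agree = F.cross_entropy(logits, labels)   # keep the two views consistent
    # Compression term: discourage keeping every node/edge, so the views
    # retain task-relevant structure rather than all information
    keep = torch.sigmoid(mask_logits)
    return agree + beta * keep.mean()
```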