    Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers

    Semantic segmentation necessitates approaches that learn high-level characteristics while dealing with enormous amounts of data. Convolutional neural networks (CNNs) can learn unique and adaptive features to achieve this aim. However, due to the large size and high spatial resolution of remote sensing images, these networks cannot analyze an entire scene efficiently. Recently, deep transformers have proven their capability to capture global interactions between different objects in an image. In this paper, we propose a new segmentation model that combines convolutional neural networks with transformers, and show that this mixture of local and global feature extraction techniques provides significant advantages in remote sensing segmentation. In addition, the proposed model includes two fusion layers designed to represent the network's multi-modal inputs and outputs efficiently. The input fusion layer extracts feature maps summarizing the relationship between image content and elevation maps (DSM). The output fusion layer uses a novel multi-task segmentation strategy in which class labels are identified using class-specific feature extraction layers and loss functions. Finally, a fast-marching method converts all unidentified class labels to their closest known neighbors. Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
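
    As a rough illustration of the input fusion idea, the following PyTorch sketch embeds the image and DSM streams separately, concatenates them, and mixes them with a 1x1 convolution. The class name, channel sizes, and layer choices are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class InputFusion(nn.Module):
    """Hypothetical input fusion layer: summarizes the relationship between
    image content and an elevation map (DSM) by embedding each modality,
    concatenating the two streams, and mixing them with a 1x1 convolution."""
    def __init__(self, img_ch=3, dsm_ch=1, out_ch=64):
        super().__init__()
        self.img_stem = nn.Conv2d(img_ch, out_ch, kernel_size=3, padding=1)
        self.dsm_stem = nn.Conv2d(dsm_ch, out_ch, kernel_size=3, padding=1)
        self.mix = nn.Conv2d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, img, dsm):
        fused = torch.cat([self.img_stem(img), self.dsm_stem(dsm)], dim=1)
        return torch.relu(self.mix(fused))

fusion = InputFusion()
rgb = torch.randn(1, 3, 256, 256)  # image tile
dsm = torch.randn(1, 1, 256, 256)  # elevation map
print(fusion(rgb, dsm).shape)      # torch.Size([1, 64, 256, 256])
```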

    BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery

    Satellites equipped with optical sensors capture high-resolution imagery, providing valuable insights into various environmental phenomena. In recent years, there has been a surge of research addressing challenges in remote sensing, ranging from water detection in diverse landscapes to the segmentation of mountainous terrain. Ongoing investigations aim to enhance the precision and efficiency of satellite imagery analysis. In particular, there is a growing emphasis on developing methodologies for the accurate detection of water bodies, snow, and clouds, which is important for environmental monitoring, resource management, and disaster response. Within this context, this paper focuses on cloud segmentation from remote sensing imagery. Accurate remote sensing data analysis can be challenging due to the presence of clouds in optical sensor-based applications. The quality of resulting products, such as applications and research, is directly impacted by cloud detection, which plays a key role in the remote sensing data processing pipeline. This paper examines seven cutting-edge semantic segmentation and detection algorithms applied to cloud identification, conducting a benchmark analysis to evaluate their architectural approaches and identify the best-performing ones. To increase the models' adaptability, critical elements such as the type of imagery and the number of spectral bands used during training are analyzed. Additionally, this research aims to produce machine learning algorithms that can perform cloud segmentation using only a few spectral bands, including RGB and RGB-NIR combinations. The models' flexibility for a variety of applications and user scenarios is assessed using imagery from Sentinel-2 and Landsat-8 as datasets. This benchmark can be reproduced using the material from this GitHub link: https://github.com/toelt-llc/cloud_segmentation_comparative. Comment: Submitted to Expert Systems and Applications. Under license CC-BY-NC-N
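
    To make the benchmark's headline metric concrete, a minimal NumPy sketch of Intersection over Union for binary cloud masks follows; the band labels and random masks are hypothetical stand-ins, since the actual evaluation code lives in the linked repository.

```python
import numpy as np

def cloud_iou(pred, target):
    """Intersection over Union for binary cloud masks (True = cloud)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

# Hypothetical comparison of the same model trained on different band sets.
for bands in ("RGB", "RGB+NIR"):
    pred = np.random.rand(512, 512) > 0.5    # stand-in for a model prediction
    target = np.random.rand(512, 512) > 0.5  # stand-in for a reference mask
    print(bands, f"IoU = {cloud_iou(pred, target):.3f}")
```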

    Building Extraction from Very High Resolution Aerial Imagery Using Joint Attention Deep Neural Network

    Automated methods to extract buildings from very high resolution (VHR) remote sensing data have many applications in a wide range of fields. Many convolutional neural network (CNN) based methods have been proposed and have achieved significant advances in the building extraction task. In order to refine predictions, many recent approaches fuse features from earlier layers of CNNs to introduce abundant spatial information, a strategy known as the skip connection. However, reusing earlier features directly, without processing, can reduce the performance of the network. To address this problem, we propose a novel fully convolutional network (FCN) that adopts attention-based re-weighting to extract buildings from aerial imagery. Specifically, we consider the semantic gap between features from different stages and leverage the attention mechanism to bridge the gap prior to the fusion of features. The inferred attention weights along the spatial and channel-wise dimensions make the low-level feature maps adaptive to the high-level feature maps in a target-oriented manner. Experimental results on three publicly available aerial imagery datasets show that the proposed model (RFA-UNet) achieves performance comparable to or better than other state-of-the-art models for building extraction.
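
    A minimal PyTorch sketch of the attention-based re-weighting idea: channel and spatial gates inferred from the high-level features rescale the low-level features before the skip fusion. The gate design and channel sizes are assumptions for illustration and may differ from RFA-UNet's exact layers.

```python
import torch
import torch.nn as nn

class AttentionSkip(nn.Module):
    """Re-weights a low-level skip feature map along channel and spatial
    dimensions using weights inferred from the high-level feature map,
    bridging the semantic gap before the two are fused."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(high_ch, low_ch, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(nn.Conv2d(high_ch, 1, 1), nn.Sigmoid())

    def forward(self, low, high):
        high = nn.functional.interpolate(
            high, size=low.shape[2:], mode="bilinear", align_corners=False)
        low = low * self.channel_gate(high)   # channel-wise re-weighting
        low = low * self.spatial_gate(high)   # spatial re-weighting
        return torch.cat([low, high], dim=1)  # fuse after bridging the gap

skip = AttentionSkip(low_ch=64, high_ch=256)
low = torch.randn(1, 64, 128, 128)  # early-stage features
high = torch.randn(1, 256, 32, 32)  # deep-stage features
print(skip(low, high).shape)        # torch.Size([1, 320, 128, 128])
```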

    Exploration of Convolutional Neural Network Architectures for Large Region Map Automation

    Deep learning semantic segmentation algorithms have provided improved frameworks for the automated production of Land-Use and Land-Cover (LULC) maps, which significantly increases the frequency of map generation as well as the consistency of production quality. In this research, a total of 28 model variations were examined to improve the accuracy of LULC maps. The experiments were carried out using Landsat 5/7 or Landsat 8 satellite images with the North American Land Change Monitoring System (NALCMS) labels. The performance of various CNN and extension combinations was assessed, where a VGGNet with an output stride of 4 and a modified U-Net architecture provided the best results. An additional expanded analysis of the generated LULC maps is also provided. Using a deep neural network, this work achieved 92.4% accuracy for 13 LULC classes within southern Manitoba, representing a 15.8% improvement over published results for the NALCMS. For the large regions of interest, the higher radiometric resolution of Landsat 8 data resulted in better overall accuracy (88.04%) compared to Landsat 5/7 (80.66%) for 16 LULC classes. This represents an 11.44% and 4.06% increase in overall accuracy compared to previously published NALCMS results, while incorporating a larger land area and a higher number of LULC classes into the models than other published LULC map automation methods.
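
    The overall-accuracy figures quoted above come down to a simple computation on a confusion matrix: correctly classified pixels (the diagonal) divided by all pixels. The following is a generic sketch with an invented 3-class matrix, not the study's code.

```python
import numpy as np

def overall_accuracy(conf):
    """Overall accuracy: trace of the confusion matrix over its total."""
    return np.trace(conf) / conf.sum()

# Toy confusion matrix (rows = reference class, columns = predicted class).
conf = np.array([[90,  5,  5],
                 [ 8, 85,  7],
                 [ 4,  6, 90]])
print(f"overall accuracy = {overall_accuracy(conf):.3f}")  # 0.883
```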

    Characterization of wastewater methane emission sources with computer vision and remote sensing

    Methane emissions are responsible for at least one-third of the total anthropogenic climate forcing, and current estimations expect a significant increase in these emissions in the next decade. Consequently, methane offers a unique opportunity to mitigate climate change while addressing energy supply problems. Of the five primary methane sources, wastewater treatment provided 7% of emissions in 2010, a share that will undoubtedly increase with global population growth. Therefore, locating sources of methane emissions is a crucial step toward better characterizing the current distribution of greenhouse gases (GHGs). Nevertheless, there is a lack of comprehensive, uniform global databases binding those emissions to concrete sources, and there is no automatic method to accurately locate sparse human infrastructures such as wastewater treatment plants (WWTPs). WWTP detection is an open problem posing many obstacles due to the lack of freely accessible high-resolution imagery and the variety of real-world morphologies and sizes. In this work, we tackle this complex, state-of-the-art problem and go one step further by trying to infer capacity using a single end-to-end deep learning architecture and multi-modal remote sensing data. This goal has groundbreaking potential impact, as it could help map estimated methane emissions, improving emission inventories and the prediction of future scenarios. We address the problem as a combination of two parallel inference exercises, proposing a novel network to combine multi-modal data based on the hypothesis that location and capacity can be inferred from characteristics such as a plant's situation, size, morphology, and proximity to water bodies or population centers. We explore technical documentation and the literature to develop these hypotheses and validate their soundness with data analysis. To validate the architecture and the hypotheses, we develop a model and a dataset in parallel with a series of ablation tests. The process is facilitated by an automatic pipeline, also developed in this work, to create datasets and validate models leveraging those datasets. We test the best-obtained model at scale on a mosaic of satellite imagery covering the region of Catalonia. The goal is to find plants not previously labeled but present in WWTP databases and to compare the distribution and magnitude of the inferred capacity with the ground truth. Results show that we can achieve state-of-the-art results, locating more than half of the labeled plants at the same precision ratio while using only orthophotos from multispectral imagery. Moreover, we demonstrate that additional data sources related to water basins and population are valuable resources that the model can exploit to infer WWTP capacity. During the process, we also demonstrate the benefit of using negative instances to train our model and the impact of using an appropriate loss function such as the Dice loss.
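
    Since the abstract credits an appropriate loss function such as the Dice loss with part of the result, here is a common soft-Dice formulation in PyTorch as a reference sketch; the smoothing constant and batch layout are conventional choices, not details taken from this work.

```python
import torch

def dice_loss(logits, target, smooth=1.0):
    """Soft Dice loss for binary segmentation: 1 minus the Dice coefficient
    between predicted probabilities and the target mask, averaged over the
    batch. Well suited to sparse positives such as rare plant footprints."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + smooth) / (denom + smooth)).mean()

logits = torch.randn(2, 1, 128, 128)                  # raw network outputs
target = (torch.rand(2, 1, 128, 128) > 0.95).float()  # sparse positive mask
print(dice_loss(logits, target))
```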

    Informal settlement segmentation using VHR RGB and height information from UAV imagery: a case study of Nepal

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies.
    Informal settlements in developing countries are complex. They are contextually and radiometrically very similar to formal settlements. The resolution offered by satellite remote sensing is not sufficient to capture the high variation and small feature sizes of informal settlements in these situations. UAV imagery offers a solution with higher resolution. Incorporating a UAV image and a normalized DSM obtained from the UAV provides an opportunity to include information on 3D space. This can be a crucial factor for informal settlement extraction in countries like Nepal: while formal and informal settlements have similar texture, they differ significantly in height. In this regard, we propose segmentation of informal settlements in Nepal using UAV imagery and a normalized DSM, against the traditional approach of using the orthophoto only or the orthophoto and DSM. Absolute height, the normalized DSM (nDSM), and a vegetation index from the visible bands, added to the 8-bit RGB channels, are used to locate informal settlements. Segmentation including the nDSM resulted in a 6% increase in Intersection over Union (IoU) for informal settlements. An IoU of 85% for informal settlements is obtained using the nDSM with a ResNet18-based U-Net trained end to end. Using a threshold value had the same effect as using absolute height, meaning that thresholding does not alter the result relative to using the absolute nDSM. Integrating height as an additional band performed better than a model that trained on height separately. Interestingly, the benefit of the vegetation index is limited to settlements with small huts partly covered by vegetation; elsewhere it has no effect or a negative one.
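
    A small NumPy sketch of the band-stacking idea described above, assuming the Excess Green index (ExG = 2G - R - B) as the visible-band vegetation index (the abstract does not name the index used) and a simple max normalization for the nDSM:

```python
import numpy as np

def build_input(rgb, ndsm):
    """Stacks RGB, normalized height, and a visible-band vegetation index
    into a single multi-band model input."""
    r, g, b = (rgb[..., i].astype(np.float32) / 255.0 for i in range(3))
    exg = 2 * g - r - b                                       # illustrative index
    height = ndsm.astype(np.float32) / max(ndsm.max(), 1e-6)  # scale to [0, 1]
    return np.dstack([r, g, b, height, exg])                  # 5-band input

rgb = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
ndsm = np.random.rand(256, 256) * 12.0  # heights above ground, in metres
print(build_input(rgb, ndsm).shape)     # (256, 256, 5)
```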

    Fast Rotated Bounding Box Annotations for Object Detection

    Traditionally, object detection models use a large amount of annotated data, and axis-aligned bounding boxes (AABBs) are often chosen as the image annotation technique for both training and predictions. The purpose of annotating the objects in the images is to indicate the regions of interest with the corresponding labels. Accurate object annotations help computer vision models understand the distinct patterns of image features to recognize and localize different classes of objects. However, AABBs are often a poor fit for elongated object instances. It is also challenging to localize objects with AABBs in densely packed aerial images because of overlapping adjacent bounding boxes. Alternatively, rectangular annotations that can be oriented diagonally, also known as rotated bounding boxes (RBBs), can provide a much tighter fit for elongated objects and reduce the potential bounding box overlap between adjacent objects. However, RBBs are much more time-consuming and tedious to annotate than AABBs for large datasets. In this work, we propose a novel annotation tool named FastRoLabelImg (Fast Rotated LabelImg) for producing high-quality RBB annotations with low time and effort. The tool generates accurate RBB proposals for objects of interest as the annotator makes progress through the dataset. It can also adapt available AABBs to generate RBB proposals. Furthermore, a multipoint box drawing system is provided to reduce manual RBB annotation time compared to existing methods. Across three diverse datasets, we show that the proposal generation methods can achieve a maximum of 88.9% manual workload reduction. Through a participant study, we also show that our proposed manual annotation method is twice as fast as the existing system at the same accuracy. Lastly, we publish RBB annotations for two public datasets in order to motivate future research that will contribute to developing more competent object detection algorithms capable of RBB predictions.
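
    One standard way to derive a tight rotated box from object pixels is OpenCV's minimum-area rectangle fit, sketched below. This conveys what an RBB is and why it suits elongated objects; it is not FastRoLabelImg's proposal-generation method.

```python
import cv2
import numpy as np

# Draw a synthetic elongated object on a binary mask.
mask = np.zeros((200, 200), dtype=np.uint8)
cv2.line(mask, (30, 160), (170, 40), color=255, thickness=12)

# Fit the minimum-area rotated rectangle around the object's contour.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
(cx, cy), (w, h), angle = cv2.minAreaRect(contours[0])
corners = cv2.boxPoints(((cx, cy), (w, h), angle))  # 4 corner points of the RBB

print("RBB corners:\n", corners.astype(int))
print(f"center=({cx:.1f}, {cy:.1f}) size=({w:.1f} x {h:.1f}) angle={angle:.1f} deg")
```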