455 research outputs found

    Building extraction from high-resolution aerial imagery using a generative adversarial network with spatial and channel attention mechanisms.

    Get PDF
    Segmentation of high-resolution remote sensing images is an important challenge with wide practical applications. The increasing spatial resolution provides fine details for image segmentation but also incurs segmentation ambiguities. In this paper, we propose a generative adversarial network with spatial and channel attention mechanisms (GAN-SCA) for the robust segmentation of buildings in remote sensing images. The segmentation network (generator) of the proposed framework is composed of the well-known semantic segmentation architecture (U-Net) and the spatial and channel attention mechanisms (SCA). The adoption of SCA enables the segmentation network to selectively enhance more useful features in specific positions and channels and enables improved results closer to the ground truth. The discriminator is an adversarial network with channel attention mechanisms that can properly discriminate the outputs of the generator and the ground truth maps. The segmentation network and adversarial network are trained in an alternating fashion on the Inria aerial image labeling dataset and Massachusetts buildings dataset. Experimental results show that the proposed GAN-SCA achieves a higher score (the overall accuracy and intersection over the union of Inria aerial image labeling dataset are 96.61% and 77.75%, respectively, and the F1-measure of the Massachusetts buildings dataset is 96.36%) and outperforms several state-of-the-art approaches

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Full text link
    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, as well as strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, including augmentation techniques, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery.Comment: 145 pages with 32 figure

    Building Extraction from Very High Resolution Aerial Imagery Using Joint Attention Deep Neural Network

    Get PDF
    Automated methods to extract buildings from very high resolution (VHR) remote sensing data have many applications in a wide range of fields. Many convolutional neural network (CNN) based methods have been proposed and have achieved significant advances in the building extraction task. In order to refine predictions, a lot of recent approaches fuse features from earlier layers of CNNs to introduce abundant spatial information, which is known as skip connection. However, this strategy of reusing earlier features directly without processing could reduce the performance of the network. To address this problem, we propose a novel fully convolutional network (FCN) that adopts attention based re-weighting to extract buildings from aerial imagery. Specifically, we consider the semantic gap between features from different stages and leverage the attention mechanism to bridge the gap prior to the fusion of features. The inferred attention weights along spatial and channel-wise dimensions make the low level feature maps adaptive to high level feature maps in a target-oriented manner. Experimental results on three publicly available aerial imagery datasets show that the proposed model (RFA-UNet) achieves comparable and improved performance compared to other state-of-the-art models for building extraction

    A Review on Deep Learning in UAV Remote Sensing

    Full text link
    Deep Neural Networks (DNNs) learn representation from data with an impressive capability, and brought important breakthroughs for processing images, time-series, natural language, audio, video, and many others. In the remote sensing field, surveys and literature revisions specifically involving DNNs algorithms' applications have been conducted in an attempt to summarize the amount of information produced in its subfields. Recently, Unmanned Aerial Vehicles (UAV) based applications have dominated aerial sensing research. However, a literature revision that combines both "deep learning" and "UAV remote sensing" thematics has not yet been conducted. The motivation for our work was to present a comprehensive review of the fundamentals of Deep Learning (DL) applied in UAV-based imagery. We focused mainly on describing classification and regression techniques used in recent applications with UAV-acquired data. For that, a total of 232 papers published in international scientific journal databases was examined. We gathered the published material and evaluated their characteristics regarding application, sensor, and technique used. We relate how DL presents promising results and has the potential for processing tasks associated with UAV-based image data. Lastly, we project future perspectives, commentating on prominent DL paths to be explored in the UAV remote sensing field. Our revision consists of a friendly-approach to introduce, commentate, and summarize the state-of-the-art in UAV-based image applications with DNNs algorithms in diverse subfields of remote sensing, grouping it in the environmental, urban, and agricultural contexts.Comment: 38 pages, 10 figure

    Object Detection in 20 Years: A Survey

    Full text link
    Object detection, as of one the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed up techniques, and the recent state of the art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc, and makes an in-deep analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible publicatio

    SEG-ESRGAN: A multi-task network for super-resolution and semantic segmentation of remote sensing images

    Get PDF
    The production of highly accurate land cover maps is one of the primary challenges in remote sensing, which depends on the spatial resolution of the input images. Sometimes, high-resolution imagery is not available or is too expensive to cover large areas or to perform multitemporal analysis. In this context, we propose a multi-task network to take advantage of the freely available Sentinel-2 imagery to produce a super-resolution image, with a scaling factor of 5, and the corresponding high-resolution land cover map. Our proposal, named SEG-ESRGAN, consists of two branches: the super-resolution branch, that produces Sentinel-2 multispectral images at 2 m resolution, and an encoder–decoder architecture for the semantic segmentation branch, that generates the enhanced land cover map. From the super-resolution branch, several skip connections are retrieved and concatenated with features from the different stages of the encoder part of the segmentation branch, promoting the flow of meaningful information to boost the accuracy in the segmentation task. Our model is trained with a multi-loss approach using a novel dataset to train and test the super-resolution stage, which is developed from Sentinel-2 and WorldView-2 image pairs. In addition, we generated a dataset with ground-truth labels for the segmentation task. To assess the super-resolution improvement, the PSNR, SSIM, ERGAS, and SAM metrics were considered, while to measure the classification performance, we used the IoU, confusion matrix and the F1-score. Experimental results demonstrate that the SEG-ESRGAN model outperforms different full segmentation and dual network models (U-Net, DeepLabV3+, HRNet and Dual_DeepLab), allowing the generation of high-resolution land cover maps in challenging scenarios using Sentinel-2 10 m bands.This work was funded by the Spanish Agencia Estatal de Investigación (AEI) under projects ARTEMISAT-2 (CTM 2016-77733-R), PID2020-117142GB-I00 and PID2020-116907RB-I00 (MCIN/AEI call 10.13039/501100011033). L.S. would like to acknowledge the BECAL (Becas Carlos Antonio López) scholarship for the financial support.Peer ReviewedPostprint (published version

    Extracting structured information from 2D images

    Get PDF
    Convolutional neural networks can handle an impressive array of supervised learning tasks while relying on a single backbone architecture, suggesting that one solution fits all vision problems. But for many tasks, we can directly make use of the problem structure within neural networks to deliver more accurate predictions. In this thesis, we propose novel deep learning components that exploit the structured output space of an increasingly complex set of problems. We start from Optical Character Recognition (OCR) in natural scenes and leverage the constraints imposed by a spatial outline of letters and language requirements. Conventional OCR systems do not work well in natural scenes due to distortions, blur, or letter variability. We introduce a new attention-based model, equipped with extra information about the neuron positions to guide its focus across characters sequentially. It beats the previous state-of-the-art benchmark by a significant margin. We then turn to dense labeling tasks employing encoder-decoder architectures. We start with an experimental study that documents the drastic impact that decoder design can have on task performance. Rather than optimizing one decoder per task separately, we propose new robust layers for the upsampling of high-dimensional encodings. We show that these better suit the structured per pixel output across the board of all tasks. Finally, we turn to the problem of urban scene understanding. There is an elaborate structure in both the input space (multi-view recordings, aerial and street-view scenes) and the output space (multiple fine-grained attributes for holistic building understanding). We design new models that benefit from a relatively simple cuboidal-like geometry of buildings to create a single unified representation from multiple views. To benchmark our model, we build a new multi-view large-scale dataset of buildings images and fine-grained attributes and show systematic improvements when compared to a broad range of strong CNN-based baselines

    Learning representations in the hyperspectral domain in aerial imagery

    Get PDF
    We establish two new datasets with baselines and network architectures for the task of hyperspectral image analysis. The first dataset, AeroRIT, is a moving camera static scene captured from a flight and contains per pixel labeling across five categories for the task of semantic segmentation. The second dataset, RooftopHSI, helps design and interpret learnt features on hyperspectral object detection on scenes captured from an university rooftop. This dataset accounts for static camera, moving scene hyperspectral imagery. We further broaden the scope of our understanding of neural networks with the development of two novel algorithms - S4AL and S4AL+. We develop these frameworks on natural (color) imagery, by combining semi-supervised learning and active learning, and display promising results for learning with limited amount of labeled data, which can be extended to hyperspectral imagery. In this dissertation, we curated two new datasets for hyperspectral image analysis, significantly larger than existing datasets and broader in terms of categories for classification. We then adapt existing neural network architectures to function on the increased channel information, in a smart manner, to leverage all hyperspectral information. We also develop novel active learning algorithms on natural (color) imagery, and discuss the hope for expanding their functionality to hyperspectral imagery

    A Routine and Post-disaster Road Corridor Monitoring Framework for the Increased Resilience of Road Infrastructures

    Get PDF

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    Full text link
    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, of advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as it relates to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensin
    • …
    corecore