Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine learning techniques are becoming increasingly important. In particular, deep learning, as a major breakthrough in the field, has proven to be an extremely powerful tool in many domains. Shall we embrace deep learning as the key to everything? Or should we resist a 'black-box' solution? Opinions in the remote sensing community are divided. In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review recent advances, and provide resources that make it ridiculously simple to get started with deep learning in remote sensing. More importantly, we advocate that remote sensing scientists bring their expertise into deep learning and use it as an implicit general model to tackle unprecedented, large-scale, influential challenges such as climate change and urbanization.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine
Integrating EfficientNet into an HAFNet structure for building mapping in high-resolution optical Earth observation data
Automated extraction of buildings from Earth observation (EO) data is important for many applications, including map updating, risk assessment, urban planning, and policy-making. Combining data from different sensors, such as high-resolution multispectral images (HRI) and light detection and ranging (LiDAR) data, has shown great potential for building extraction. Deep learning (DL) is increasingly used for multi-modal data fusion and urban object extraction. However, DL-based multi-modal fusion networks may under-perform due to insufficient learning of “joint features” from multiple sources and oversimplified approaches to fusing multi-modal features. Recently, a hybrid attention-aware fusion network (HAFNet) was proposed for building extraction from co-located very-high-resolution (VHR) optical images and LiDAR data. The system reported good performance thanks to an attention mechanism that adapts to the information content of its three streams, but it suffered from over-parametrization, which inevitably leads to long training times and a heavy computational load. In this paper, the authors propose a restructuring of the scheme in which the VGG-16-like encoders are replaced with the recently proposed EfficientNet, whose advantages directly counteract the issues found with the HAFNet scheme. The new configuration, called HAFNetE (HAFNet with EfficientNet integration), was tested on multiple benchmark datasets and showed substantial improvements in processing time as well as gains in accuracy; it achieves good results with fewer parameters, which translates into better computational efficiency. Based on these findings, we conclude that, given current advances in single-stream backbones, the classical multi-stream HAFNet scheme can be effectively transformed into HAFNetE by replacing VGG-16 with EfficientNet blocks in each stream. The remarkable reduction in computational requirements moves the system one step closer to on-board implementation in a possible future “urban mapping” satellite constellation.
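To make the encoder swap concrete, here is a minimal PyTorch sketch of the idea: a two-stream fusion network in which each stream uses a torchvision EfficientNet-B0 backbone in place of a VGG-16-style encoder. This is not the authors' HAFNetE code; the stream layout, channel counts, and the naive concatenation head (standing in for HAFNet's attention-aware fusion) are illustrative assumptions.

```python
# Minimal sketch (not the authors' HAFNetE code): swapping VGG-16-style
# encoders for EfficientNet-B0 backbones in a two-stream optical + LiDAR
# fusion network. HAFNet's attention-aware fusion is replaced here by
# naive concatenation for brevity.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

def make_encoder(in_channels: int) -> nn.Module:
    """EfficientNet-B0 feature extractor adapted to `in_channels` inputs."""
    net = efficientnet_b0(weights=None)
    stem = net.features[0][0]  # first conv of the RGB stem
    # Rebuild the stem conv so the same backbone serves optical and LiDAR streams.
    net.features[0][0] = nn.Conv2d(in_channels, stem.out_channels,
                                   kernel_size=stem.kernel_size,
                                   stride=stem.stride,
                                   padding=stem.padding, bias=False)
    return net.features  # 1280-channel maps at 1/32 of the input resolution

class TwoStreamFusion(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.optical = make_encoder(3)  # VHR optical stream
        self.lidar = make_encoder(1)    # rasterized LiDAR / DSM stream
        self.head = nn.Sequential(      # stand-in for the attention fusion block
            nn.Conv2d(2 * 1280, 256, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1))

    def forward(self, rgb: torch.Tensor, dsm: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.optical(rgb), self.lidar(dsm)], dim=1)
        logits = self.head(fused)
        # Upsample coarse logits back to the input size.
        return nn.functional.interpolate(logits, size=rgb.shape[-2:],
                                         mode="bilinear", align_corners=False)
```

The key point the sketch illustrates is that the backbone swap is local to each stream: the fusion logic, whatever form it takes, is untouched, which is why the same restructuring extends to all three HAFNet streams.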
Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers
Semantic segmentation requires approaches that learn high-level characteristics while dealing with enormous amounts of data. Convolutional neural networks (CNNs) can learn unique and adaptive features toward this aim. However, due to the large size and high spatial resolution of remote sensing images, these networks cannot analyze an entire scene efficiently. Recently, deep transformers have proven capable of capturing global interactions between different objects in an image. In this paper, we propose a new segmentation model that combines convolutional neural networks with transformers, and we show that this mixture of local and global feature extraction techniques provides significant advantages in remote sensing segmentation. In addition, the proposed model includes two fusion layers designed to represent the multi-modal inputs and the output of the network efficiently. The input fusion layer produces feature maps summarizing the relationship between image content and digital surface models (DSMs). The output fusion layer uses a novel multi-task segmentation strategy in which class labels are identified using class-specific feature extraction layers and loss functions. Finally, a fast-marching method converts all unidentified class labels to their closest known neighbors. Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
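As a concrete illustration of that final relabeling step, the sketch below assigns each unlabeled pixel the class of its nearest labeled neighbor. The paper uses a fast-marching method; here a Euclidean distance transform from scipy stands in for it, and the UNLABELED sentinel value is an assumption, not the paper's convention.

```python
# Minimal sketch of the post-processing step described above: pixels left
# unlabeled by the class-specific heads take the label of their nearest
# labeled neighbor. A Euclidean distance transform serves as a simple
# stand-in for the paper's fast-marching method.
import numpy as np
from scipy import ndimage

UNLABELED = -1  # assumed sentinel for pixels no class head claimed

def fill_unlabeled(labels: np.ndarray) -> np.ndarray:
    """Propagate the nearest known class label into unlabeled pixels."""
    unknown = labels == UNLABELED
    if not unknown.any():
        return labels
    # For every pixel, indices of the nearest labeled (known) pixel.
    _, (iy, ix) = ndimage.distance_transform_edt(unknown, return_indices=True)
    filled = labels.copy()
    filled[unknown] = labels[iy[unknown], ix[unknown]]
    return filled

# Toy usage: a 4x4 label map with an unlabeled hole in the middle.
lab = np.array([[0,  0,  1, 1],
                [0, -1, -1, 1],
                [2, -1, -1, 1],
                [2,  2,  2, 1]])
print(fill_unlabeled(lab))
```

The distance transform gives, for every unlabeled pixel, the coordinates of the closest labeled pixel, so the fill is a single vectorized lookup rather than an iterative front propagation.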