Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
Large, pretrained models are commonly finetuned with imagery that is heavily
augmented to mimic different conditions and scales, with the resulting models
used for various tasks with imagery from a range of spatial scales. Such models
overlook scale-specific information in the data for scale-dependent domains,
such as remote sensing. In this paper, we present Scale-MAE, a pretraining
method that explicitly learns relationships between data at different, known
scales throughout the pretraining process. Scale-MAE pretrains a network by
masking an input image at a known input scale, where the area of the Earth
covered by the image determines the scale of the ViT positional encoding, not
the image resolution. Scale-MAE encodes the masked image with a standard ViT
backbone, and then decodes the masked image through a bandpass filter to
reconstruct low/high frequency images at lower/higher scales. We find that
tasking the network with reconstructing both low/high frequency images leads to
robust multiscale representations for remote sensing imagery. Scale-MAE
achieves an average non-parametric kNN classification improvement across eight
remote sensing datasets compared to the current state of the art, and obtains
an mIoU improvement on the SpaceNet building segmentation transfer task across
a range of evaluation scales.
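The central idea, a positional encoding tied to the ground area covered rather than the pixel grid, can be sketched in a few lines. The following is a simplified 1-D illustration, not Scale-MAE's actual implementation; the function name, the `gsd_m` parameterization, and the reference scale are all hypothetical:

```python
import numpy as np

def scale_aware_posenc(num_patches, dim, gsd_m, ref_gsd_m=1.0):
    """Sketch of a 1-D sinusoidal positional encoding whose position
    argument is scaled by ground sample distance (GSD), so patches
    covering the same ground extent receive the same encoding
    regardless of image resolution. Hypothetical simplification of
    Scale-MAE's 2-D GSD positional encoding."""
    # Patch positions expressed in ground units rather than pixel indices.
    pos = np.arange(num_patches, dtype=np.float64) * (gsd_m / ref_gsd_m)
    i = np.arange(dim // 2, dtype=np.float64)
    freqs = 1.0 / (10000.0 ** (2 * i / dim))
    angles = pos[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Two images of the same ground area: 16 patches at 2 m/px vs 32 at 1 m/px.
pe_coarse = scale_aware_posenc(16, 8, gsd_m=2.0)
pe_fine = scale_aware_posenc(32, 8, gsd_m=1.0)
# Every coarse patch aligns with every second fine patch in ground units.
assert np.allclose(pe_coarse, pe_fine[::2])
```

The assertion at the end shows the property the abstract describes: encodings depend on ground coverage, not on how many pixels the sensor produced.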
Semi-supervised Road Updating Network (SRUNet): A Deep Learning Method for Road Updating from Remote Sensing Imagery and Historical Vector Maps
Roads form the skeleton of a city and are a fundamental geographical
component. Many countries have built geo-information databases and gathered
large amounts of geographic data. However, with the extensive construction of
infrastructure and the rapid expansion of cities, automatic updating of road
data is imperative to keep basic geographic information current, yet obtaining
bi-phase images of the same area is difficult and complex post-processing is
required to update existing databases. To solve these problems, we propose a
road detection method based on semi-supervised learning (SRUNet), designed
specifically for road-updating applications; historical road information is
fused with the latest images to directly obtain the current state of the
roads. Because road texture is complex, a multi-branch network with a Map
Encoding Branch (MEB) is proposed for representation learning, in which a
Boundary Enhancement Module (BEM) improves the accuracy of boundary prediction
and a Residual Refinement Module (RRM) refines the prediction results.
optimize the prediction results. Further, to fully utilize the limited amount
of label information and to enhance the prediction accuracy on unlabeled
images, we utilized the mean teacher framework as the basic semi-supervised
learning framework and introduced Regional Contrast (ReCo) in our work to
improve the model capacity for distinguishing between the characteristics of
roads and background elements. We applied our method to two datasets; our
model effectively improves performance with fewer labels. Overall, the
proposed SRUNet provides stable, up-to-date, and reliable predictions for a
wide range of road renewal tasks.
Comment: 22 pages, 8 figures
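The mean teacher framework that SRUNet builds on maintains a teacher network whose weights are an exponential moving average (EMA) of the student's weights after each training step. A minimal sketch of that update, with illustrative parameter names and decay value not taken from the paper:

```python
import copy
import numpy as np

def ema_update(teacher_params, student_params, decay=0.9):
    """One mean-teacher step: each teacher weight becomes an
    exponential moving average of the corresponding student weight.
    Generic sketch of the framework, not SRUNet's exact code."""
    for k in teacher_params:
        teacher_params[k] = decay * teacher_params[k] + (1 - decay) * student_params[k]
    return teacher_params

student = {"w": np.array([1.0, 2.0])}
teacher = copy.deepcopy(student)
student["w"] = student["w"] + 1.0   # pretend a gradient step moved the student
teacher = ema_update(teacher, student, decay=0.9)
# Teacher tracks the student slowly: 0.9*[1,2] + 0.1*[2,3] = [1.1, 2.1]
```

The slowly moving teacher then supplies consistency targets on unlabeled images, which is how the limited label budget described above is stretched.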
Automated High-resolution Earth Observation Image Interpretation: Outcome of the 2020 Gaofen Challenge
In this article, we introduce the 2020 Gaofen Challenge and its scientific outcomes. The 2020 Gaofen Challenge is an international competition organized by the China High-Resolution Earth Observation Conference Committee and the Aerospace Information Research Institute, Chinese Academy of Sciences, and technically cosponsored by the IEEE Geoscience and Remote Sensing Society and the International Society for Photogrammetry and Remote Sensing. It aims to promote the academic development of automated high-resolution earth observation image interpretation. Six independent tracks were organized, covering challenging problems in object detection and semantic segmentation. With the development of convolutional neural networks, deep-learning-based methods have achieved good performance on image interpretation. In this article, we report the details of the challenge and the best-performing methods presented so far within its scope.
Road Segmentation for Remote Sensing Images using Adversarial Spatial Pyramid Networks
Road extraction in remote sensing images is of great importance for a wide
range of applications. Because of complex backgrounds and high object density,
most existing methods fail to extract road networks that are both correct and
complete. Moreover, they suffer from either insufficient
training data or high costs of manual annotation. To address these problems, we
introduce a new model to apply structured domain adaption for synthetic image
generation and road segmentation. We incorporate a feature pyramid network into
generative adversarial networks to minimize the difference between the source
and target domains. A generator learns to produce high-quality synthetic
images, and a discriminator attempts to distinguish them from real ones. We also propose a feature
pyramid network that improves the performance of the proposed model by
extracting effective features from all layers of the network to describe
objects at different scales. A novel scale-wise architecture is introduced
to learn from the multi-level feature maps and improve the semantics of the
features. For optimization, the model is trained by a joint reconstruction loss
function, which minimizes the difference between the fake images and the real
ones. Extensive experiments on three datasets demonstrate the superior
accuracy and efficiency of the proposed approach. In particular, our model
achieves a state-of-the-art 78.86 IoU on the Massachusetts dataset with 14.89M
parameters and 86.78B FLOPs: 4x fewer FLOPs yet higher accuracy (+3.47% IoU)
than the best-performing state-of-the-art approach in the evaluation.
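The joint objective described above, an adversarial term plus a term that minimizes the difference between fake and real images, can be sketched as follows. The non-saturating GAN form, the L1 reconstruction term, and the `lam` weighting are assumptions for illustration, not details from the paper:

```python
import numpy as np

def joint_loss(fake, real, disc_score_fake, lam=10.0):
    """Sketch of a joint objective for structured domain adaptation:
    an adversarial term pushes the discriminator score on synthetic
    images toward 'real', while an L1 reconstruction term keeps them
    close to the targets. `lam` is an assumed weighting."""
    adv = -np.mean(np.log(disc_score_fake + 1e-8))   # non-saturating GAN loss
    recon = np.mean(np.abs(fake - real))             # L1 reconstruction term
    return adv + lam * recon

fake = np.full((4, 4), 0.5)
real = np.full((4, 4), 0.6)
loss = joint_loss(fake, real, disc_score_fake=np.array([0.8]))
# -log(0.8) + 10 * 0.1 ≈ 1.223
```

Reconstruction anchors the generator to target content while the adversarial term closes the domain gap, which is the combination the abstract describes.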
Fine-Grained Extraction of Road Networks via Joint Learning of Connectivity and Segmentation
Road network extraction from satellite images is widely applied in intelligent
traffic management and autonomous driving. High-resolution remote sensing
images contain complex road areas and distracting backgrounds, which makes
road extraction challenging. In this study, we present a stacked multitask
network that segments roads end to end while preserving connectivity
correctness. In the network, a global-aware module is
introduced to enhance pixel-level road feature representation and eliminate
background distraction from overhead images; a road-direction-related
connectivity task is added to ensure that the network preserves the graph-level
relationships of the road segments. We also develop a stacked multihead
structure to jointly learn and effectively utilize the mutual information
between connectivity learning and segmentation learning. We evaluate the
performance of the proposed network on three public remote sensing datasets.
The experimental results demonstrate that the network outperforms
state-of-the-art methods in terms of road segmentation accuracy and
connectivity maintenance.
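Connectivity correctness can be probed with a simple proxy: counting connected components of a predicted road mask, since a fragmented prediction yields more components than the ground truth. This is only an illustrative metric, not the paper's road-direction-related connectivity task:

```python
from collections import deque

def count_components(mask):
    """Count 4-connected road components in a binary mask; a simple
    proxy for the graph-level relationships a connectivity-aware
    network is trained to preserve."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                comps += 1                      # found a new road fragment
                q = deque([(sy, sx)])
                seen[sy][sx] = True
                while q:                        # flood-fill the fragment
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return comps

# A road broken into two fragments vs. the connected ground truth.
gt   = [[1, 1, 1, 1]]
pred = [[1, 1, 0, 1]]
assert count_components(gt) == 1 and count_components(pred) == 2
```

A segmentation model can score well on per-pixel accuracy while still breaking roads apart; a component count like this makes that failure mode visible.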
An End-to-End Real-Time Lightweight Network for the Joint Segmentation of Optic Disc and Optic Cup on Fundus Images
Glaucoma is the second leading cause of blindness worldwide, and accurate segmentation of the optic disc (OD) and optic cup (OC) is essential for its diagnosis. To address the poor real-time performance, high complexity, and large memory consumption of existing fundus segmentation algorithms, a lightweight segmentation algorithm, GlauNet, based on convolutional neural networks, is proposed. The algorithm pairs an efficient feature-extraction network with a multiscale boundary fusion (MBF) module, which greatly improves segmentation efficiency while preserving accuracy. Experiments show that the algorithm achieves Dice scores of 0.9701/0.8959, 0.9650/0.8621, and 0.9594/0.8795 (optic disc/optic cup) on three publicly available datasets: Drishti-GS, RIM-ONE-r3, and REFUGE-train. The model has only 0.8M parameters and takes just 13 ms to infer an 800 × 800 fundus image on an RTX 3070 GPU.
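The Dice scores reported above follow the standard definition, 2|P∩G| / (|P| + |G|), over binary masks. A minimal sketch:

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice coefficient between two binary masks:
    2 * |pred ∩ gt| / (|pred| + |gt|)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# Perfect overlap gives 1.0; partial overlap is penalized:
# pred has 1 positive, gt has 2, 1 shared -> 2*1/(1+2) ≈ 0.667.
assert dice([1, 1, 0], [1, 1, 0]) > 0.999
assert abs(dice([1, 0, 0], [1, 1, 0]) - 2 / 3) < 1e-6
```

Separate disc and cup masks are scored independently, which is why the results above are reported as OD/OC pairs.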
On the Application of Data Clustering Algorithm used in Information Retrieval for Satellite Imagery Segmentation
This study proposes an automated technique for segmenting satellite imagery using unsupervised learning. Autoencoders, a type of neural network, are employed for dimensionality reduction and feature extraction. The study evaluates different segmentation architectures and encoders and identifies the best-performing combination as the DeepLabv3+ architecture with a ResNet-152 encoder. This approach achieves high performance scores across multiple metrics and can be beneficial in various fields, including agriculture, land use monitoring, and disaster response.
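The unsupervised clustering step can be illustrated with a minimal k-means over per-pixel feature vectors. In the study the features come from an autoencoder; here synthetic stand-ins are used, and the spread-out deterministic initialization is a simplification:

```python
import numpy as np

def kmeans_segment(features, k=2, iters=20):
    """Minimal k-means over per-pixel feature vectors: an illustrative
    sketch of unsupervised segmentation by clustering, not the study's
    actual pipeline."""
    # Deterministic init: spread initial centers across the feature array.
    idx = np.linspace(0, len(features) - 1, k).astype(int)
    centers = features[idx].astype(float).copy()
    for _ in range(iters):
        # Assign each feature vector to its nearest center.
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(d2, axis=1)
        # Recompute each center as the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# Two well-separated pixel populations cluster cleanly into two labels.
feats = np.vstack([np.zeros((10, 3)), np.full((10, 3), 5.0)])
labels = kmeans_segment(feats, k=2)
```

Reshaping the resulting label vector back to the image grid yields a segmentation map without any manual annotation, which is the appeal of the unsupervised approach described above.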