Domain Adaptive Transfer Attack (DATA)-based Segmentation Networks for Building Extraction from Aerial Images
Semantic segmentation models based on convolutional neural networks (CNNs)
have gained much attention in remote sensing and have achieved
remarkable performance in extracting buildings from high-resolution
aerial images. However, their limited generalization to unseen images
remains an issue. When there is a domain gap between the training and test datasets,
CNN-based segmentation models trained on the training dataset fail to segment
buildings in the test dataset. In this paper, we propose segmentation networks
based on a domain adaptive transfer attack (DATA) scheme for building
extraction from aerial images. The proposed system combines the concepts of
domain transfer and adversarial attack. Based on the DATA scheme, the distribution of
the input images can be shifted to that of the target images while the
images are turned into adversarial examples against a target network. Defending against
adversarial examples adapted to the target domain can overcome the performance
degradation caused by the domain gap and increase the robustness of the
segmentation model. Cross-dataset experiments and an ablation study are
conducted on three different datasets: the Inria aerial image labeling
dataset, the Massachusetts building dataset, and the WHU East Asia dataset.
Compared with the segmentation network without the DATA
scheme, the proposed method improves the overall IoU. Moreover, the proposed
method also outperforms feature adaptation (FA) and output space adaptation (OSA).
Comment: 11 pages, 12 figures
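The abstract reports gains in "overall IoU" without defining it. A common convention on building-extraction benchmarks such as Inria is to pool intersection and union pixel counts over all test tiles and divide once, rather than averaging per-tile scores. The minimal sketch below assumes that convention; the function names, the 0/1 list-of-lists mask format, and the "empty union scores 1.0" rule are hypothetical choices for illustration, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical mask type: a 2-D grid of 0/1 values (1 = building pixel).
Mask = Sequence[Sequence[int]]


def intersection_union(pred: Mask, truth: Mask) -> Tuple[int, int]:
    """Count building pixels in the intersection and the union of two masks."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter, union


def overall_iou(preds: List[Mask], truths: List[Mask]) -> float:
    """Pool pixel counts over all tiles first, then divide once."""
    total_inter = total_union = 0
    for pred, truth in zip(preds, truths):
        inter, union = intersection_union(pred, truth)
        total_inter += inter
        total_union += union
    # Assumption: an empty union (no buildings anywhere) counts as a perfect score.
    return total_inter / total_union if total_union else 1.0


def mean_tile_iou(preds: List[Mask], truths: List[Mask]) -> float:
    """Average of per-tile IoUs, shown only for contrast."""
    scores = []
    for pred, truth in zip(preds, truths):
        inter, union = intersection_union(pred, truth)
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


# Two tiny 2x2 "tiles": a poor prediction and a perfect one.
preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
truths = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]

print(overall_iou(preds, truths))    # 5/6, about 0.833
print(mean_tile_iou(preds, truths))  # 0.75
```

The two numbers differ because pooling weights tiles by their building area: the perfect tile contributes four union pixels and the poor one only two, so the pooled score sits closer to 1.0 than the per-tile average does.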
Towards Robust Curve Text Detection with Conditional Spatial Expansion
It is challenging to detect curve text because of its irregular shapes and
varying sizes. In this paper, we first investigate the deficiencies of
existing curve text detection methods and then propose a novel Conditional Spatial
Expansion (CSE) mechanism to improve the performance of curve text detection.
Instead of regarding curve text detection as a polygon regression or a
segmentation problem, we treat it as a region expansion process. Our CSE starts
with a seed arbitrarily initialized within a text region and progressively
merges neighboring regions based on local features extracted by a CNN and
contextual information from the already merged regions. The CSE is highly
parameterized and can be seamlessly integrated into existing object detection frameworks.
Enhanced by the data-dependent CSE mechanism, our curve text detection system
provides robust instance-level text region extraction with minimal
post-processing. Analysis experiments show that our CSE can handle text
with various shapes, sizes, and orientations, and can effectively suppress
false positives arising from text-like textures or unexpected text included in
the same RoI. Compared with existing curve text detection algorithms, our
method is more robust and has a simpler processing flow. It also sets a
new state of the art on curve text benchmarks, with an F-score of up to
78.4.
Comment: This paper has been accepted by IEEE International Conference on
Computer Vision and Pattern Recognition (CVPR 2019)
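The region-expansion idea (start from a seed inside a text region and merge neighbors based on local evidence plus context from what has already been merged) can be illustrated with plain seeded region growing. This is a minimal sketch, not the paper's CSE: the learned, CNN-driven merge decision is replaced by two hypothetical hand-set rules, a per-cell score threshold (`min_score`, standing in for local features) and a maximum gap to the merged region's mean score (`max_gap`, standing in for context). The grid of scores and all names are invented for illustration.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def expand_from_seed(scores: List[List[float]], seed: Cell,
                     min_score: float = 0.5, max_gap: float = 0.3) -> Set[Cell]:
    """Grow a region from `seed`, merging 4-neighbors breadth-first.

    A neighbor is merged only if its own score passes `min_score` (hypothetical
    stand-in for local CNN evidence) and it lies within `max_gap` of the current
    region's mean score (hypothetical stand-in for context of merged cells).
    """
    rows, cols = len(scores), len(scores[0])
    region = {seed}
    total = scores[seed[0]][seed[1]]
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in region:
                continue
            s = scores[nr][nc]
            mean = total / len(region)
            if s >= min_score and abs(s - mean) <= max_gap:
                region.add((nr, nc))
                total += s
                frontier.append((nr, nc))
    return region


# Hypothetical per-cell "text-likeness" scores inside one RoI: a bent stroke
# on the left and an unrelated text-like blob at (2, 3).
scores = [
    [0.9, 0.8, 0.1, 0.0],
    [0.0, 0.9, 0.2, 0.0],
    [0.0, 0.85, 0.0, 0.7],
]

print(sorted(expand_from_seed(scores, (0, 0))))  # [(0, 0), (0, 1), (1, 1), (2, 1)]
print(sorted(expand_from_seed(scores, (1, 1))))  # same four cells
```

In this toy grid, seeds placed at different positions on the stroke recover the same curved region, and the disconnected blob at (2, 3) is never merged, loosely mirroring the claim that expansion suppresses unexpected text in the same RoI. The real mechanism learns its merge decisions rather than using fixed thresholds.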
Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks
Semantic labeling (or pixel-level land-cover classification) in ultra-high
resolution imagery (< 10 cm) requires statistical models able to learn high-level
concepts from spatial data with large appearance variations.
Convolutional Neural Networks (CNNs) achieve this goal by discriminatively
learning a hierarchy of representations of increasing abstraction.
In this paper we present a CNN-based system relying on a
downsample-then-upsample architecture. Specifically, it first learns a rough
spatial map of high-level representations by means of convolutions and then
learns to upsample them back to the original resolution by deconvolutions. By
doing so, the CNN learns to densely label every pixel at the original
resolution of the image. This results in several advantages, including i)
state-of-the-art numerical accuracy, ii) improved geometric accuracy of
predictions, and iii) high efficiency at inference time.
We test the proposed system on the Vaihingen and Potsdam sub-decimeter
resolution datasets, involving semantic labeling of aerial images of 9 cm and
5 cm resolution, respectively. These datasets are composed of many large,
fully annotated tiles, allowing an unbiased evaluation of models that make use of
spatial information. We do so by comparing two standard CNN architectures with
the proposed one: standard patch classification, prediction of local label
patches using only convolutions, and full patch labeling using
deconvolutions. All the systems compare favorably with or outperform a
state-of-the-art baseline relying on superpixels and powerful appearance
descriptors. The proposed full patch labeling CNN outperforms these models by a
large margin and also shows a very appealing inference time.
Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
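For a downsample-then-upsample network to label every pixel "at the original resolution", the strided convolutions and the deconvolutions (transposed convolutions) must undo each other's size changes. The sketch below applies the standard output-size formulas (transposed convolution shown with dilation 1 and optional output padding). The abstract does not give layer settings, so the three kernel-3/stride-2 encoder layers and three kernel-4/stride-2 decoder layers are hypothetical choices for illustration only.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a strided convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size: int, kernel: int, stride: int, padding: int,
               output_padding: int = 0) -> int:
    """Spatial size after a transposed convolution ("deconvolution"), dilation 1."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Hypothetical (kernel, stride, padding) settings; not taken from the paper.
DOWN: List[Tuple[int, int, int]] = [(3, 2, 1)] * 3  # encoder: each layer halves the size
UP: List[Tuple[int, int, int]] = [(4, 2, 1)] * 3    # decoder: each layer doubles it


def round_trip(size: int) -> int:
    """Push a square tile side length through the encoder and then the decoder."""
    for k, s, p in DOWN:
        size = conv_out(size, k, s, p)
    for k, s, p in UP:
        size = deconv_out(size, k, s, p)
    return size


print(round_trip(256))  # 256: 256 -> 128 -> 64 -> 32 -> 64 -> 128 -> 256
print(round_trip(250))  # 256: the odd intermediate size 125 breaks the symmetry
```

With this particular stack, any side length that is a multiple of 2^3 = 8 returns to exactly its input size, while other sizes come back slightly larger and would need cropping or padding before the per-pixel labels line up with the input tile.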