Deep Learning for Aerial Scene Understanding in High Resolution Remote Sensing Imagery from the Lab to the Wild
This work presents the application of deep learning to aerial scene understanding, e.g., aerial scene recognition, multi-label object classification, and semantic segmentation. Beyond training deep networks under laboratory conditions, this work also offers learning strategies for practical scenarios, e.g., where data are collected without constraints or annotations are scarce.
Deciphering a novel image cipher based on mixed transformed Logistic maps
Since John von Neumann suggested using the Logistic map as a random number generator in 1947, a great number of encryption schemes based on the Logistic map and/or its variants have been proposed. This paper re-evaluates the security of an image cipher based on transformed Logistic maps and proves that the image cipher can be deciphered efficiently under two different conditions: 1) two pairs of known plain-images and the corresponding cipher-images with computational complexity of ; 2) two pairs of chosen plain-images and the corresponding cipher-images with computational complexity of , where is the number of pixels in the plain-image. In contrast, the condition required by the previous deciphering method is eighty-seven pairs of chosen plain-images and the corresponding cipher-images with computational complexity of . In addition, three other security flaws existing in most Logistic-map-based ciphers are also reported.
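For context, the Logistic map referenced above is the iteration x_{n+1} = r * x_n * (1 - x_n). The sketch below shows one common way such a map is turned into a keystream generator; the burn-in length and the byte-quantization step are illustrative assumptions, not the construction of the analyzed cipher.

```python
# Minimal sketch of a Logistic-map keystream generator (illustrative only;
# the burn-in and byte-extraction scheme are assumptions, not the cipher
# analyzed above). The iteration is x_{n+1} = r * x_n * (1 - x_n).

def logistic_keystream(x0, r=3.99, n_bytes=16, burn_in=100):
    """Derive n_bytes pseudo-random bytes from a Logistic-map orbit."""
    x = x0
    for _ in range(burn_in):              # discard the transient of the orbit
        x = r * x * (1 - x)
    out = []
    for _ in range(n_bytes):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)    # quantize the orbit value to a byte
    return bytes(out)

print(logistic_keystream(0.3141592653589793).hex())
```

Because the whole keystream is a deterministic function of a low-entropy seed, an attacker who recovers even a few orbit values can often reconstruct the rest, which is the general weakness that known- and chosen-plaintext deciphering attacks of this kind exploit.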
Recurrently Exploring Class-wise Attention in A Hybrid Convolutional and Bidirectional LSTM Network for Multi-label Aerial Image Classification
Aerial image classification is of great significance in the remote sensing community, and many studies have been conducted over the past few years. Most of them focus on categorizing an image into one semantic label, while in the real world, an aerial image is often associated with multiple labels, e.g., multiple object-level labels in our case. Moreover, a comprehensive picture of the objects present in a given high-resolution aerial image can provide a more in-depth understanding of the studied region. For these reasons, aerial image multi-label classification has been attracting increasing attention. However, one common limitation shared by existing methods in the community is that the co-occurrence relationship of various classes, the so-called class dependency, is underexplored, which leads to suboptimal decisions. In
this paper, we propose a novel end-to-end network, namely class-wise
attention-based convolutional and bidirectional LSTM network (CA-Conv-BiLSTM),
for this task. The proposed network consists of three indispensable components:
1) a feature extraction module, 2) a class attention learning layer, and 3) a
bidirectional LSTM-based sub-network. Particularly, the feature extraction
module is designed for extracting fine-grained semantic feature maps, while the
class attention learning layer aims at capturing discriminative class-specific
features. As the most important part, the bidirectional LSTM-based sub-network
models the underlying class dependency in both directions and produces structured multiple object labels. Experimental results on the UCM and DFC15 multi-label datasets validate the effectiveness of our model both quantitatively and qualitatively.
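A minimal sketch of this three-part design in PyTorch, assuming a toy backbone, one spatial attention map per class, and a BiLSTM that scans the resulting class sequence; the layer sizes and the exact attention formulation are placeholders, not the paper's.

```python
import torch
import torch.nn as nn

class CAConvBiLSTMSketch(nn.Module):
    """Sketch of the three components described above (sizes are illustrative)."""

    def __init__(self, num_classes=17, feat_dim=256, hidden=128):
        super().__init__()
        # 1) feature extraction module: a tiny CNN standing in for the backbone
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 2) class attention learning layer: one spatial attention map per class
        self.class_attn = nn.Conv2d(feat_dim, num_classes, 1)
        # 3) bidirectional LSTM over the sequence of class-specific features
        self.bilstm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, 1)

    def forward(self, x):                                      # x: (B, 3, H, W)
        f = self.backbone(x)                                   # (B, D, h, w)
        attn = self.class_attn(f).flatten(2).softmax(-1)       # (B, K, h*w)
        feats = torch.bmm(attn, f.flatten(2).transpose(1, 2))  # (B, K, D)
        seq, _ = self.bilstm(feats)    # model class dependency in both directions
        return self.classifier(seq).squeeze(-1)                # (B, K) logits

logits = CAConvBiLSTMSketch()(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 17])
```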
Relation Network for Multi-label Aerial Image Classification
Multi-label classification plays an important role in perceiving the intricate contents of an aerial image and has triggered several related studies in recent years. However, most of them devote little effort to exploiting label relations, even though such dependencies are crucial for making accurate predictions. Although an LSTM layer can be introduced to model such label dependencies in a chain-propagation manner, its efficiency is questionable when certain labels are improperly inferred, as errors propagate along the chain. To address this, we propose a novel aerial image multi-label classification network, the attention-aware label relational reasoning network. Particularly, our network consists of three elemental modules: 1) a
label-wise feature parcel learning module, 2) an attentional region extraction
module, and 3) a label relational inference module. To be more specific, the
label-wise feature parcel learning module is designed for extracting high-level
label-specific features. The attentional region extraction module aims at
localizing discriminative regions in these features and yielding attentional
label-specific features. The label relational inference module finally predicts label existence using label relations reasoned from the outputs of the previous module. The proposed network is characterized by its capacity to extract discriminative label-wise features in a proposal-free way and to reason about label relations naturally and interpretably. In our experiments, we evaluate
the proposed model on the UCM multi-label dataset and a newly produced dataset, the AID multi-label dataset. Quantitative and qualitative results on these two datasets demonstrate the effectiveness of our model. To facilitate progress in multi-label aerial image classification, the AID multi-label dataset will be made publicly available.
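The label relational inference step can be illustrated with a generic pairwise relation module; the MLP layout below is an assumption in the spirit of relation networks, not the paper's exact module.

```python
import torch
import torch.nn as nn

class LabelRelationSketch(nn.Module):
    """Sketch of reasoning over pairwise label relations. Input: attentional
    label-specific features, one vector per label (dims are illustrative)."""

    def __init__(self, dim=256):
        super().__init__()
        self.relation = nn.Sequential(      # g(f_i, f_j): encodes one label pair
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim),
        )
        self.predict = nn.Linear(dim, 1)    # per-label existence logit

    def forward(self, feats):               # feats: (B, K, D)
        B, K, D = feats.shape
        fi = feats.unsqueeze(2).expand(B, K, K, D)   # feature of label i
        fj = feats.unsqueeze(1).expand(B, K, K, D)   # feature of label j
        pair = torch.cat([fi, fj], dim=-1)           # all K*K ordered pairs
        rel = self.relation(pair).mean(dim=2)        # aggregate relations per label
        return self.predict(rel).squeeze(-1)         # (B, K) logits

logits = LabelRelationSketch()(torch.randn(2, 17, 256))
print(logits.shape)  # torch.Size([2, 17])
```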
Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions
Visual crowd counting has recently been studied as a way to count people in crowd scenes from images. Albeit successful, vision-based crowd counting approaches can fail to capture informative features in extreme conditions, e.g., nighttime imaging and occlusion. In this work, we introduce a
novel task of audiovisual crowd counting, in which visual and auditory
information are integrated for counting purposes. We collect a large-scale
benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of
1,935 images and the corresponding audio clips, and 170,270 annotated
instances. In order to fuse the two modalities, we make use of a linear
feature-wise fusion module that carries out an affine transformation on visual
and auditory features. Finally, we conduct extensive experiments using the
proposed dataset and approach. Experimental results show that introducing
auditory information can benefit crowd counting under different illumination,
noise, and occlusion conditions. The code and dataset have been made publicly available.
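The linear feature-wise fusion described above amounts to a FiLM-style affine modulation of visual feature channels by audio features; a minimal sketch, with illustrative dimensions rather than the actual DISCO configuration.

```python
import torch
import torch.nn as nn

class AffineFusionSketch(nn.Module):
    """Sketch of linear feature-wise fusion: auditory features predict a
    per-channel affine transform (scale, shift) applied to visual features."""

    def __init__(self, vis_ch=64, aud_dim=128):
        super().__init__()
        self.to_gamma = nn.Linear(aud_dim, vis_ch)  # channel-wise scale
        self.to_beta = nn.Linear(aud_dim, vis_ch)   # channel-wise shift

    def forward(self, vis, aud):       # vis: (B, C, H, W), aud: (B, aud_dim)
        gamma = self.to_gamma(aud)[:, :, None, None]
        beta = self.to_beta(aud)[:, :, None, None]
        return gamma * vis + beta      # affine transform on visual features

fused = AffineFusionSketch()(torch.randn(2, 64, 32, 32), torch.randn(2, 128))
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

A channel-wise affine transform keeps the fusion lightweight and lets the audio stream amplify or suppress whole visual feature channels, which is useful when the image itself is uninformative, e.g., at night.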
RRSIS: Referring Remote Sensing Image Segmentation
Localizing desired objects from remote sensing images is of great use in
practical applications. Referring image segmentation, which aims at segmenting
out the objects to which a given expression refers, has been extensively
studied in natural images. However, almost no research attention has been given to this task in remote sensing imagery. Considering its potential for real-world applications, in this paper, we introduce referring remote sensing image segmentation (RRSIS) to fill this gap and make some insightful explorations.
Specifically, we create a new dataset, called RefSegRS, for this task, enabling
us to evaluate different methods. Afterward, we benchmark referring image
segmentation methods of natural images on the RefSegRS dataset and find that
these models show limited efficacy in detecting small and scattered objects. To
alleviate this issue, we propose a language-guided cross-scale enhancement
(LGCE) module that utilizes linguistic features to adaptively enhance
multi-scale visual features by integrating both deep and shallow features. The
proposed dataset, benchmarking results, and the designed LGCE module provide
insights into the design of a better RRSIS model. We will make our dataset and
code publicly available.
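A minimal sketch of what such language-guided cross-scale enhancement could look like, assuming a sentence-level linguistic embedding that gates the mix of shallow (high-resolution) and deep (semantic) visual features; the gating design is an assumption, not the published LGCE module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LGCEStyleSketch(nn.Module):
    """Sketch: linguistic features decide, per channel, how shallow and deep
    visual features are blended before a final fusion convolution."""

    def __init__(self, ch=64, lang_dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(lang_dim, ch), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, shallow, deep, lang):  # shallow: (B,C,H,W), deep: (B,C,h,w)
        deep_up = F.interpolate(deep, size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        g = self.gate(lang)[:, :, None, None]        # language-dependent gate
        enhanced = g * shallow + (1 - g) * deep_up   # cross-scale mixing
        return self.fuse(torch.cat([enhanced, deep_up], dim=1))

out = LGCEStyleSketch()(torch.randn(1, 64, 64, 64),
                        torch.randn(1, 64, 16, 16),
                        torch.randn(1, 256))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```

Keeping shallow features in the mix is what helps with small and scattered objects, since those are mostly resolved at high spatial resolution.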
Semantic Segmentation of Remote Sensing Images with Sparse Annotations
Training Convolutional Neural Networks (CNNs) for very high resolution images
requires a large quantity of high-quality pixel-level annotations, which is
extremely labor- and time-consuming to produce. Moreover, professional photo interpreters might have to be involved to guarantee the correctness of
annotations. To alleviate such a burden, we propose a framework for semantic
segmentation of aerial images based on incomplete annotations, where annotators
are asked to label a few pixels with easy-to-draw scribbles. To exploit these
sparse scribbled annotations, we propose the FEature and Spatial relaTional
regulArization (FESTA) method to complement the supervised task with an
unsupervised learning signal that accounts for neighbourhood structures in both spatial and feature terms.
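A minimal sketch of such a regularizer, assuming per-pixel embeddings from the segmentation network; the specific spatial-neighbour and feature-neighbour terms below are illustrative stand-ins, not the published FESTA loss.

```python
import torch
import torch.nn.functional as F

def neighbourhood_regularizer(feat):
    """Sketch of an unsupervised signal over pixel embeddings feat: (B, C, H, W).
    Spatial term: adjacent pixels should have similar embeddings.
    Feature term: each pixel should agree with its nearest feature-space neighbour."""
    # spatial term: agree with the pixel one step to the right
    spatial = F.mse_loss(feat[:, :, :, 1:], feat[:, :, :, :-1])

    # feature term: agree with the nearest other pixel in feature space
    B, C, H, W = feat.shape
    f = F.normalize(feat.flatten(2).transpose(1, 2), dim=-1)  # (B, H*W, C)
    sim = torch.bmm(f, f.transpose(1, 2))                     # cosine similarity
    sim = sim - 2.0 * torch.eye(H * W, device=f.device)       # exclude self-match
    nn_idx = sim.argmax(dim=-1)                               # nearest neighbour
    nn_feat = torch.gather(f, 1, nn_idx.unsqueeze(-1).expand_as(f))
    feature = F.mse_loss(f, nn_feat)
    return spatial + feature

print(neighbourhood_regularizer(torch.randn(2, 16, 8, 8)).item())
```

Such a term propagates supervision from the few scribbled pixels to unlabeled ones by encouraging consistent embeddings across spatial and feature neighbourhoods.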
Unconstrained Aerial Scene Recognition with Deep Neural Networks and a New Dataset
Aerial scene recognition is a fundamental research problem in interpreting high-resolution aerial imagery. Over the past few years, most studies have focused on classifying an image into one scene category, while in real-world scenarios, a single image often contains multiple scenes. Therefore, in this paper, we investigate a more practical yet underexplored task: multi-scene recognition in single images. To this end, we create a large-scale dataset, called the MultiScene dataset, composed of 100,000 unconstrained images, each with multiple labels from 36 different scenes. Among these images, 14,000 are manually interpreted and assigned ground-truth labels, while the remaining images are provided with crowdsourced labels generated from low-cost but noisy OpenStreetMap (OSM) data. Our dataset thus enables two branches of study: 1) developing novel CNNs for multi-scene recognition and 2) learning with noisy labels. We experiment with extensive baseline models on our dataset to provide a benchmark for multi-scene recognition in single images. Aiming to expedite further research, we will make our dataset and pre-trained models available.
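Multi-scene recognition in single images reduces to multi-label classification over scene categories; below is a minimal sketch of the corresponding training objective, with a placeholder backbone rather than any of the paper's baseline models, and with crowdsourced labels treated like clean ones.

```python
import torch
import torch.nn as nn

# Sketch of the multi-scene recognition setup: one sigmoid output per scene,
# trained with binary cross-entropy. Backbone and sizes are placeholders.
num_scenes = 36
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, num_scenes),              # one logit per scene label
)
criterion = nn.BCEWithLogitsLoss()          # multi-label objective

images = torch.randn(4, 3, 128, 128)
labels = torch.randint(0, 2, (4, num_scenes)).float()  # multi-hot targets
loss = criterion(model(images), labels)
loss.backward()
print(loss.item())
```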