A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning
Due to limitations in data quality, some essential visual tasks are difficult
to perform independently. Transferring informative dark knowledge from previously unavailable information has become a common way to solve such hard tasks. However, why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the
correlation between feature discriminability and dimensional structure (DS) by
analyzing and observing features extracted from simple and hard tasks. On this
basis, we express DS using deep channel-wise correlation and intermediate
spatial distribution, and propose a novel cross-modal knowledge distillation
(CMKD) method for better supervised cross-modal learning (CML) performance. The
proposed method enforces output features to be channel-wise independent and
intermediate ones to be uniformly distributed, thereby learning semantically
irrelevant features from the hard task to boost its accuracy. This is
especially useful in applications where the performance gap between the two modalities is relatively large. Furthermore, we collect a real-world CML
dataset to promote community development. The dataset contains more than 10,000
paired optical and radar images and is continuously being updated. Experimental
results on real-world and benchmark datasets validate the effectiveness of the
proposed method.
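The abstract leaves the exact form of these two constraints implicit; below is a minimal PyTorch sketch of what a channel-independence penalty on output features and a spatial-uniformity penalty on intermediate maps could look like. Both loss functions and their names are illustrative assumptions, not the paper's definitions.

```python
import torch

def channel_decorrelation_loss(feat: torch.Tensor) -> torch.Tensor:
    """Penalize off-diagonal entries of the channel correlation matrix,
    pushing output features toward channel-wise independence (assumed form).
    feat: (N, C) pooled output features."""
    feat = (feat - feat.mean(0)) / (feat.std(0) + 1e-6)  # standardize each channel
    corr = feat.T @ feat / feat.shape[0]                 # (C, C) correlation matrix
    off_diag = corr - torch.diag(torch.diagonal(corr))   # zero out the diagonal
    return off_diag.pow(2).mean()

def spatial_uniformity_loss(fmap: torch.Tensor) -> torch.Tensor:
    """KL divergence between the normalized spatial energy of an intermediate
    feature map and the uniform distribution (assumed form).
    fmap: (N, C, H, W) intermediate features."""
    energy = fmap.pow(2).mean(1).flatten(1)              # (N, H*W) spatial energy
    p = energy / (energy.sum(1, keepdim=True) + 1e-8)    # per-sample distribution
    log_u = -torch.log(torch.tensor(float(p.shape[1]), device=p.device))
    return (p * (p.clamp_min(1e-8).log() - log_u)).sum(1).mean()
```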
Visible-Infrared Person Re-Identification Using Privileged Intermediate Information
Visible-infrared person re-identification (ReID) aims to recognize the same person of interest across a network of RGB and IR cameras. Some deep learning
(DL) models have directly incorporated both modalities to discriminate persons
in a joint representation space. However, this cross-modal ReID problem remains
challenging due to the large domain shift in data distributions between RGB and
IR modalities. This paper introduces a novel approach for creating an intermediate virtual domain that acts as a bridge between the two main domains (i.e., RGB and IR modalities) during training. This intermediate domain is
considered as privileged information (PI) that is unavailable at test time, and
allows formulating this cross-modal matching task as a problem in learning
under privileged information (LUPI). We devise a new method to generate images
between visible and infrared domains that provide additional information to
train a deep ReID model through an intermediate domain adaptation. In
particular, by employing color-free and multi-step triplet loss objectives
during training, our method provides common feature representation spaces that
are robust to large visible-infrared domain shifts. Experimental results on
challenging visible-infrared ReID datasets indicate that our proposed approach
consistently improves matching accuracy, without any computational overhead at
test time. The code is available at:
https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI
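To make the intermediate-domain idea concrete, here is a minimal PyTorch sketch under two loud assumptions: the bridge images are approximated by blending a color-free (grayscale) view of RGB with IR, and the multi-step objective is rendered as two chained triplet losses. The paper's actual generator and loss schedule are more involved.

```python
import torch
import torch.nn.functional as F

def make_intermediate(rgb: torch.Tensor, ir: torch.Tensor, alpha: float = 0.5):
    """Hypothetical bridge domain: blend a color-free view of RGB with IR.
    rgb: (N, 3, H, W), ir: (N, 1, H, W)."""
    gray = rgb.mean(1, keepdim=True)            # grayscale (color-free) view
    return alpha * gray + (1 - alpha) * ir      # simple stand-in generator

def multistep_triplet(anchor, inter, positive, negative, margin: float = 0.3):
    """Pull anchor -> intermediate -> positive in two hops instead of
    matching RGB and IR embeddings directly (assumed 'multi-step' form).
    All arguments are (N, D) embeddings."""
    step1 = F.triplet_margin_loss(anchor, inter, negative, margin=margin)
    step2 = F.triplet_margin_loss(inter, positive, negative, margin=margin)
    return step1 + step2
```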
Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector
Hyperspectral anomaly detection (HAD) aims to localize pixels whose spectral features differ from the background. HAD is essential in scenarios with
unknown or camouflaged target features, such as water quality monitoring, crop
growth monitoring and camouflaged target detection, where prior information of
targets is difficult to obtain. Existing HAD methods aim to objectively detect
and distinguish background and anomalous spectra, which can be achieved almost
effortlessly by human perception. However, the underlying processes of human
visual perception are thought to be quite complex. In this paper, we analyze
hyperspectral image (HSI) features under human visual perception and, for the first time, transfer the solution process of HAD to a more robust feature space. Specifically, we propose a small target aware detector (STAD), which
introduces saliency maps to capture HSI features closer to human visual
perception. STAD not only extracts more anomalous representations, but also
reduces the impact of low-confidence regions through a proposed small target
filter (STF). Furthermore, considering that HAD algorithms may be deployed on edge devices, we propose a fully-connected-to-convolutional network knowledge distillation strategy, which learns the spectral and spatial features of the HSI while making the network lighter. We train the network on the
HAD100 training set and validate the proposed method on the HAD100 test set.
Our method provides a new solution space for HAD that is closer to human visual
perception with high confidence. Extensive experiments on real HSIs, with comparisons to multiple methods, demonstrate the excellent performance and unique potential of the proposed method. The code is available at
https://github.com/majitao-xd/STAD-HAD
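The FC-to-convolutional distillation can be pictured with a standard soft-label setup. The sketch below pairs a toy convolutional student with a generic Hinton-style distillation loss; the architecture, pooling, and temperature are placeholder assumptions rather than STAD's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvStudent(nn.Module):
    """Lightweight convolutional student (placeholder architecture)."""
    def __init__(self, bands: int, n_out: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_out, 1),
        )

    def forward(self, x):                       # x: (N, bands, H, W)
        return self.net(x).mean(dim=(2, 3))     # pooled logits, (N, n_out)

def distill_loss(student_logits, teacher_logits, T: float = 4.0):
    """Soft-label KL distillation at temperature T (generic, not STAD-specific)."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T
```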
Self-supervised Learning in Remote Sensing: A Review
In deep learning research, self-supervised learning (SSL) has received great attention, triggering interest within both the computer vision and remote sensing communities. While SSL has achieved great success in computer vision, most of its potential in the domain of Earth observation remains untapped.
In this paper, we provide an introduction to, and a review of the concepts and
latest developments in SSL for computer vision in the context of remote
sensing. Further, we provide a preliminary benchmark of modern SSL algorithms
on popular remote sensing datasets, verifying the potential of SSL in remote
sensing and providing an extended study on data augmentations. Finally, we
identify a list of promising directions of future research in SSL for earth
observation (SSL4EO) to pave the way for fruitful interaction of both domains.
Comment: Accepted by IEEE Geoscience and Remote Sensing Magazine. 32 pages, 22 content pages.
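As a concrete anchor for the augmentation study mentioned above, here is a minimal SimCLR-style two-view pipeline of the kind such SSL benchmarks evaluate; the particular transforms (e.g., vertical flips, which are plausible for nadir-looking imagery) are illustrative choices, not the review's prescribed recipe.

```python
import torch
from torchvision import transforms

# Two-view augmentation pipeline (illustrative; the review's extended study
# is precisely about which augmentations suit remote sensing data).
ssl_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),   # reasonable for nadir-looking imagery
    transforms.GaussianBlur(kernel_size=9),
])

def two_views(img: torch.Tensor):
    """Return two independently augmented views of one (C, H, W) image."""
    return ssl_augment(img), ssl_augment(img)
```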
Cross Modal Distillation for Flood Extent Mapping
The increasing intensity and frequency of floods are among the many consequences of our changing climate. In this work, we explore ML techniques
that improve the flood detection module of an operational early flood warning
system. Our method exploits an unlabelled dataset of paired multi-spectral and
Synthetic Aperture Radar (SAR) imagery to reduce the labeling requirements of a
purely supervised learning method. Prior works have used unlabelled data by creating weak labels from it. However, our experiments show that such models still end up learning the mistakes in those weak labels. Motivated by knowledge distillation and semi-supervised learning, we explore the use of a teacher to train a student with the help of a small hand-labelled dataset and a large unlabelled dataset. Unlike the conventional self-distillation setup, we propose a cross-modal distillation framework that transfers supervision from a teacher trained on the richer modality (multi-spectral images) to a student model trained on SAR imagery. The trained models are then
tested on the Sen1Floods11 dataset. Our model outperforms the Sen1Floods11 baseline model trained on weakly labeled SAR imagery by an absolute margin of 6.53% Intersection-over-Union (IoU) on the test split.
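A minimal sketch of this cross-modal teacher-student step, assuming PyTorch, per-pixel binary flood masks, and a frozen teacher; the soft-target form and the unweighted sum of losses are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def cross_modal_distill_step(teacher, student, ms_img, sar_img, labels=None):
    """One training step: a frozen teacher on multi-spectral imagery
    supervises a SAR student. Models and loss weighting are illustrative.
    ms_img / sar_img are paired batches; labels covers the small
    hand-labelled subset when present."""
    with torch.no_grad():
        soft = torch.sigmoid(teacher(ms_img))    # per-pixel flood probabilities
    pred = student(sar_img)                      # logits, same spatial size
    loss = F.binary_cross_entropy_with_logits(pred, soft)
    if labels is not None:                       # hand-labelled supervision
        loss = loss + F.binary_cross_entropy_with_logits(pred, labels.float())
    return loss
```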
Multi-Domain Adaptation for Image Classification, Depth Estimation, and Semantic Segmentation
The appearance of scenes may change for many reasons, including the viewpoint, the time of day, the weather, and the seasons. Traditionally, deep neural networks are trained and evaluated using images from the same scene and domain to avoid the domain gap. Recent advances in domain adaptation have led to a new type of method that bridges such domain gaps and learns from multiple domains.
This dissertation proposes methods for multi-domain adaptation for various computer vision tasks, including image classification, depth estimation, and semantic segmentation. The first work focuses on semi-supervised domain adaptation; I propose dynamic feature alignment to address both inter- and intra-domain discrepancy. The second work addresses monocular depth estimation in the multi-domain setting; I propose a unified approach that combines adversarial knowledge distillation with uncertainty-guided self-supervised reconstruction. The third work considers semantic segmentation for aerial imagery with diverse environments and viewing geometries; I present CrossSeg, a novel framework that learns a semantic segmentation network able to generalize well in a cross-scene setting with only a few labeled samples. I believe this line of work is applicable to many domain adaptation scenarios and aerial applications.
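As a simplified stand-in for the feature-alignment idea in the first work, the sketch below matches the first and second moments of source and target embeddings (CORAL-style moment matching); the dissertation's dynamic feature alignment is more elaborate, so treat this purely as a minimal illustration.

```python
import torch

def feature_alignment_loss(src_feat: torch.Tensor, tgt_feat: torch.Tensor):
    """CORAL-style alignment of source and target batch statistics
    (a minimal stand-in, not the dissertation's dynamic method).
    src_feat, tgt_feat: (N, D) embeddings from a shared backbone, N > 1."""
    mean_gap = (src_feat.mean(0) - tgt_feat.mean(0)).pow(2).sum()
    cov_s = torch.cov(src_feat.T)                # (D, D) source covariance
    cov_t = torch.cov(tgt_feat.T)                # (D, D) target covariance
    cov_gap = (cov_s - cov_t).pow(2).sum()
    return mean_gap + cov_gap
```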