165 research outputs found

    A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning

    Full text link
    Due to limitations in data quality, some essential visual tasks are difficult to perform independently. Introducing previously unavailable information to transfer informative dark knowledge has been a common way to solve such hard tasks. However, research on why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks. On this basis, we express DS using deep channel-wise correlation and intermediate spatial distribution, and propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance. The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy. This is especially useful in specific applications where the performance gap between dual modalities is relatively large. Furthermore, we collect a real-world CML dataset to promote community development. The dataset contains more than 10,000 paired optical and radar images and is continuously being updated. Experimental results on real-world and benchmark datasets validate the effectiveness of the proposed method

    Visible-Infrared Person Re-Identification Using Privileged Intermediate Information

    Full text link
    Visible-infrared person re-identification (ReID) aims to recognize a same person of interest across a network of RGB and IR cameras. Some deep learning (DL) models have directly incorporated both modalities to discriminate persons in a joint representation space. However, this cross-modal ReID problem remains challenging due to the large domain shift in data distributions between RGB and IR modalities. % This paper introduces a novel approach for a creating intermediate virtual domain that acts as bridges between the two main domains (i.e., RGB and IR modalities) during training. This intermediate domain is considered as privileged information (PI) that is unavailable at test time, and allows formulating this cross-modal matching task as a problem in learning under privileged information (LUPI). We devised a new method to generate images between visible and infrared domains that provide additional information to train a deep ReID model through an intermediate domain adaptation. In particular, by employing color-free and multi-step triplet loss objectives during training, our method provides common feature representation spaces that are robust to large visible-infrared domain shifts. % Experimental results on challenging visible-infrared ReID datasets indicate that our proposed approach consistently improves matching accuracy, without any computational overhead at test time. The code is available at: \href{https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI}{https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI

    Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector

    Full text link
    Hyperspectral anomaly detection (HAD) aims to localize pixel points whose spectral features differ from the background. HAD is essential in scenarios of unknown or camouflaged target features, such as water quality monitoring, crop growth monitoring and camouflaged target detection, where prior information of targets is difficult to obtain. Existing HAD methods aim to objectively detect and distinguish background and anomalous spectra, which can be achieved almost effortlessly by human perception. However, the underlying processes of human visual perception are thought to be quite complex. In this paper, we analyze hyperspectral image (HSI) features under human visual perception, and transfer the solution process of HAD to the more robust feature space for the first time. Specifically, we propose a small target aware detector (STAD), which introduces saliency maps to capture HSI features closer to human visual perception. STAD not only extracts more anomalous representations, but also reduces the impact of low-confidence regions through a proposed small target filter (STF). Furthermore, considering the possibility of HAD algorithms being applied to edge devices, we propose a full connected network to convolutional network knowledge distillation strategy. It can learn the spectral and spatial features of the HSI while lightening the network. We train the network on the HAD100 training set and validate the proposed method on the HAD100 test set. Our method provides a new solution space for HAD that is closer to human visual perception with high confidence. Sufficient experiments on real HSI with multiple method comparisons demonstrate the excellent performance and unique potential of the proposed method. The code is available at https://github.com/majitao-xd/STAD-HAD

    Self-supervised Learning in Remote Sensing: A Review

    Get PDF
    In deep learning research, self-supervised learning (SSL) has received great attention triggering interest within both the computer vision and remote sensing communities. While there has been a big success in computer vision, most of the potential of SSL in the domain of earth observation remains locked. In this paper, we provide an introduction to, and a review of the concepts and latest developments in SSL for computer vision in the context of remote sensing. Further, we provide a preliminary benchmark of modern SSL algorithms on popular remote sensing datasets, verifying the potential of SSL in remote sensing and providing an extended study on data augmentations. Finally, we identify a list of promising directions of future research in SSL for earth observation (SSL4EO) to pave the way for fruitful interaction of both domains.Comment: Accepted by IEEE Geoscience and Remote Sensing Magazine. 32 pages, 22 content page

    Cross Modal Distillation for Flood Extent Mapping

    Full text link
    The increasing intensity and frequency of floods is one of the many consequences of our changing climate. In this work, we explore ML techniques that improve the flood detection module of an operational early flood warning system. Our method exploits an unlabelled dataset of paired multi-spectral and Synthetic Aperture Radar (SAR) imagery to reduce the labeling requirements of a purely supervised learning method. Prior works have used unlabelled data by creating weak labels out of them. However, from our experiments we noticed that such a model still ends up learning the label mistakes in those weak labels. Motivated by knowledge distillation and semi supervised learning, we explore the use of a teacher to train a student with the help of a small hand labelled dataset and a large unlabelled dataset. Unlike the conventional self distillation setup, we propose a cross modal distillation framework that transfers supervision from a teacher trained on richer modality (multi-spectral images) to a student model trained on SAR imagery. The trained models are then tested on the Sen1Floods11 dataset. Our model outperforms the Sen1Floods11 baseline model trained on the weak labeled SAR imagery by an absolute margin of 6.53% Intersection-over-Union (IoU) on the test split

    Multi-Domain Adaptation for Image Classification, Depth Estimation, and Semantic Segmentation

    Get PDF
    The appearance of scenes may change for many reasons, including the viewpoint, the time of day, the weather, and the seasons. Traditionally, deep neural networks are trained and evaluated using images from the same scene and domain to avoid the domain gap. Recent advances in domain adaptation have led to a new type of method that bridges such domain gaps and learns from multiple domains. This dissertation proposes methods for multi-domain adaptation for various computer vision tasks, including image classification, depth estimation, and semantic segmentation. The first work focuses on semi-supervised domain adaptation. I address this semi-supervised setting and propose to use dynamic feature alignment to address both inter- and intra-domain discrepancy. The second work addresses the task of monocular depth estimation in the multi-domain setting. I propose to address this task with a unified approach that includes adversarial knowledge distillation and uncertainty-guided self-supervised reconstruction. The third work considers the problem of semantic segmentation for aerial imagery with diverse environments and viewing geometries. I present CrossSeg: a novel framework that learns a semantic segmentation network that can generalize well in a cross-scene setting with only a few labeled samples. I believe this line of work can be applicable to many domain adaptation scenarios and aerial applications
    • …
    corecore