Universal Adversarial Defense in Remote Sensing Based on Pre-trained Denoising Diffusion Models
Deep neural networks (DNNs) have achieved tremendous success in many remote
sensing (RS) applications, yet they remain vulnerable to adversarial
perturbations. Unfortunately, current adversarial defense approaches in RS
studies usually suffer from performance fluctuation and unnecessary re-training
costs due to the need for prior knowledge of the adversarial perturbations
among RS data. To circumvent these challenges, we propose a universal
adversarial defense approach in RS imagery (UAD-RS) using pre-trained diffusion
models to defend the common DNNs against multiple unknown adversarial attacks.
Specifically, the generative diffusion models are first pre-trained on
different RS datasets to learn generalized representations in various data
domains. After that, a universal adversarial purification framework is
developed using the forward and reverse process of the pre-trained diffusion
models to purify the perturbations from adversarial samples. Furthermore, an
adaptive noise level selection (ANLS) mechanism is built to capture the optimal
noise level of the diffusion model that can achieve the best purification
results closest to the clean samples according to their Frechet Inception
Distance (FID) in deep feature space. As a result, only a single pre-trained
diffusion model is needed for the universal purification of adversarial samples
on each dataset, which significantly alleviates the re-training efforts and
maintains high performance without prior knowledge of the adversarial
perturbations. Experiments on four heterogeneous RS datasets regarding scene
classification and semantic segmentation verify that UAD-RS outperforms
state-of-the-art adversarial purification approaches with a universal defense
against seven commonly existing adversarial perturbations. Codes and the
pre-trained models are available online (https://github.com/EricYu97/UAD-RS).
Comment: Added the GitHub link to the abstract.
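The adaptive noise level selection idea described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: `purify`, `select_noise_level`, and the scalar "features" are hypothetical names and simplifications. The distance is the closed-form Fréchet distance between two univariate Gaussians, standing in for a full FID computed on deep features.

```python
import math
from statistics import fmean, pstdev


def frechet_distance_1d(a, b):
    """Squared Frechet distance between 1-D Gaussians fitted to a and b.

    Simplification: real FID uses multivariate means and covariances of
    deep features; here each sample is a single scalar feature.
    """
    mu_a, mu_b = fmean(a), fmean(b)
    sd_a, sd_b = pstdev(a), pstdev(b)
    return (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2


def select_noise_level(candidates, purify, adversarial, clean_reference):
    """Return the (level, distance) whose purified output is closest to
    the clean reference features.

    `purify(samples, level)` is a hypothetical callable standing in for
    the forward-then-reverse diffusion purification at a given noise level.
    """
    best_level, best_dist = None, math.inf
    for level in candidates:
        purified = purify(adversarial, level)
        dist = frechet_distance_1d(purified, clean_reference)
        if dist < best_dist:
            best_level, best_dist = level, dist
    return best_level, best_dist


# Toy usage: the "attack" shifts every feature by 0.5 and the toy
# "purifier" subtracts the candidate level, so 0.5 should be selected.
clean = [0.1, 0.4, 0.7, 1.0]
adversarial = [v + 0.5 for v in clean]
toy_purify = lambda samples, level: [v - level for v in samples]
level, dist = select_noise_level([0.0, 0.25, 0.5, 0.75], toy_purify,
                                 adversarial, clean)
```

In this toy setting the search is a plain grid over candidate levels; how the paper chooses its candidates is not stated in the abstract.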
A dual network for super-resolution and semantic segmentation of sentinel-2 imagery
There is a growing interest in the development of automated data processing workflows that provide reliable, high spatial resolution land cover maps. However, high-resolution remote sensing images are not always affordable. Taking into account the free availability of Sentinel-2 satellite data, in this work we propose a deep learning model to generate high-resolution segmentation maps from low-resolution inputs in a multi-task approach. Our proposal is a dual-network model with two branches: the Single Image Super-Resolution branch, which reconstructs a high-resolution version of the input image, and the Semantic Segmentation Super-Resolution branch, which predicts a high-resolution segmentation map with a scaling factor of 2. We performed several experiments to find the best architecture, training and testing on a subset of the S2GLC 2017 dataset. We based our model on the DeepLabV3+ architecture, enhancing the model and achieving an improvement of 5% in IoU and almost 10% in the recall score. Furthermore, our qualitative results demonstrate the effectiveness and usefulness of the proposed approach.
This work has been supported by the Spanish Research Agency (AEI) under project PID2020-117142GB-I00 of the call MCIN/AEI/10.13039/501100011033. L.S. would like to acknowledge the BECAL (Becas Carlos Antonio López) scholarship for the financial support.
Peer Reviewed. Postprint (published version)
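The two metrics reported in this abstract, IoU and recall, can be computed per class from flattened label maps. The sketch below uses the standard definitions; the function names and the toy labels are illustrative and not taken from the paper.

```python
def confusion_counts(pred, target, cls):
    """Count true positives, false positives and false negatives for one class."""
    tp = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, target) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, target) if p != cls and t == cls)
    return tp, fp, fn


def iou(pred, target, cls):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target, cls)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def recall(pred, target, cls):
    """Recall: TP / (TP + FN)."""
    tp, _, fn = confusion_counts(pred, target, cls)
    denom = tp + fn
    return tp / denom if denom else 0.0


# Toy flattened label maps (hypothetical values, two classes).
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
class_iou = iou(pred, target, cls=1)        # 2 / (2 + 1 + 1)
class_recall = recall(pred, target, cls=1)  # 2 / (2 + 1)
```

Returning 0.0 when a class is absent from both maps is one convention among several; some evaluations skip such classes when averaging.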
Diffusion Models in Vision: A Survey
Denoising diffusion models represent a recent emerging topic in computer
vision, demonstrating remarkable results in the area of generative modeling. A
diffusion model is a deep generative model that is based on two stages, a
forward diffusion stage and a reverse diffusion stage. In the forward diffusion
stage, the input data is gradually perturbed over several steps by adding
Gaussian noise. In the reverse stage, a model is tasked with recovering the
original input data by learning to gradually reverse the diffusion process,
step by step. Diffusion models are widely appreciated for the quality and
diversity of the generated samples, despite their known computational burdens,
i.e. low speeds due to the high number of steps involved during sampling. In
this survey, we provide a comprehensive review of articles on denoising
diffusion models applied in vision, comprising both theoretical and practical
contributions in the field. First, we identify and present three generic
diffusion modeling frameworks, which are based on denoising diffusion
probabilistic models, noise conditioned score networks, and stochastic
differential equations. We further discuss the relations between diffusion
models and other deep generative models, including variational auto-encoders,
generative adversarial networks, energy-based models, autoregressive models and
normalizing flows. Then, we introduce a multi-perspective categorization of
diffusion models applied in computer vision. Finally, we illustrate the current
limitations of diffusion models and envision some interesting directions for
future research.
Comment: Accepted in IEEE Transactions on Pattern Analysis and Machine
Intelligence. 25 pages, 3 figures
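The forward diffusion stage described in this abstract, which gradually perturbs the input with Gaussian noise, can be sketched as follows. This follows the common denoising diffusion probabilistic model convention, where a noisy sample at step t is drawn directly as sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise. The linear schedule and its default endpoints are assumptions taken from common practice, not from the survey abstract, and all function names are illustrative.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t given x_0 in one shot using the closed form."""
    a = alpha_bars[t]
    signal_scale = math.sqrt(a)
    noise_scale = math.sqrt(1.0 - a)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
slightly_noisy = forward_diffuse(x0, 0, alpha_bars, rng)   # almost x0
nearly_noise = forward_diffuse(x0, 999, alpha_bars, rng)   # almost pure noise
```

The reverse stage needs a trained network to predict the noise at each step, so it is not sketched here.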
Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation
Translating images from a source domain to a target domain for learning
target models is one of the most common strategies in domain adaptive semantic
segmentation (DASS). However, existing methods still struggle to preserve
semantically-consistent local details between the original and translated
images. In this work, we present an innovative approach that addresses this
challenge by using source-domain labels as explicit guidance during image
translation. Concretely, we formulate cross-domain image translation as a
denoising diffusion process and utilize a novel Semantic Gradient Guidance
(SGG) method to constrain the translation process, conditioning it on the
pixel-wise source labels. Additionally, a Progressive Translation Learning
(PTL) strategy is devised to enable the SGG method to work reliably across
domains with large gaps. Extensive experiments demonstrate the superiority of
our approach over state-of-the-art methods.
Comment: Accepted to ICCV202
Laplacian Denoising Autoencoder
While deep neural networks have been shown to perform remarkably well in many
machine learning tasks, labeling a large amount of ground truth data for
supervised training is usually very costly to scale. Therefore, learning robust
representations with unlabeled data is critical in relieving human effort and
vital for many downstream tasks. Recent advances in unsupervised and
self-supervised learning approaches for visual data have benefited greatly from
domain knowledge. Here we are interested in a more generic unsupervised
learning framework that can be easily generalized to other domains. In this
paper, we propose to learn data representations with a novel type of denoising
autoencoder, where the noisy input data is generated by corrupting latent clean
data in the gradient domain. This can be naturally generalized to span multiple
scales with a Laplacian pyramid representation of the input data. In this way,
the agent learns more robust representations that exploit the underlying data
structures across multiple scales. Experiments on several visual benchmarks
demonstrate that better representations can be learned with the proposed
approach, compared to its counterpart with single-scale corruption and other
approaches. Furthermore, we also demonstrate that the learned representations
perform well when transferred to other downstream vision tasks.
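The multi-scale corruption described in this abstract can be illustrated on a 1-D signal. The sketch below is an assumption-laden toy, not the paper's method: it builds a Laplacian pyramid with pairwise averaging and nearest-neighbour upsampling, adds Gaussian noise to each detail band, and reconstructs a corrupted input that a denoising autoencoder would then be trained to clean. All function names are hypothetical, and the signal length must be divisible by 2 ** levels.

```python
import random


def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating each value."""
    out = []
    for v in signal:
        out.extend([v, v])
    return out


def build_pyramid(signal, levels):
    """Return (detail bands from fine to coarse, coarsest residual)."""
    bands, current = [], list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse)
        bands.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    return bands, current


def reconstruct(bands, residual):
    """Invert build_pyramid by adding the bands back from coarse to fine."""
    current = list(residual)
    for band in reversed(bands):
        current = [u + b for u, b in zip(upsample(current), band)]
    return current


def corrupt(signal, levels, sigma, rng):
    """Add Gaussian noise to every detail band, then reconstruct."""
    bands, residual = build_pyramid(signal, levels)
    noisy = [[b + rng.gauss(0.0, sigma) for b in band] for band in bands]
    return reconstruct(noisy, residual)


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
bands, residual = build_pyramid(signal, levels=2)
restored = reconstruct(bands, residual)
noisy_input = corrupt(signal, levels=2, sigma=0.1, rng=random.Random(0))
```

Because the noise is injected per band, a single `sigma` perturbs fine and coarse structure alike; using a different `sigma` per level would be a natural variation.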