RAEDiff: Denoising Diffusion Probabilistic Models Based Reversible Adversarial Examples Self-Generation and Self-Recovery
Collected and annotated datasets, which are obtained through extensive effort, are effective for training Deep Neural Network (DNN) models. However, these datasets are susceptible to misuse by unauthorized users, infringing the Intellectual Property (IP) rights of the dataset creators. Reversible Adversarial Examples (RAE) can help solve the IP protection issues for datasets. RAEs are adversarially perturbed images that can be restored to the original. As a cutting-edge approach, the RAE scheme can prevent unauthorized users from engaging in malicious model training while ensuring legitimate usage by authorized users. Nevertheless, existing RAEs still rely on embedded auxiliary information for restoration, which may compromise their adversarial ability. In this paper, a novel self-generation and self-recovery method, named RAEDiff, is introduced for generating RAEs based on Denoising Diffusion Probabilistic Models (DDPM). It diffuses datasets into a Biased Gaussian Distribution (BGD) and utilizes the prior knowledge of the DDPM to generate and recover RAEs. The experimental results demonstrate that RAEDiff effectively self-generates adversarial perturbations for DNN models, including Artificial Intelligence Generated Content (AIGC) models, while also exhibiting significant self-recovery capabilities.
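The abstract gives no implementation detail, so the following is only a minimal sketch of the core idea as stated: diffusing data toward a biased rather than zero-mean Gaussian. It assumes the BGD is a shifted unit-covariance Gaussian N(mu, I); the schedule, the function names, and the key-derived bias `mu` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch (assumptions, not the paper's code): a standard DDPM forward
# step whose noise is drawn from N(mu, I) instead of N(0, I), so the
# diffused data drifts toward a biased Gaussian distribution.

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; returns the cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def forward_diffuse_biased(x0, t, alpha_bars, mu, rng):
    """Sample x_t given x_0 with biased noise eps ~ N(mu, I)."""
    a_bar = alpha_bars[t]
    eps = mu + rng.standard_normal(x0.shape)        # biased noise
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

rng = np.random.default_rng(0)
alpha_bars = make_schedule()
x0 = rng.uniform(-1.0, 1.0, size=(3, 32, 32))       # normalized CIFAR-like image
mu = 0.5 * np.ones_like(x0)                          # hypothetical key-derived bias
x_t = forward_diffuse_biased(x0, t=200, alpha_bars=alpha_bars, mu=mu, rng=rng)
```

Under this reading, a DDPM trained against the biased forward process could run its learned reverse process to denoise an RAE back toward the clean image, which would give the self-recovery property the abstract describes without embedding explicit auxiliary information.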
Reversible attack based on local visible adversarial perturbation
Adding perturbations to images can mislead classification models into producing incorrect results. Building on this, research has exploited adversarial perturbation to protect private images from retrieval by malicious intelligent models. However, adding adversarial perturbation to an image destroys the original data, rendering the image useless in digital forensics and other fields. To prevent illegal or unauthorized access to sensitive image data such as human faces without impeding legitimate users, reversible adversarial attack techniques, in which the original image can be recovered from its reversible adversarial example, are attracting growing research attention. However, existing reversible adversarial attack methods are designed for traditional imperceptible adversarial perturbations and ignore local visible adversarial perturbations. In this paper, we propose a new method for generating reversible adversarial examples based on local visible adversarial perturbation. The information needed for image recovery is embedded into the area beyond the adversarial patch by a reversible data hiding technique. To reduce image distortion, lossless compression and the B-R-G (blue-red-green) embedding principle are adopted. Experiments on the CIFAR-10 and ImageNet datasets show that the proposed method restores the original images error-free while ensuring good attack performance.
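For intuition about the recovery step, below is a minimal sketch of one classical reversible data hiding primitive, Tian-style difference expansion, which shows how recovery bits can be embedded into pixel pairs (here, pixels outside the patch) and later removed exactly. This is not the paper's full scheme: the lossless compression stage, the B-R-G channel embedding order, and overflow/location-map handling are all omitted, and the function names are illustrative.

```python
# Sketch of reversible data hiding via difference expansion (Tian, 2003).
# Each pixel pair carries one payload bit; extraction returns both the
# bit and the exact original pair, which is what makes the scheme
# reversible. Toy values only; 8-bit overflow checks are omitted.

def embed_pair(x, y, bit):
    """Expand the pair's difference to carry one bit, reversibly."""
    l, h = (x + y) // 2, x - y
    h2 = 2 * h + bit                  # expanded difference carries the bit
    return l + (h2 + 1) // 2, l - h2 // 2

def extract_pair(x2, y2):
    """Recover the original pair and the hidden bit exactly."""
    l, h2 = (x2 + y2) // 2, x2 - y2
    bit, h = h2 % 2, h2 >> 1          # arithmetic shift == floor(h2 / 2)
    return l + (h + 1) // 2, l - h // 2, bit

# Round trip: embed a payload, then restore pixels and payload losslessly.
payload = [1, 0, 1, 1]
pairs = [(100, 90), (53, 50), (200, 195), (30, 34)]
marked = [embed_pair(x, y, b) for (x, y), b in zip(pairs, payload)]
restored = [extract_pair(x2, y2) for x2, y2 in marked]
assert [(x, y) for x, y, _ in restored] == pairs
assert [b for _, _, b in restored] == payload
```

In the method described by the abstract, the payload would be the losslessly compressed pixels covered by the visible adversarial patch, so that removing the hidden data and writing those pixels back yields the original image error-free.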