1,159 research outputs found
Semantic Adversarial Attacks via Diffusion Models
Traditional adversarial attacks concentrate on manipulating clean examples in
the pixel space by adding adversarial perturbations. By contrast, semantic
adversarial attacks focus on changing semantic attributes of clean examples,
such as color, context, and features, which are more feasible in the real
world. In this paper, we propose a framework that quickly generates semantic
adversarial attacks by leveraging recent diffusion models, since the latent
space of a well-trained diffusion model encodes semantic information. The
framework has two variants: 1) the Semantic Transformation (ST) approach
fine-tunes the latent representation of the generated image and/or the
diffusion model itself; 2) the Latent Masking (LM) approach masks the latent
representation with that of another target image, guided by local
backpropagation-based interpretation methods. Additionally, the ST approach
can be applied in either white-box or
black-box settings. Extensive experiments are conducted on CelebA-HQ and AFHQ
datasets, and our framework demonstrates great fidelity, generalizability, and
transferability compared to other baselines. Our approaches achieve an
attack success rate of approximately 100% in multiple settings, with a best
FID of 36.61. Code is available at
https://github.com/steven202/semantic_adv_via_dm.
Comment: To appear in BMVC 202
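A minimal sketch of the latent-optimization idea behind such attacks is shown
below. It assumes a pretrained diffusion model exposed through placeholder
`encode`/`decode` functions and a victim `classifier`; these names, the
untargeted loss, and the latent regularizer are illustrative assumptions, not
the paper's actual ST/LM implementation.

```python
import torch
import torch.nn.functional as F

def semantic_attack(x, y_true, encode, decode, classifier,
                    steps=100, lr=0.01, lam=0.1):
    """Fine-tune the latent of x so the decoded image is misclassified."""
    with torch.no_grad():
        z0 = encode(x)                  # latent of the clean image
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x_adv = decode(z)               # image implied by the current latent
        logits = classifier(x_adv)
        # Untargeted objective: push the prediction away from the true label,
        # while keeping the latent close to the original so that only
        # semantic attributes drift.
        loss = -F.cross_entropy(logits, y_true) + lam * (z - z0).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decode(z).detach()
```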
Adversarial Color Enhancement: Generating Unrestricted Adversarial Images by Optimizing a Color Filter
We introduce an approach that enhances images with a color filter to create
adversarial effects that fool neural networks into misclassification. Our
approach, Adversarial Color Enhancement (ACE), generates
unrestricted adversarial images by optimizing the color filter via gradient
descent. The novelty of ACE is its incorporation of established practice for
image enhancement in a transparent manner. Experimental results validate the
white-box adversarial strength and black-box transferability of ACE. A range of
examples demonstrates the perceptual quality of images that ACE produces. ACE
makes an important contribution to recent work that moves beyond
imperceptibility and focuses on unrestricted adversarial modifications that
yield large perceptible perturbations but remain non-suspicious to the human
eye. The future potential of filter-based adversaries is also explored in two
directions: guiding ACE with common enhancement practices (e.g., Instagram
filters) towards specific attractive image styles and adapting ACE to image
semantics. Code is available at https://github.com/ZhengyuZhao/ACE.
Comment: Accepted by BMVC 2020
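A rough sketch of the ACE idea, optimizing a differentiable color filter by
gradient descent, is given below. The monotone piecewise-linear per-channel
curve and all function names are assumptions for illustration; the paper's
actual filter parameterization and released code may differ.

```python
import torch
import torch.nn.functional as F

def apply_curve(x, params):
    """Apply a monotone piecewise-linear intensity curve per RGB channel.

    x:      (B, 3, H, W) image with values in [0, 1]
    params: (3, K) unnormalized segment weights, one curve per channel
    """
    K = params.shape[1]
    seg = torch.softmax(params, dim=1)             # positive segments summing to 1
    knots = torch.cumsum(seg, dim=1)               # monotone curve values at the knots
    knots = torch.cat([torch.zeros(3, 1), knots], dim=1)  # curve runs from 0 to 1
    idx = (x * K).clamp(0, K - 1e-4)
    lo = idx.floor().long()
    frac = idx - lo.float()
    c = torch.arange(3).view(1, 3, 1, 1)
    # Linear interpolation between neighboring knots of each channel's curve.
    return knots[c, lo] * (1 - frac) + knots[c, lo + 1] * frac

def ace_attack(x, y_true, classifier, n_knots=8, steps=200, lr=0.05):
    """Optimize the color-curve parameters so the filtered image is misclassified."""
    params = torch.zeros(3, n_knots, requires_grad=True)  # identity curve at start
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        x_filtered = apply_curve(x, params)
        loss = -F.cross_entropy(classifier(x_filtered), y_true)  # untargeted objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return apply_curve(x, params).detach()
```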
Constructing Semantics-Aware Adversarial Examples with Probabilistic Perspective
In this study, we introduce a probabilistic view of adversarial examples
based on box-constrained Langevin Monte Carlo (LMC). From this perspective,
we develop a principled approach for generating semantics-aware adversarial
examples. Rather than bounding perturbations by a geometric distance, the
method imposes semantic constraints, allowing users to incorporate their own
understanding of semantics into the model. Through
human evaluation, we validate that our semantics-aware adversarial examples
maintain their inherent meaning. Experimental findings on the MNIST and SVHN
datasets demonstrate that our semantics-aware adversarial examples can
effectively circumvent robust adversarial training methods tailored for
traditional adversarial attacks.
Comment: 17 pages, 14 figures
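For reference, a box-constrained Langevin Monte Carlo update of the kind the
abstract mentions can be sketched as follows. The `log_prob` callable (for
example, one combining a classifier-based adversarial term with a semantic
prior) is a placeholder, not the paper's exact energy or sampler settings.

```python
import torch

def box_constrained_lmc(x0, log_prob, steps=500, step_size=1e-3,
                        lo=0.0, hi=1.0):
    """Langevin updates x <- clip(x + eta/2 * grad log p(x) + sqrt(eta) * z)."""
    x = x0.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        grad = torch.autograd.grad(log_prob(x).sum(), x)[0]
        with torch.no_grad():
            noise = torch.randn_like(x)
            x = x + 0.5 * step_size * grad + (step_size ** 0.5) * noise
            x = x.clamp(lo, hi)          # project back into the box constraint
    return x.detach()
```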
Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces
The ability of generative models to produce highly realistic synthetic face
images has raised security and ethical concerns. As a first line of defense
against such fake faces, deep learning based forensic classifiers have been
developed. While these forensic models can detect whether a face image is
synthetic or real with high accuracy, they are also vulnerable to adversarial
attacks. Although such attacks can be highly successful in evading detection by
forensic classifiers, they introduce visible noise patterns that are detectable
through careful human scrutiny. Additionally, these attacks assume access to
the target model(s), which may not always be the case. Attempts have been made to
directly perturb the latent space of GANs to produce adversarial fake faces
that can circumvent forensic classifiers. In this work, we go one step further
and show that it is possible to successfully generate adversarial fake faces
with a specified set of attributes (e.g., hair color, eye size, race, gender,
etc.). To achieve this goal, we leverage the state-of-the-art generative model
StyleGAN with disentangled representations, which enables a range of
modifications without leaving the manifold of natural images. We propose a
framework to search for adversarial latent codes within the feature space of
StyleGAN, where the search can be guided either by a text prompt or a reference
image. We also propose a meta-learning based optimization strategy to achieve
transferable performance on unknown target models. Extensive experiments
demonstrate that the proposed approach can produce semantically manipulated
adversarial fake faces, which are true to the specified attribute set and can
successfully fool forensic face classifiers, while remaining undetectable by
humans. Code: https://github.com/koushiksrivats/face_attribute_attack
Comment: Accepted in CVPR 2023. Project page: https://koushiksrivats.github.io/face_attribute_attack
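A simplified sketch of searching a StyleGAN latent code under a text-prompt
guidance term and a forensic-classifier evasion objective appears below. All
function names (`generator`, `clip_image_embed`, `forensic_classifier`), the
"real" class index, and the loss weighting are placeholder assumptions and do
not reflect the released code's API or its meta-learning strategy.

```python
import torch
import torch.nn.functional as F

def attribute_conditioned_attack(w_init, text_emb, generator,
                                 clip_image_embed, forensic_classifier,
                                 steps=300, lr=0.01, alpha=1.0):
    """Optimize a latent code so the generated face matches a text prompt
    and is labeled "real" by a forensic classifier."""
    w = w_init.clone().requires_grad_(True)   # latent code (e.g., W+ space)
    opt = torch.optim.Adam([w], lr=lr)
    real_label = torch.zeros(w_init.shape[0], dtype=torch.long)  # assume class 0 = "real"
    for _ in range(steps):
        img = generator(w)                    # synthesize a face from the latent
        # Attribute loss: align the image with the (1, D) text embedding in CLIP space.
        attr_loss = 1 - F.cosine_similarity(clip_image_embed(img), text_emb).mean()
        # Adversarial loss: push the forensic classifier toward "real".
        adv_loss = F.cross_entropy(forensic_classifier(img), real_label)
        loss = attr_loss + alpha * adv_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(w).detach()
```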