3 research outputs found
Deceiving Image-to-Image Translation Networks for Autonomous Driving with Adversarial Perturbations
Deep neural networks (DNNs) have achieved impressive performance on computer
vision problems; however, DNNs have been found to be vulnerable to adversarial
examples. For this reason, adversarial perturbations have recently been studied
from several perspectives. However, most previous work has focused on image
classification tasks, and adversarial perturbations have not been studied for
image-to-image (Im2Im) translation tasks, which have shown great success in
handling paired and unpaired mapping problems in autonomous driving and
robotics. This paper examines different types of adversarial perturbations that
can fool Im2Im frameworks for autonomous driving. We propose both
quasi-physical and digital adversarial perturbations that make Im2Im models
yield unexpected results. We then empirically analyze these perturbations and
show that they generalize well under both paired settings (image synthesis) and
unpaired settings (style transfer). We also validate that there exist
perturbation thresholds beyond which the Im2Im mapping is disrupted or becomes
impossible. The existence of these perturbations reveals crucial weaknesses in
Im2Im models. Lastly, we show how these perturbations affect output quality,
paving the way toward improving the robustness of current state-of-the-art
networks for autonomous driving.
Comment: 8 pages, accepted to IEEE Robotics and Automation Letters (RAL)
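For concreteness, below is a minimal PyTorch-style sketch of how a digital perturbation of the kind described in this abstract could be crafted against a generic Im2Im generator. The function, parameter names, and loss are illustrative assumptions, not the paper's actual method: projected gradient ascent pushes the translated output away from its clean counterpart under an L_inf budget.

```python
import torch

def digital_im2im_attack(generator, x, epsilon=8/255, alpha=2/255, steps=10):
    """Craft an L_inf-bounded perturbation that pushes the Im2Im output of
    `generator` away from its clean translation of `x` (illustrative sketch)."""
    x_clean = x.detach()
    with torch.no_grad():
        y_clean = generator(x_clean)              # reference (undisrupted) output

    delta = torch.zeros_like(x_clean, requires_grad=True)
    for _ in range(steps):
        y_adv = generator(x_clean + delta)
        loss = torch.nn.functional.mse_loss(y_adv, y_clean)  # distortion to maximize
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()                 # gradient-ascent step
            delta.clamp_(-epsilon, epsilon)                    # project onto L_inf ball
            delta.copy_((x_clean + delta).clamp(0, 1) - x_clean)  # keep pixels valid
        delta.grad.zero_()
    return (x_clean + delta).detach()
```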
OGAN: Disrupting Deepfakes with an Adversarial Attack that Survives Training
Recent advances in autoencoders and generative models have given rise to
effective video forgery methods, used for generating so-called "deepfakes".
Mitigation research is mostly focused on post-factum deepfake detection and not
on prevention. We complement these efforts by introducing a novel class of
adversarial attacks (training-resistant attacks) that can disrupt
face-swapping autoencoders whether or not their adversarial images have been
included in the autoencoders' training set. We propose the Oscillating
GAN (OGAN) attack, a novel attack optimized to be training-resistant, which
introduces spatio-temporal distortions to the output of face-swapping
autoencoders. To implement OGAN, we construct a bilevel optimization problem,
where we train a generator and a face-swapping model instance against each
other. Specifically, we pair each input image with a target distortion, and
feed them into a generator that produces an adversarial image. This image will
exhibit the distortion when a face-swapping autoencoder is applied to it. We
solve the optimization problem by training the generator and the face-swapping
model simultaneously using an iterative process of alternating optimization.
Next, we analyze the previously published Distorting Attack and show it is
training-resistant, though it is outperformed by our proposed OGAN. Finally,
we validate both attacks using a popular implementation of FaceSwap, and show
that they transfer across different target models and target faces, including
faces the adversarial attacks were not trained on. More broadly, these results
demonstrate the existence of training-resistant adversarial attacks,
potentially applicable to a wide range of domains.
Comment: 10 pages
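The alternating optimization described in this abstract can be pictured with a short PyTorch-style sketch. The module names, the tanh bounding, and the L1 objectives below are assumptions for illustration, not the paper's exact formulation: a perturbation generator is trained so that the face-swapping autoencoder's output on its adversarial images exhibits a paired target distortion, while the autoencoder itself keeps training on those images.

```python
import torch

def ogan_round(pert_gen, faceswap_ae, x, target_distortion, opt_gen, opt_ae, eps=0.05):
    """One round of alternating optimization between a perturbation generator and
    a face-swapping autoencoder instance (sketch under assumed interfaces)."""
    # Attacker step: the adversarial image should make the autoencoder's output
    # exhibit the paired target distortion (target choice is an assumption here).
    x_adv = x + eps * torch.tanh(pert_gen(x, target_distortion))
    attack_loss = torch.nn.functional.l1_loss(faceswap_ae(x_adv), x + target_distortion)
    opt_gen.zero_grad()
    attack_loss.backward()
    opt_gen.step()

    # Defender step: the face-swapping model keeps training, here on the
    # adversarial images themselves, which is what training resistance targets.
    recon = faceswap_ae(x_adv.detach())
    train_loss = torch.nn.functional.l1_loss(recon, x)
    opt_ae.zero_grad()
    train_loss.backward()
    opt_ae.step()
    return attack_loss.item(), train_loss.item()
```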
Disrupting Deepfakes: Adversarial Attacks Against Conditional Image Translation Networks and Facial Manipulation Systems
Face modification systems using deep learning have become increasingly
powerful and accessible. Given images of a person's face, such systems can
generate new images of that same person under different expressions and poses.
Some systems can also modify targeted attributes such as hair color or age.
Such manipulated images and videos have been coined "deepfakes". To prevent a
malicious user from generating modified images of a person without their
consent, we tackle the new problem of generating adversarial attacks
against such image translation systems, which disrupt the resulting output
image. We call this problem disrupting deepfakes. Most image translation
architectures are generative models conditioned on an attribute (e.g. put a
smile on this person's face). We are the first to propose and successfully apply
(1) class transferable adversarial attacks that generalize to different
classes, which means that the attacker does not need to have knowledge about
the conditioning class, and (2) adversarial training for generative adversarial
networks (GANs) as a first step towards robust image translation networks.
Finally, in gray-box scenarios, blurring can mount a successful defense against
disruption; we therefore present a spread-spectrum adversarial attack that evades blur
defenses. Our open-source code can be found at
https://github.com/natanielruiz/disrupting-deepfakes.
Comment: Accepted at the CVPR 2020 Workshop on Adversarial Machine Learning in Computer Vision
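As a rough illustration of the disruption objective this abstract describes (all names and hyperparameters below are assumptions, not the released implementation at the link), the sketch maximizes the distance between the translation of the perturbed image and the clean translation, cycling over conditioning classes as one simple way to aim for class transferability.

```python
import torch

def disrupt_conditional_translation(G, x, classes, epsilon=0.05, alpha=0.01, steps=20):
    """Illustrative sketch: craft a bounded perturbation that disrupts a
    conditional image-translation model G(image, class)."""
    with torch.no_grad():
        y_ref = [G(x, c) for c in classes]         # clean translations per class

    delta = torch.zeros_like(x, requires_grad=True)
    for step in range(steps):
        i = step % len(classes)                    # vary the conditioning class
        y_adv = G(x + delta, classes[i])
        loss = torch.nn.functional.mse_loss(y_adv, y_ref[i])  # push output away
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()     # gradient ascent on the distortion
            delta.clamp_(-epsilon, epsilon)        # keep the perturbation small
        delta.grad.zero_()
    return (x + delta).detach()
```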