The Effects of JPEG and JPEG2000 Compression on Attacks using Adversarial Examples
Adversarial examples are known to degrade the performance of classifiers that otherwise perform well on undisturbed images. These examples are generated by adding non-random noise to test samples so that the classifier misclassifies the given data. Adversarial attacks use such intentionally generated examples and pose a security risk to machine-learning-based systems. To be immune to such attacks, it is desirable to have a pre-processing mechanism that removes the effects causing misclassification while preserving the content of the image. JPEG and JPEG2000 are well-known image compression techniques that suppress high-frequency content while taking the human visual system into account. JPEG has also been shown to be an effective method for reducing adversarial noise. In this paper, we propose applying JPEG2000 compression as an alternative and systematically compare the classification performance of adversarial images compressed using JPEG and JPEG2000 at different target PSNR values and maximum compression levels. Our experiments show that JPEG2000 is more effective in reducing adversarial noise, as it allows higher compression rates with less distortion and does not introduce blocking artifacts.
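
The defense described in this abstract amounts to round-tripping each input through a lossy codec before classification. Below is a minimal sketch of that pre-processing step in Python, assuming Pillow (with OpenJPEG available for the JPEG2000 path); the quality settings and the classify function are illustrative assumptions, not the paper's exact experimental setup.

import io

from PIL import Image


def jpeg_recompress(image: Image.Image, quality: int = 75) -> Image.Image:
    # Round-trip through JPEG to suppress high-frequency (and with it, much
    # of the adversarial) content before classification.
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")


def jpeg2000_recompress(image: Image.Image, target_db: float = 40.0) -> Image.Image:
    # Same idea with JPEG2000, targeting a PSNR value in dB. Its wavelet
    # transform avoids the 8x8 block grid behind JPEG's blocking artifacts.
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG2000",
                              quality_mode="dB", quality_layers=[target_db])
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")


# Hypothetical usage: purify a suspect input before the target classifier.
# label = classify(jpeg2000_recompress(Image.open("suspect_input.png")))

The dB quality mode mirrors the paper's use of target PSNR values; the exact compression settings would need to be tuned per the experiments reported above.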
Defense against Universal Adversarial Perturbations
Recent advances in Deep Learning show the existence of image-agnostic, quasi-imperceptible perturbations that, when applied to 'any' image, can fool a state-of-the-art network classifier into changing its prediction about the image label. These 'Universal Adversarial Perturbations' pose a serious threat to the success of Deep Learning in practice. We present the first dedicated framework to effectively defend networks against such perturbations. Our approach learns a Perturbation Rectifying Network (PRN) as 'pre-input' layers to a targeted model, so that the targeted model itself needs no modification. The PRN is learned from real and synthetic image-agnostic perturbations, and an efficient method to compute the latter is also proposed. A perturbation detector is separately trained on the Discrete Cosine Transform of the input-output difference of the PRN. A query image is first passed through the PRN and verified by the detector. If a perturbation is detected, the output of the PRN is used for label prediction instead of the actual image. A rigorous evaluation shows that our framework can defend network classifiers against unseen adversarial perturbations in real-world scenarios with up to 97.5% success rate. The PRN also generalizes well in the sense that training for one
targeted network defends another network with a comparable success rate.
Comment: Accepted in IEEE CVPR 2018
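
A minimal sketch of the test-time pipeline described above, assuming already-trained PyTorch modules prn, detector (a binary classifier over DCT features), and classifier; the dct2d helper built on scipy.fft.dctn is an illustrative assumption, not the paper's implementation.

import torch
from scipy.fft import dctn


def dct2d(x: torch.Tensor) -> torch.Tensor:
    # 2-D Discrete Cosine Transform per image channel (CPU tensors assumed).
    return torch.from_numpy(dctn(x.numpy(), axes=(-2, -1), norm="ortho"))


@torch.no_grad()
def defended_predict(image, prn, detector, classifier):
    rectified = prn(image)               # 'pre-input' rectification layers
    features = dct2d(image - rectified)  # DCT of the PRN's input-output difference
    perturbed = detector(features).argmax(dim=1).bool()  # perturbation flag per image
    # Use the PRN output for prediction only when a perturbation is detected;
    # otherwise classify the actual query image.
    chosen = torch.where(perturbed.view(-1, 1, 1, 1), rectified, image)
    return classifier(chosen)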
Adversarially Robust Distillation
Knowledge distillation is effective for producing small, high-performance
neural networks for classification, but these small networks are vulnerable to
adversarial attacks. This paper studies how adversarial robustness transfers
from teacher to student during knowledge distillation. First, we find that a large
amount of robustness may be inherited by the student even when distilled only on
clean images. Second, we introduce Adversarially Robust Distillation (ARD)
for distilling robustness onto student networks. In addition to producing small
models with high test accuracy like conventional distillation, ARD also passes
the superior robustness of large networks onto the student. In our experiments,
we find that ARD student models decisively outperform adversarially trained
networks of identical architecture in terms of robust accuracy, surpassing
state-of-the-art methods on standard robustness benchmarks. Finally, we adapt
recent fast adversarial training methods to ARD for accelerated robust
distillation.
Comment: Accepted to AAAI Conference on Artificial Intelligence, 2020
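
The abstract leaves the training objective implicit; a common formulation of ARD-style distillation matches the student's output on adversarially perturbed inputs to the teacher's output on the corresponding clean inputs. A hedged PyTorch sketch under that assumption follows; the temperature, the mixing weight alpha, and the pgd_attack helper are illustrative choices rather than the paper's exact recipe.

import torch
import torch.nn.functional as F


def ard_loss(student, teacher, x_clean, x_adv, y, temperature=30.0, alpha=0.9):
    # Teacher is evaluated on clean inputs; its softened outputs are the target.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x_clean) / temperature, dim=1)
    # Student is evaluated on adversarial inputs and pulled toward the teacher,
    # so robustness to the perturbation is distilled along with accuracy.
    student_log_probs = F.log_softmax(student(x_adv) / temperature, dim=1)
    kd = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    # A standard cross-entropy term on clean inputs preserves natural accuracy.
    ce = F.cross_entropy(student(x_clean), y)
    return alpha * temperature ** 2 * kd + (1.0 - alpha) * ce


# Hypothetical training step: craft x_adv against the student (e.g. with a
# PGD attack helper), then take a gradient step on the combined loss.
# x_adv = pgd_attack(student, x_clean, y)
# loss = ard_loss(student, teacher, x_clean, x_adv, y)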