Art of singular vectors and universal adversarial perturbations
The vulnerability of Deep Neural Networks (DNNs) to adversarial attacks has attracted a lot of attention in recent studies. It has been shown that for many state-of-the-art DNNs performing image classification there exist universal adversarial perturbations: image-agnostic perturbations whose mere addition to natural images leads, with high probability, to their misclassification. In this work we propose a new algorithm for constructing such universal perturbations. Our approach is based on computing the so-called
(p, q)-singular vectors of the Jacobian matrices of the hidden layers of a network. The resulting perturbations exhibit interesting visual patterns, and using only 64 images we were able to construct universal perturbations with a fooling rate of more than 60% on a dataset of 50,000 images. We also investigate the correlation between the maximal singular value of the Jacobian matrix and the fooling rate of the corresponding singular vector, and show that the constructed perturbations generalize across networks.
Defense against Universal Adversarial Perturbations
Recent advances in Deep Learning show the existence of image-agnostic, quasi-imperceptible perturbations that, when applied to `any' image, can fool a state-of-the-art network classifier into changing its prediction about the image label. These `Universal Adversarial Perturbations' pose a serious threat to the success of Deep Learning in practice. We present the first dedicated framework to effectively defend networks against such perturbations. Our approach learns a Perturbation Rectifying Network (PRN) as `pre-input' layers to a targeted model, so that the targeted model itself needs no modification. The PRN is trained on real and synthetic image-agnostic perturbations, and an efficient method for computing the latter is also proposed. A perturbation detector is trained separately on the Discrete Cosine Transform of the input-output difference of the PRN. A query image is first passed through the PRN and verified by the detector. If a perturbation is detected, the output of the PRN is used for label prediction instead of the actual image. A rigorous evaluation shows that our framework can defend network classifiers against unseen adversarial perturbations in real-world scenarios with up to a 97.5% success rate. The PRN also generalizes well in the sense that a PRN trained for one targeted network defends other networks with a comparable success rate.
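As a rough illustration of the test-time pipeline the abstract describes (rectify with the PRN, run a detector on the DCT of the PRN's input-output difference, then classify either the original or the rectified image), here is a hedged PyTorch sketch. The stand-in modules, the input size, and the 0.5 detection threshold are assumptions introduced for illustration; the paper's PRN and detector architectures and their training are not reproduced.

```python
# Hedged sketch of the test-time decision rule: rectify the query with the
# PRN, run a detector on the DCT of the PRN's input-output difference, and
# classify either the original or the rectified image.  The stand-in modules,
# input size and the 0.5 threshold are illustrative assumptions.
import torch
import torch.nn as nn
from scipy.fft import dctn                       # n-D type-II DCT


class RectifyThenClassify(nn.Module):
    def __init__(self, prn, detector, classifier):
        super().__init__()
        self.prn = prn                           # `pre-input' rectifier; target model untouched
        self.detector = detector                 # binary detector on DCT features
        self.classifier = classifier             # the (frozen) targeted model

    def forward(self, image):
        rectified = self.prn(image)
        # The detector sees the DCT of the input-output difference of the PRN.
        diff = (image - rectified).detach().cpu().numpy()
        feats = torch.as_tensor(dctn(diff, axes=(-2, -1)), dtype=image.dtype)
        flagged = torch.sigmoid(self.detector(feats)) > 0.5     # assumed threshold
        # If a perturbation is flagged, classify the rectified image instead.
        chosen = torch.where(flagged.view(-1, 1, 1, 1), rectified, image)
        return self.classifier(chosen)


# Toy stand-ins just to exercise the control flow (not the paper's models).
prn = nn.Conv2d(3, 3, kernel_size=3, padding=1)
detector = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

model = RectifyThenClassify(prn, detector, classifier)
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)                              # torch.Size([4, 10])
```

Keeping the rectifier and detector in front of a frozen classifier is what lets the same defense be bolted onto different targeted networks, which is the generalization property the abstract reports.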