47 research outputs found
Defense against Universal Adversarial Perturbations
Recent advances in Deep Learning show the existence of image-agnostic
quasi-imperceptible perturbations that when applied to `any' image can fool a
state-of-the-art network classifier to change its prediction about the image
label. These `Universal Adversarial Perturbations' pose a serious threat to the
success of Deep Learning in practice. We present the first dedicated framework
to effectively defend the networks against such perturbations. Our approach
learns a Perturbation Rectifying Network (PRN) as `pre-input' layers to a
targeted model, such that the targeted model needs no modification. The PRN is
learned from real and synthetic image-agnostic perturbations, where an
efficient method to compute the latter is also proposed. A perturbation
detector is separately trained on the Discrete Cosine Transform of the
input-output difference of the PRN. A query image is first passed through the
PRN and verified by the detector. If a perturbation is detected, the output of
the PRN is used for label prediction instead of the actual image. A rigorous
evaluation shows that our framework can defend the network classifiers against
unseen adversarial perturbations in the real-world scenarios with up to 97.5%
success rate. The PRN also generalizes well in the sense that training for one
targeted network defends another network with a comparable success rate.Comment: Accepted in IEEE CVPR 201