Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks
Deep neural networks are vulnerable to adversarial attacks, which can fool
them by adding minuscule perturbations to the input images. The robustness of
existing defenses suffers greatly under white-box attack settings, where an
adversary has full knowledge of the network and can run many attack iterations
to find strong perturbations. We observe that the main reason such
perturbations exist is the close proximity of samples from different classes in
the learned feature space. This allows model decisions to be completely changed
by adding an imperceptible perturbation to the inputs. To counter this, we
propose to disentangle the intermediate feature representations of deep
networks class-wise. Specifically, we force the features for each class to lie
inside a
convex polytope that is maximally separated from the polytopes of other
classes. In this manner, the network is forced to learn distinct and distant
decision regions for each class. We observe that this simple constraint on the
features greatly enhances the robustness of learned models, even against the
strongest white-box attacks, without degrading the classification performance
on clean images. We report extensive evaluations in both black-box and
white-box attack scenarios and show significant gains in comparison to
state-of-the-art defenses.

Comment: Accepted at ICCV 2019
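The polytope-style class separation described above can be approximated with a prototype-based auxiliary loss added to the usual cross-entropy. Below is a minimal sketch in PyTorch, assuming a classifier that exposes its penultimate features; the class name PolytopeSeparationLoss, the learnable prototypes, and the margin value are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolytopeSeparationLoss(nn.Module):
    """Pulls each sample's features toward a learnable class prototype and
    pushes prototypes of different classes apart, so that class regions in
    feature space stay distinct and distant (illustrative sketch)."""

    def __init__(self, num_classes: int, feat_dim: int, margin: float = 10.0):
        super().__init__()
        # One learnable prototype per class (assumption: prototypes stand in
        # for the centers of the per-class regions described in the abstract).
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Attraction: squared distance from each feature to its own prototype.
        own = self.prototypes[labels]                           # (B, D)
        attract = (features - own).pow(2).sum(dim=1).mean()

        # Repulsion: hinge on pairwise prototype distances, so prototypes of
        # different classes are pushed at least `margin` apart.
        dists = torch.cdist(self.prototypes, self.prototypes)  # (C, C)
        off_diag = ~torch.eye(dists.size(0), dtype=torch.bool, device=dists.device)
        repel = F.relu(self.margin - dists[off_diag]).mean()

        return attract + repel
```

In training, this term would simply be added to the classification loss, e.g. `loss = F.cross_entropy(logits, y) + lam * sep_loss(feats, y)`, trading off clean accuracy against inter-class separation via `lam`.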
Decision-BADGE: Decision-based Adversarial Batch Attack with Directional Gradient Estimation
The susceptibility of deep neural networks (DNNs) to adversarial examples has
prompted an increase in the deployment of adversarial attacks. Image-agnostic
universal adversarial perturbations (UAPs) are even more threatening, but many
obstacles hinder deploying UAPs in real-world scenarios where only binary
decisions are returned. In this research, we propose Decision-BADGE, a
novel method to craft universal adversarial perturbations for executing
decision-based black-box attacks. To optimize the perturbation using only
decisions, we address two challenges: the magnitude and the direction of the
gradient. First, we determine the gradient magnitude from a batch loss, the
difference between the distribution of decisions accumulated over a batch and
the ground-truth distribution. This magnitude is then applied along a direction
given by a revised simultaneous perturbation stochastic approximation (SPSA) to
update the perturbation. This simple yet efficient method can be easily extended to
score-based attacks as well as targeted attacks. Experimental validation across
multiple victim models demonstrates that Decision-BADGE outperforms existing
attack methods, including even image-specific and score-based attacks. In
particular, our proposed method achieves a higher success rate with less
training time. The research also shows that Decision-BADGE can successfully
deceive unseen victim models and accurately target specific classes.

Comment: 9 pages (7 excluding references), 4 figures, 4 tables
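To make the decision-only optimization concrete, here is a minimal sketch of one update step for a universal perturbation using a two-point SPSA estimate over hard-label (argmax) decisions. The function names, step sizes, and the batch accuracy used as the loss are illustrative assumptions, not the authors' published implementation.

```python
import torch

def batch_decision_loss(decisions: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Fraction of the batch still classified correctly; the attacker wants
    # to drive this down, i.e. flip as many decisions as possible.
    return (decisions == labels).float().mean()

@torch.no_grad()
def spsa_uap_step(model, images, labels, delta, sigma=0.01, lr=0.05, eps=8/255):
    # Rademacher direction: one random +/-1 vector perturbs every coordinate
    # of the universal perturbation simultaneously.
    v = torch.randint(0, 2, delta.shape, device=delta.device).float() * 2 - 1

    # Query the victim at two symmetric points around delta; only hard-label
    # decisions (argmax) are observed, never scores or gradients.
    d_plus = model(images + delta + sigma * v).argmax(dim=1)
    d_minus = model(images + delta - sigma * v).argmax(dim=1)

    # Two-point SPSA estimate of the gradient of the batch decision loss
    # (for +/-1 directions, dividing by v equals multiplying by v).
    g = (batch_decision_loss(d_plus, labels)
         - batch_decision_loss(d_minus, labels)) / (2 * sigma) * v

    # Descend the estimated gradient and keep the perturbation small;
    # clipping images + delta to the valid pixel range is omitted for brevity.
    return (delta - lr * g).clamp(-eps, eps)
```

Repeating this step over many batches accumulates decisions into a single image-agnostic perturbation, which is what makes the attack universal rather than image-specific.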