270 research outputs found
Decision-BADGE: Decision-based Adversarial Batch Attack with Directional Gradient Estimation
The susceptibility of deep neural networks (DNNs) to adversarial examples has
prompted an increase in the deployment of adversarial attacks. Image-agnostic
universal adversarial perturbations (UAPs) pose an even greater threat, but
several limitations hinder their use in real-world scenarios where only
binary decisions are returned. In this research, we propose Decision-BADGE, a
novel method to craft universal adversarial perturbations for executing
decision-based black-box attacks. To optimize the perturbation from decisions
alone, we address two challenges: estimating the magnitude and the direction of
the gradient. First, we accumulate the model's decisions over a batch and
compare them with the ground-truth distribution to compute a batch loss, which
determines the magnitude of the gradient. This magnitude is then applied along
a direction given by a revised simultaneous perturbation stochastic
approximation (SPSA) to update the perturbation. This simple yet efficient
method extends naturally to
score-based attacks as well as targeted attacks. Experimental validation across
multiple victim models demonstrates that Decision-BADGE outperforms existing
attack methods, including image-specific and score-based attacks. In
particular, the proposed method achieves a higher success rate in less training
time. The experiments also show that Decision-BADGE can deceive unseen victim
models and accurately target specific classes.
Comment: 9 pages (7 pages excluding references), 4 figures, 4 tables
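The abstract gives no pseudocode, so the following is only a minimal NumPy sketch of the general idea: estimating a gradient from hard-label decisions with a two-sided SPSA probe, where the update magnitude comes from the difference of two decision-based batch losses. The hard-label query function `query_fn`, the hyperparameters, and the plain two-sided SPSA estimator are illustrative assumptions, not the authors' exact Decision-BADGE algorithm.

```python
import numpy as np

def decision_batch_loss(query_fn, images, labels, v):
    """Decision-only batch loss: the fraction of the batch that the victim
    model still classifies correctly under perturbation v (lower is better
    for the attacker). query_fn returns top-1 labels only."""
    preds = query_fn(np.clip(images + v, 0.0, 1.0))
    return np.mean(preds == labels)

def spsa_uap_attack(query_fn, images, labels, shape,
                    steps=1000, lr=0.01, c=0.01, eps=10 / 255):
    """Sketch of a decision-based universal perturbation attack.

    Each step draws a random Rademacher probe direction (SPSA) and estimates
    the gradient magnitude from the difference of two decision-based batch
    losses; the perturbation is then updated along the probe direction and
    projected back into the L-inf ball of radius eps."""
    v = np.zeros(shape, dtype=np.float32)           # universal perturbation
    for _ in range(steps):
        delta = np.random.choice([-1.0, 1.0], size=shape).astype(np.float32)
        loss_plus = decision_batch_loss(query_fn, images, labels, v + c * delta)
        loss_minus = decision_batch_loss(query_fn, images, labels, v - c * delta)
        # For a Rademacher probe, 1/delta == delta elementwise.
        g_hat = (loss_plus - loss_minus) / (2.0 * c) * delta
        v = np.clip(v - lr * g_hat, -eps, eps)      # descend batch error, keep budget
    return v
```

Because the loss is a fraction of correct decisions, each estimate is quantized to multiples of 1/batch_size, which is why accumulating decisions over a batch is needed to obtain a usable gradient magnitude.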
HotFlip: White-Box Adversarial Examples for Text Classification
We propose an efficient method to generate white-box adversarial examples to
trick a character-level neural classifier. We find that only a few
manipulations are needed to greatly decrease the accuracy. Our method relies on
an atomic flip operation, which swaps one token for another, based on the
gradients of the one-hot input vectors. Due to the efficiency of our method, we
can
perform adversarial training which makes the model more robust to attacks at
test time. With the use of a few semantics-preserving constraints, we
demonstrate that HotFlip can be adapted to attack a word-level classifier as
well.
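As a concrete illustration of the flip operation, here is a minimal PyTorch sketch of selecting the single best character swap from the gradients of the one-hot input vectors. `model`, `loss_fn`, and the one-hot input layout are hypothetical placeholders, and the search over multiple flips and the semantics-preserving constraints mentioned above are omitted.

```python
import torch

def hotflip_best_swap(model, loss_fn, one_hot, target):
    """One HotFlip step: choose the character swap whose first-order
    estimate grad[i, b] - grad[i, a] increases the loss the most, where a is
    the current character at position i and b is the candidate replacement.

    one_hot: (seq_len, vocab_size) float tensor of one-hot character vectors.
    """
    one_hot = one_hot.clone().requires_grad_(True)
    loss = loss_fn(model(one_hot.unsqueeze(0)), target)  # add batch dim
    loss.backward()
    grad = one_hot.grad                       # dL/d(one-hot), (seq_len, vocab_size)
    mask = one_hot.detach().bool()            # True at each position's current char
    current = (grad * one_hot.detach()).sum(dim=1, keepdim=True)  # grad[i, a]
    gain = grad - current                     # first-order loss change per swap
    gain[mask] = float("-inf")                # forbid "swapping" a char for itself
    pos = int(gain.max(dim=1).values.argmax())
    new_char = int(gain[pos].argmax())
    return pos, new_char                      # flip position `pos` to `new_char`
```

Calling this repeatedly, applying the returned flip each time, yields the few manipulations the abstract reports are enough to greatly decrease accuracy.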