Adaptive Adversarial Training Does Not Increase Recourse Costs
Recent work has connected adversarial attack methods and algorithmic recourse
methods: both seek minimal changes to an input instance which alter a model's
classification decision. It has been shown that traditional adversarial
training, which seeks to minimize a classifier's susceptibility to malicious
perturbations, increases the cost of generated recourse, with larger
adversarial training radii correlating with higher recourse costs. From the
perspective of algorithmic recourse, however, the appropriate adversarial
training radius has always been unknown. Another recent line of work has
motivated adversarial training with adaptive training radii to address the
issue of instance-wise variable adversarial vulnerability, showing success in
domains with unknown attack radii. This work studies the effects of adaptive
adversarial training on algorithmic recourse costs. We establish that the
improvements in model robustness induced by adaptive adversarial training show
little effect on algorithmic recourse costs, providing a potential avenue for
affordable robustness in domains where recoursability is critical.
Does Saliency-Based Training bring Robustness for Deep Neural Networks in Image Classification?
Deep Neural Networks are powerful tools for understanding complex patterns and
making decisions. However, their black-box nature impedes a complete
understanding of their inner workings. While online saliency-guided training
methods try to highlight the prominent features in the model's output to
alleviate this problem, it remains unclear whether the visually explainable
features align with the robustness of the model against adversarial examples. In
this paper, we investigate the vulnerability of saliency-trained models to
adversarial example methods. Models are trained using an online
saliency-guided training method and evaluated against popular adversarial
example algorithms. We quantify the robustness and conclude that, despite the
well-explained visualizations in the model's output, the salient models suffer
from lower performance against adversarial example attacks.