Evaluating the robustness of machine-learning models to adversarial examples is a
challenging problem. Many defenses have been shown to provide a false sense of
security by causing gradient-based attacks to fail, and they have been broken
under more rigorous evaluations. Although guidelines and best practices have
been suggested to improve current adversarial robustness evaluations, the lack
of automatic testing and debugging tools makes it difficult to apply these
recommendations in a systematic manner. In this work, we overcome these
limitations by (i) defining a set of quantitative indicators which unveil
common failures in the optimization of gradient-based attacks, and (ii)
proposing specific mitigation strategies within a systematic evaluation
protocol. Our extensive experimental analysis shows that the proposed
indicators of failure can be used to visualize, debug, and improve current
adversarial robustness evaluations, providing a first concrete step towards
automating and systematizing them. Our
open-source code is available at:
https://github.com/pralab/IndicatorsOfAttackFailure
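To give a flavor of what such an indicator looks like, the minimal sketch below flags a gradient-based attack run whose loss stays essentially flat across optimization steps, one symptom of a silently failing attack (e.g., due to obfuscated gradients). The function name, tolerance, and loss-trace format are illustrative assumptions and are not taken from the repository linked above.

```python
import numpy as np

def flat_loss_indicator(loss_trace, tol=1e-6):
    """Flag an attack run whose loss never meaningfully decreases.

    A (nearly) flat loss across the attack's optimization steps suggests
    the gradient-based attack is failing, so the resulting robustness
    estimate should be debugged rather than trusted.
    """
    loss_trace = np.asarray(loss_trace, dtype=float)
    best_drop = loss_trace[0] - loss_trace.min()
    return bool(best_drop < tol)  # True -> suspicious run, re-check the evaluation

# Toy usage: a flat loss trace triggers the indicator, a decreasing one does not.
print(flat_loss_indicator([2.30, 2.30, 2.30, 2.30]))  # True
print(flat_loss_indicator([2.30, 1.80, 0.90, 0.20]))  # False
```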