Evaluating the robustness of machine-learning models to adversarial examples is a
challenging problem. Many defenses have been shown to provide a false sense of
security by causing gradient-based attacks to fail, and they have been broken
under more rigorous evaluations. Although guidelines and best practices have
been suggested to improve current adversarial robustness evaluations, the lack
of automatic testing and debugging tools makes it difficult to apply these
recommendations in a systematic manner. In this work, we overcome these
limitations by (i) defining a set of quantitative indicators which unveil
common failures in the optimization of gradient-based attacks, and (ii)
proposing specific mitigation strategies within a systematic evaluation
protocol. Our extensive experimental analysis shows that the proposed
indicators of failure can be used to visualize, debug, and improve current
adversarial robustness evaluations, providing a first concrete step towards
automating and systematizing them. Our
open-source code is available at:
https://github.com/pralab/IndicatorsOfAttackFailure
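To give a flavor of what such an indicator looks like, the minimal sketch below flags a gradient-based attack run whose loss stays essentially flat across optimization steps, one symptom of a silently failing attack (e.g., due to obfuscated gradients). The function name, tolerance, and loss-trace format are illustrative assumptions and are not taken from the repository linked above.

```python
import numpy as np

def flat_loss_indicator(loss_trace, tol=1e-6):
    """Flag an attack run whose loss never meaningfully decreases.

    A (nearly) flat loss across the attack's optimization steps suggests
    the gradient-based attack is failing, so the resulting robustness
    estimate should be debugged rather than trusted.
    """
    loss_trace = np.asarray(loss_trace, dtype=float)
    best_drop = loss_trace[0] - loss_trace.min()
    return bool(best_drop < tol)  # True -> suspicious run, re-check the evaluation

# Toy usage: a flat loss trace triggers the indicator, a decreasing one does not.
print(flat_loss_indicator([2.30, 2.30, 2.30, 2.30]))  # True
print(flat_loss_indicator([2.30, 1.80, 0.90, 0.20]))  # False
```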