    Analysis of dominant classes in universal adversarial perturbations

    The reasons why Deep Neural Networks are susceptible to being fooled by adversarial examples remain an open question. Indeed, many different strategies can be employed to efficiently generate adversarial attacks, some of them relying on different theoretical justifications. Among these strategies, universal (input-agnostic) perturbations are of particular interest, due to their capability to fool a network independently of the input to which the perturbation is applied. In this work, we investigate an intriguing phenomenon of universal perturbations, which has been reported previously in the literature, yet without a proven justification: universal perturbations change the predicted classes for most inputs into one particular (dominant) class, even if this behavior is not specified during the creation of the perturbation. To explain the cause of this phenomenon, we propose a number of hypotheses and experimentally test them using a speech command classification problem in the audio domain as a testbed. Our analyses reveal interesting properties of universal perturbations, suggest new methods to generate such attacks, and provide an explanation of dominant classes from both a geometric and a data-feature perspective.

    This work is supported by the Basque Government, Spain (BERC 2018–2021 program, project KK-2020/00049 through the ELKARTEK program, IT1244-19, and PRE_2019_1_0128 predoctoral grant), by the Spanish Ministry of Economy and Competitiveness MINECO, Spain (projects TIN2016-78365-R and PID2019-104966GB-I00), and by the Spanish Ministry of Science, Innovation and Universities, Spain (FPU19/03231 predoctoral grant). Jose A. Lozano acknowledges support from the Spanish Ministry of Science, Innovation and Universities, Spain, through BCAM Severo Ochoa accreditation (SEV-2017-0718).
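
    The dominant-class effect described in the abstract is straightforward to quantify. The following is a minimal sketch, not the authors' code: the classifier `model`, the batch `inputs`, and the universal perturbation `v` are hypothetical placeholders. It applies the same perturbation to every input and measures how concentrated the resulting predictions are on a single class:

        # Minimal sketch (hypothetical model, inputs and perturbation, not the
        # authors' code): measure which class attracts the most predictions
        # after adding a universal perturbation, and what fraction of inputs
        # it captures.
        import numpy as np
        import torch

        def dominant_class_rate(model, inputs, v):
            with torch.no_grad():
                # The same perturbation v is added to every input in the batch.
                preds = model(inputs + v).argmax(dim=1).cpu().numpy()
            classes, counts = np.unique(preds, return_counts=True)
            # The dominant class is the most frequent prediction; the rate is
            # the fraction of inputs sent to it.
            return classes[counts.argmax()], counts.max() / len(preds)

    A perturbation with no dominant class would spread misclassifications across many labels; the phenomenon studied in the paper corresponds to this rate approaching 1.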

    On the human evaluation of universal audio adversarial perturbations

    Human-machine interaction is increasingly dependent on speech communication, mainly due to the remarkable performance of Machine Learning models in speech recognition tasks. However, these models can be fooled by adversarial examples, which are inputs intentionally perturbed to produce a wrong prediction without the changes being noticeable to humans. While much research has focused on developing new techniques to generate adversarial perturbations, less attention has been given to the aspects that determine whether and how the perturbations are noticed by humans. This question is relevant, since the high fooling rates of proposed adversarial perturbation strategies are only valuable if the perturbations are not detectable. In this paper we investigate to what extent the distortion metrics proposed in the literature for audio adversarial examples, which are commonly applied to evaluate the effectiveness of methods for generating these attacks, are a reliable measure of the human perception of the perturbations. Using an analytical framework, and an experiment in which 36 subjects evaluate audio adversarial examples according to different factors, we demonstrate that the metrics employed by convention are not a reliable measure of the perceptual similarity of adversarial examples in the audio domain.

    This work was supported by the Basque Government (PRE_2019_1_0128 predoctoral grant, IT1244-19, and project KK-2020/00049 through the ELKARTEK program); the Spanish Ministry of Economy and Competitiveness MINECO (projects TIN2016-78365-R and PID2019-104966GB-I00); and the Spanish Ministry of Science, Innovation and Universities (FPU19/03231 predoctoral grant). The authors would also like to thank the Intelligent Systems Group (University of the Basque Country UPV/EHU, Spain) for providing the computational resources needed to develop the project, as well as all the participants who took part in the experiments.
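
    To make the object of study concrete, here is a hedged sketch of one distortion metric conventionally used for audio adversarial examples: the peak loudness of the perturbation in decibels relative to the original waveform, dB_x(delta) = dB(delta) - dB(x), where dB(x) = 20 log10 max_i |x_i|. The waveform arguments are hypothetical placeholders, not data from the paper:

        # Hedged sketch of a conventional decibel-based distortion metric for
        # audio adversarial examples; inputs are placeholder NumPy waveforms.
        import numpy as np

        def db(x):
            # Peak amplitude of a waveform in decibels.
            return 20.0 * np.log10(np.max(np.abs(x)))

        def db_distortion(original, adversarial):
            # Relative loudness of the perturbation: more negative values mean
            # a quieter perturbation with respect to the original signal.
            delta = adversarial - original
            return db(delta) - db(original)

    Metrics of this kind summarize a waveform with a single scalar, which is one plausible reason, consistent with the paper's findings, why they can diverge from human perception of the perturbation.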