Despite their impressive performance in classification, neural networks are
known to be vulnerable to adversarial attacks. These attacks are small
perturbations of the input data designed to fool the model. A natural question
is whether the architecture, settings, or other properties of the model are
connected to the nature of the attack. In this
work, we aim to shed light on this problem by focusing on the implicit bias of
the neural network, which refers to its inherent inclination to favor specific
patterns or outcomes. Specifically, we investigate one aspect of this implicit
bias: the Fourier frequencies that the network requires for accurate
image classification. We conduct tests to assess the statistical relationship
between these frequencies and those necessary for a successful attack. To delve
into this relationship, we propose a new method that can uncover non-linear
correlations between sets of coordinates, which, in our case, are the
aforementioned frequencies. By exploiting the entanglement between intrinsic
dimension and correlation, we provide empirical evidence that the network's bias
in Fourier space and the frequencies targeted by adversarial attacks are closely
tied.
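
For intuition only, the entanglement between intrinsic dimension and correlation can be illustrated with a minimal sketch (our own illustration under simplifying assumptions, not the method proposed in this work): if two sets of coordinates are statistically dependent, the intrinsic dimension of their joint space falls below the sum of their marginal intrinsic dimensions, and a simple TwoNN estimator can expose this gap.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(X):
    # TwoNN estimator (Facco et al., 2017): for each point take the ratio
    # mu = r2 / r1 of the distances to its two nearest neighbours; the
    # maximum-likelihood estimate of the intrinsic dimension is N / sum(log mu).
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    r1, r2 = dist[:, 1], dist[:, 2]   # dist[:, 0] is the point itself
    mask = r1 > 0                     # drop duplicate points
    mu = r2[mask] / r1[mask]
    return mask.sum() / np.sum(np.log(mu))

def id_correlation_gap(X, Y):
    # If X and Y are dependent, the intrinsic dimension of the joint space
    # [X, Y] drops below the sum of the marginal dimensions; the size of
    # this gap serves here as a rough, illustrative dependence score.
    id_x, id_y = twonn_id(X), twonn_id(Y)
    id_joint = twonn_id(np.hstack([X, Y]))
    return (id_x + id_y) - id_joint

# Toy check: Y is a noisy non-linear function of X, so the gap should be
# clearly positive; for independent X and Y it stays close to zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
Y = np.sin(X) + 0.05 * rng.normal(size=X.shape)
print(id_correlation_gap(X, Y))
print(id_correlation_gap(X, rng.normal(size=X.shape)))
```

In the setting of this work, the two coordinate sets would be the Fourier frequencies the network relies on for classification and those perturbed by the attack; the sketch above only conveys the underlying idea that shared intrinsic dimension signals non-linear correlation.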