A Survey on Transferability of Adversarial Examples across Deep Neural Networks
The emergence of Deep Neural Networks (DNNs) has revolutionized various
domains, enabling the resolution of complex tasks spanning image recognition,
natural language processing, and scientific problem-solving. However, this
progress has also exposed a concerning vulnerability: adversarial examples.
These crafted inputs, imperceptible to humans, can manipulate machine learning
models into making erroneous predictions, raising concerns for safety-critical
applications. An intriguing property of this phenomenon is the transferability
of adversarial examples, where perturbations crafted for one model can deceive
another, often one with a different architecture. This property enables
"black-box" attacks, circumventing the need for detailed knowledge of the
target model. This survey explores the landscape of the transferability of
adversarial examples. We categorize existing methodologies
to enhance adversarial transferability and discuss the fundamental principles
guiding each approach. While the predominant body of research primarily
concentrates on image classification, we also extend our discussion to
encompass other vision tasks and beyond. Challenges and future prospects are
discussed, highlighting the importance of fortifying DNNs against adversarial
vulnerabilities in an evolving landscape.
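As a rough illustration of the transferability property described above, the following minimal sketch crafts an FGSM perturbation on a surrogate model and checks whether it also fools a separately trained target model, never touching the target's gradients. The model pair, the step size epsilon, and the random placeholder input are illustrative assumptions, not taken from the survey; it assumes PyTorch and torchvision are installed.

    # Sketch of a transfer-based ("black-box") attack: craft on a surrogate,
    # evaluate on a different target. Placeholder input stands in for a real,
    # correctly preprocessed image.
    import torch
    import torchvision.models as models

    surrogate = models.resnet18(weights="IMAGENET1K_V1").eval()
    target = models.vgg16(weights="IMAGENET1K_V1").eval()

    x = torch.rand(1, 3, 224, 224)        # placeholder image in [0, 1]
    y = surrogate(x).argmax(dim=1)        # label assumed from the surrogate

    # FGSM on the surrogate: one signed-gradient step of size epsilon.
    epsilon = 8 / 255
    x_adv = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(surrogate(x_adv), y)
    loss.backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

    # Transferability check: does the perturbation also move the target's output?
    print("target on clean:", target(x).argmax(dim=1).item())
    print("target on adv:  ", target(x_adv).argmax(dim=1).item())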
On the Robustness of Explanations of Deep Neural Network Models: A Survey
Explainability has been widely stated as a cornerstone of the responsible and
trustworthy use of machine learning models. With the ubiquitous use of Deep
Neural Network (DNN) models expanding to risk-sensitive and safety-critical
domains, many methods have been proposed to explain the decisions of these
models. Recent years have also seen concerted efforts that have shown how such
explanations can be distorted (attacked) by minor input perturbations. While
there have been many surveys that review explainability methods themselves,
there has been no effort hitherto to assimilate the different methods and
metrics proposed to study the robustness of explanations of DNN models. In this
work, we present a comprehensive survey of methods that study, understand,
attack, and defend explanations of DNN models. We also present a detailed
review of different metrics used to evaluate explanation methods, as well as
describe attributional attack and defense methods. We conclude with lessons and
takeaways for the community towards ensuring robust explanations of DNN model
predictions.
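To make the notion of explanation robustness discussed in this survey concrete, here is a minimal sketch (assuming PyTorch and torchvision) that uses a plain input-gradient saliency map as the explanation and measures how much it changes under a small random input perturbation. The gradient explanation, perturbation size, and cosine-similarity metric are generic illustrative choices, not specific methods from the paper.

    # Sketch: compare saliency maps for an input and a slightly perturbed copy.
    import torch
    import torchvision.models as models

    model = models.resnet18(weights="IMAGENET1K_V1").eval()

    def saliency(x):
        # Gradient of the top-class score with respect to the input pixels.
        x = x.clone().requires_grad_(True)
        score = model(x).max(dim=1).values.sum()
        score.backward()
        return x.grad.detach().flatten()

    x = torch.rand(1, 3, 224, 224)                      # placeholder image
    x_pert = (x + 0.01 * torch.randn_like(x)).clamp(0, 1)

    sim = torch.nn.functional.cosine_similarity(saliency(x), saliency(x_pert), dim=0)
    print("explanation similarity under small perturbation:", sim.item())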
A Taxonomy and Survey of Attacks Against Machine Learning
The majority of machine learning methodologies operate under the assumption that their environment is benign. However, this assumption does not always hold, as it is often advantageous to adversaries to maliciously modify the training data (poisoning attacks) or the test data (evasion attacks). Such attacks can be catastrophic given the growth and the penetration of machine learning applications in society. Therefore, there is a need to secure machine learning, enabling its safe adoption in adversarial settings such as spam filtering, malware detection, and biometric recognition. This paper presents a taxonomy and survey of attacks against systems that use machine learning. It organizes the body of knowledge in adversarial machine learning so as to identify the aspects to which researchers from different fields can contribute. The taxonomy identifies attacks that share key characteristics and can therefore potentially be addressed by the same defense approaches. Thus, the proposed taxonomy makes it easier to understand the existing attack landscape and supports the development of defense mechanisms, which are not themselves investigated in this survey. The taxonomy is also leveraged to identify open problems that can lead to new research areas within the field of adversarial machine learning.
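To illustrate the training-time branch of such a taxonomy, the following minimal sketch (assuming scikit-learn and NumPy) contrasts a model trained on clean data with one trained after a label-flipping poisoning attack on a synthetic dataset. The flip rate, dataset, and model choice are illustrative assumptions rather than examples from the survey.

    # Sketch: poisoning (training-time) attack via label flipping vs. clean training.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clean_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

    # The adversary flips the labels of a fraction of the training set.
    rng = np.random.default_rng(0)
    flip = rng.random(len(y_tr)) < 0.3
    y_poisoned = np.where(flip, 1 - y_tr, y_tr)
    poisoned_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned).score(X_te, y_te)

    print(f"clean accuracy:    {clean_acc:.3f}")
    print(f"poisoned accuracy: {poisoned_acc:.3f}")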
A survey of practical adversarial example attacks
Adversarial examples revealed the weakness of machine learning techniques in terms of robustness, which in turn inspired adversaries to exploit this weakness to attack systems employing machine learning. Existing research has covered the methodologies of adversarial example generation, the root causes of the existence of adversarial examples, and some defense schemes. However, practical attacks against real-world systems did not appear until recently, mainly because of the difficulty of injecting an artificially generated example into the model behind the hosting system without breaking its integrity. Recent case studies against face recognition systems and road sign recognition systems have finally bridged the gap between theoretical adversarial example generation methodologies and practical attack schemes against real systems. To guide future research on defending against adversarial examples in the real world, we formalize the threat model for practical attacks with adversarial examples, and we also analyze the restrictions and key procedures for launching real-world adversarial example attacks.
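One key procedure that separates real-world attacks from purely digital ones is that the adversarial input must survive a physical channel (printing, camera capture, lighting) before it reaches the model. The sketch below (assuming PyTorch and torchvision) crudely simulates that channel with random brightness and additive noise and checks the model's prediction across several channel draws; the model, the channel parameters, and the placeholder adversarial image are all illustrative assumptions, not taken from the paper.

    # Sketch: a practical attack must fool the model across samples of a
    # physical channel, not just once in the digital domain.
    import torch
    import torchvision.models as models

    model = models.resnet18(weights="IMAGENET1K_V1").eval()

    def physical_channel(x):
        # Crude stand-in for print-and-photograph distortions.
        brightness = 1.0 + 0.2 * (torch.rand(1) - 0.5)
        noise = 0.02 * torch.randn_like(x)
        return (brightness * x + noise).clamp(0, 1)

    x_adv = torch.rand(1, 3, 224, 224)   # placeholder for a crafted adversarial image

    preds = [model(physical_channel(x_adv)).argmax(dim=1).item() for _ in range(10)]
    print("predictions across channel samples:", preds)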