MARGIN: Uncovering Deep Neural Networks using Graph Signal Analysis
Interpretability has emerged as a crucial aspect of machine learning, aimed
at providing insights into the workings of complex neural networks. However,
existing solutions vary vastly based on the nature of the interpretability
task, with each use case requiring substantial time and effort. This paper
introduces MARGIN, a simple yet general approach to address a large set of
interpretability tasks ranging from identifying prototypes to explaining image
predictions. MARGIN exploits ideas rooted in graph signal analysis to determine
influential nodes in a graph, which are defined as those nodes that maximally
describe a function defined on the graph. By carefully defining task-specific
graphs and functions, we demonstrate that MARGIN outperforms existing
approaches in a number of disparate interpretability challenges.
Comment: Technical Report
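The abstract describes scoring "influential nodes" as those that maximally describe a function (a graph signal) defined on the graph. A minimal sketch of one such criterion, not the paper's actual method: score each node by the local variation of the signal over its neighborhood, so nodes sitting where the signal changes sharply score highest. The graph, signal, and scoring rule below are illustrative assumptions.

```python
import numpy as np

def node_influence(adj, signal):
    """Score each node by the local variation of a graph signal in its
    neighborhood: sum over neighbors j of A[i, j] * (s[i] - s[j])**2.
    High scores mark nodes that most strongly 'describe' the signal."""
    diff = signal[:, None] - signal[None, :]
    return (adj * diff ** 2).sum(axis=1)

# Toy graph: a path 0-1-2-3 with a signal jump between nodes 1 and 2.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
signal = np.array([0.0, 0.0, 1.0, 1.0])

scores = node_influence(adj, signal)
print(scores)  # → [0. 1. 1. 0.]: nodes 1 and 2, adjacent to the jump, dominate
```

Any high-pass graph-spectral criterion (e.g. a Laplacian quadratic form) would play the same role; the point is only that influence is read off a function on the graph rather than from the model directly.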
What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples
Adversarial examples, deliberately crafted using small perturbations to fool
deep neural networks, were first studied in image processing and more recently
in NLP. While approaches to detecting adversarial examples in NLP have largely
relied on search over input perturbations, image processing has seen a range of
techniques that aim to characterise adversarial subspaces over the learned
representations.
In this paper, we adapt two such approaches to NLP, one based on nearest
neighbors and influence functions and one on Mahalanobis distances. The former
in particular produces a state-of-the-art detector when compared against
several strong baselines; moreover, the novel use of influence functions
provides insight into how adversarial example subspaces in NLP relate to
those in image processing, and also how they differ depending on the
kind of NLP task.
Comment: 20 pages, Accepted in IJCNLP_AACL 202
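The Mahalanobis-distance approach mentioned above typically fits class-conditional Gaussians over learned representations and scores an input by its minimum distance to any class. A minimal sketch under that standard recipe (the tied covariance, toy features, and threshold-free scoring are assumptions, not details from the paper):

```python
import numpy as np

def fit_class_gaussians(feats, labels):
    """Fit per-class means and a shared (tied) precision matrix over
    the learned representations."""
    classes = np.unique(labels)
    means = {c: feats[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([feats[labels == c] - means[c] for c in classes])
    cov = centered.T @ centered / len(feats)
    # Small ridge keeps the inverse well-conditioned on toy data.
    return means, np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

def mahalanobis_score(x, means, prec):
    """Adversarial score: minimum Mahalanobis distance to any class."""
    return min(float((x - m) @ prec @ (x - m)) for m in means.values())

# Toy 2-D "representations" for two classes.
feats = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.],
                  [5., 5.], [6., 5.], [5., 6.], [6., 6.]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
means, prec = fit_class_gaussians(feats, labels)

in_dist = mahalanobis_score(np.array([0.5, 0.5]), means, prec)
off_manifold = mahalanobis_score(np.array([3.0, -3.0]), means, prec)
print(in_dist < off_manifold)  # → True: the off-manifold point scores far higher
```

A detector then thresholds this score; adversarial examples tend to fall off the class manifolds and receive large distances.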
SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain
Despite the success of convolutional neural networks (CNNs) in many computer
vision and image analysis tasks, they remain vulnerable against so-called
adversarial attacks: Small, crafted perturbations in the input images can lead
to false predictions. A possible defense is to detect adversarial examples. In
this work, we show how analysis in the Fourier domain of input images and
feature maps can be used to distinguish benign test samples from adversarial
images. We propose two novel detection methods: Our first method employs the
magnitude spectrum of the input images to detect an adversarial attack. This
simple and robust classifier can successfully detect adversarial perturbations
of three commonly used attack methods. The second method builds upon the first
and additionally extracts the phase of the Fourier coefficients of feature
maps at different layers of the network. With this extension, we are able to
improve adversarial detection rates compared to state-of-the-art detectors on
five different attack methods.
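The first method's core feature, the magnitude spectrum of the input image, can be sketched as follows. The intuition (an assumption here, not a claim from the paper) is that noise-like adversarial perturbations add energy at high spatial frequencies, which shows up in the corners of the shifted spectrum; the smooth test image and Gaussian "attack" below are purely illustrative.

```python
import numpy as np

def fourier_magnitude_features(img):
    """Log-magnitude spectrum of an image (DC shifted to the center),
    the kind of frequency-domain feature a detector can be trained on."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(spec))

# Smooth image vs. a hypothetical noise-like perturbation.
clean = np.outer(np.hanning(32), np.hanning(32))
rng = np.random.default_rng(0)
perturbed = clean + 0.05 * rng.standard_normal((32, 32))

# Mean log-magnitude in a high-frequency corner of the shifted spectrum:
hf_clean = fourier_magnitude_features(clean)[:4, :4].mean()
hf_pert = fourier_magnitude_features(perturbed)[:4, :4].mean()
print(hf_pert > hf_clean)  # → True: extra high-frequency energy flags the attack
```

In practice such spectra (and, for the second method, the phase of feature-map Fourier coefficients) would be fed to a simple classifier rather than compared by hand.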
Application of Adversarial Attacks on Malware Detection Models
Malware detection is vital as it ensures that a computer is safe from any kind of malicious software that puts users at risk. New variants of such malicious software are introduced every day at an increasing pace. Thus, to guarantee the security of computer systems, major advances are being made in the field of malware detection, and one such approach is to use machine learning. Even though machine learning is very powerful, it is prone to adversarial attacks. In this project, we apply adversarial attacks to malware detection models. To perform these attacks, we use fake samples generated with a Generative Adversarial Network (GAN); this fake malware data, along with the real data, is fed to a machine learning model for malware detection. We also experiment with the percentage of fake malware samples and observe how the model behaves on the given input. The novelty of this project lies in the use of adversarial samples generated from word embeddings produced by our generative algorithms.
Adversarial-Aware Deep Learning System based on a Secondary Classical Machine Learning Verification Approach
Deep learning models have been used in creating various effective image
classification applications. However, they are vulnerable to adversarial
attacks that seek to misguide the models into predicting incorrect classes. Our
study of major adversarial attack models shows that they all specifically
target and exploit neural network structures in their designs. This
understanding leads us to hypothesize that most classical machine learning
models, such as Random Forest (RF), are immune to these adversarial attacks
because they do not rely on a neural network design at all. Our experimental
study of classical machine learning models against popular adversarial
attacks supports this hypothesis. Based on this hypothesis, we
propose a new adversarial-aware deep learning system by using a classical
machine learning model as the secondary verification system to complement the
primary deep learning model in image classification. Although the secondary
classical machine learning model is less accurate, it is used only for
verification, so it does not impact the output accuracy of the primary deep
learning model, while still effectively detecting an adversarial attack when a
clear mismatch occurs. Our experiments on the CIFAR-100 dataset show that our
proposed approach outperforms current state-of-the-art adversarial defense
systems.
Comment: 17 pages, 3 figures
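The mismatch-based verification described above can be sketched in a few lines. This is a minimal illustration, not the paper's system: the confidence threshold and the decision rule (flag only confident disagreements) are assumptions introduced here.

```python
import numpy as np

def verify_prediction(primary_probs, rf_label, threshold=0.5):
    """Accept the primary DNN's prediction, but flag a possible
    adversarial input when a confident DNN prediction clearly
    disagrees with the secondary classical model (e.g. an RF).
    `threshold` is an assumed confidence cutoff, not from the paper."""
    dnn_label = int(np.argmax(primary_probs))
    confident = bool(primary_probs[dnn_label] >= threshold)
    suspicious = confident and dnn_label != rf_label
    return dnn_label, suspicious

# Agreement: the DNN output is accepted as-is.
label_ok, flag_ok = verify_prediction(np.array([0.10, 0.85, 0.05]), rf_label=1)
print(label_ok, flag_ok)    # → 1 False

# Clear mismatch: raise an adversarial-attack flag.
label_bad, flag_bad = verify_prediction(np.array([0.02, 0.90, 0.08]), rf_label=2)
print(label_bad, flag_bad)  # → 1 True
```

Because the secondary model is consulted only when its label is compared against the DNN's, its lower standalone accuracy never alters the primary model's output, matching the design described in the abstract.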