A Deep Dive into Adversarial Robustness in Zero-Shot Learning
Machine learning (ML) systems have driven significant advances in various
fields, thanks to the introduction of highly complex models. Despite their
success, it has been shown repeatedly that machine learning models are
vulnerable to imperceptible perturbations that can severely degrade their accuracy.
So far, existing studies have primarily focused on models where supervision
across all classes was available. In contrast, Zero-shot Learning (ZSL) and
Generalized Zero-shot Learning (GZSL) tasks inherently lack supervision across
all classes. In this paper, we present a study aimed at evaluating the
adversarial robustness of ZSL and GZSL models. We leverage the well-established
label embedding model and subject it to a set of established adversarial
attacks and defenses across multiple datasets. In addition to creating possibly
the first benchmark on adversarial robustness of ZSL models, we also present
analyses on important points that require attention for better interpretation
of ZSL robustness results. We hope these points, along with the benchmark, will
help researchers establish a better understanding what challenges lie ahead and
help guide their work.Comment: To appear in ECCV 2020, Workshop on Adversarial Robustness in the
Real Worl
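The compatibility-based attack setting described in this abstract can be illustrated with a toy sketch. The snippet below applies a single FGSM-style step to a linear label-embedding scorer of the form score(x, c) = x @ W @ a_c; all shapes, weights, and attribute vectors are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy label-embedding (compatibility) model: score(x, c) = x @ W @ a_c,
# where a_c is the attribute vector of class c. Shapes are arbitrary.
d_feat, d_attr, n_cls = 8, 4, 5
W = rng.normal(size=(d_feat, d_attr))
A = rng.normal(size=(n_cls, d_attr))        # per-class attribute vectors

def scores(x):
    return (x @ W) @ A.T                    # compatibility with every class

def fgsm_margin_attack(x, y, eps=0.3):
    """One signed-gradient step that widens the margin of the strongest
    rival class over label y. For this linear model, the gradient of
    s[rival] - s[y] with respect to x is W @ (A[rival] - A[y]) in closed form."""
    s = scores(x)
    rival = int(np.argmax(np.where(np.arange(n_cls) == y, -np.inf, s)))
    grad = W @ (A[rival] - A[y])
    return x + eps * np.sign(grad), rival

x = rng.normal(size=d_feat)
y = int(np.argmax(scores(x)))               # model's own prediction as label
x_adv, rival = fgsm_margin_attack(x, y)

# The rival's margin over y grows by exactly eps * ||grad||_1.
margin_before = float(scores(x)[rival] - scores(x)[y])
margin_after = float(scores(x_adv)[rival] - scores(x_adv)[y])
```

The perturbation stays within an l-infinity ball of radius eps, while the rival class's margin can only grow, which is the mechanism the benchmarked attacks exploit.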
Universal Adversarial Perturbations for Malware
Machine learning classification models are vulnerable to adversarial examples
-- effective input-specific perturbations that can manipulate the model's
output. Universal Adversarial Perturbations (UAPs), which identify noisy
patterns that generalize across the input space, allow the attacker to greatly
scale up the generation of these adversarial examples. Although UAPs have been
explored in application domains beyond computer vision, little is known about
their properties and implications in the specific context of realizable
attacks, such as malware, where attackers must reason about satisfying
challenging problem-space constraints.
In this paper, we explore the challenges and strengths of UAPs in the context
of malware classification. We generate sequences of problem-space
transformations that induce UAPs in the corresponding feature-space embedding
and evaluate their effectiveness across threat models that consider a varying
degree of realistic attacker knowledge. Additionally, we propose adversarial
training-based mitigations using knowledge derived from the problem-space
transformations, and compare against alternative feature-space defenses. Our
experiments limit the effectiveness of a white-box Android evasion attack to
~20% at the cost of 3% TPR at 1% FPR. We additionally show how our method
can be adapted to more restrictive application domains such as Windows malware.
We observe that while adversarial training in the feature space must deal
with large and often unconstrained regions, UAPs in the problem space identify
specific vulnerabilities that allow us to harden a classifier more effectively,
shifting the challenges and associated cost of identifying new universal
adversarial transformations back to the attacker.
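The core idea of a universal perturbation, one shared offset that evades a classifier across many inputs, can be sketched in a few lines. Below, a toy linear model stands in for a feature-space malware detector, and a greedy l-infinity-bounded loop accumulates a single perturbation that pushes flagged points across the decision boundary; the classifier, data, and step sizes are all illustrative assumptions, not the paper's threat model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear stand-in for a feature-space malware detector:
# a positive score means "flagged as malware".
d = 16
w = rng.normal(size=d)
X = rng.normal(size=(100, d))
X_mal = X[X @ w > 0]                          # points the model flags

def universal_perturbation(X_mal, w, eps=2.0, steps=10, lr=0.5):
    """Greedy l_inf-bounded UAP: accumulate one shared offset v that moves
    as many flagged points as possible across the decision boundary
    (a heavily simplified version of iterative UAP construction)."""
    v = np.zeros_like(w)
    for _ in range(steps):
        if not ((X_mal + v) @ w > 0).any():   # every point already evades
            break
        v = np.clip(v - lr * np.sign(w), -eps, eps)   # lower all scores
    return v

v = universal_perturbation(X_mal, w)
evasion_rate = float(((X_mal + v) @ w <= 0).mean())
```

The single vector v generalizes across inputs, which is what lets an attacker scale up example generation; in the realizable-malware setting, each feature-space step would additionally have to be backed by a valid problem-space transformation.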
Defending Against Universal Attacks Through Selective Feature Regeneration
Deep neural network (DNN) predictions have been shown to be vulnerable to carefully crafted adversarial perturbations. Specifically, image-agnostic (universal adversarial) perturbations added to any image can fool a target network into making erroneous predictions. Departing from existing defense strategies that work mostly in the image domain, we present a novel defense which operates in the DNN feature domain and effectively defends against such universal perturbations. Our approach identifies pre-trained convolutional features that are most vulnerable to adversarial noise and deploys trainable feature regeneration units which transform these DNN filter activations into resilient features that are robust to universal perturbations. Regenerating only the top 50% adversarially susceptible activations in at most 6 DNN layers and leaving all remaining DNN activations unchanged, we outperform existing defense strategies across different network architectures by more than 10% in restored accuracy. We show that without any additional modification, our defense trained on ImageNet with one type of universal attack examples effectively defends against other types of unseen universal attacks.
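The selective-regeneration pipeline this abstract describes, rank filter activations by susceptibility, regenerate only the top 50%, and pass the rest through untouched, can be sketched with numpy. The "regeneration unit" here is a trivial least-squares shrinkage fitted on separate noisy/clean pairs, standing in for the paper's trainable units; all tensor shapes, the noise model, and the susceptibility score are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy activation tensors for one conv layer: channels x height x width.
n_ch, h, w = 8, 16, 16
acts_clean = rng.normal(size=(n_ch, h, w))
noise = 0.5 * rng.normal(size=(n_ch, h, w))       # stand-in universal noise
acts_adv = acts_clean + noise

# 1) Susceptibility: how strongly the perturbation moved each filter's output.
suscept = np.abs(acts_adv - acts_clean).reshape(n_ch, -1).mean(axis=1)
top = np.argsort(suscept)[-(n_ch // 2):]          # top-50% vulnerable filters

# 2) "Train" a trivial regeneration unit on separate noisy/clean pairs:
#    a scalar least-squares fit, clean ~ k * noisy (shrinks the noise).
s_clean = rng.normal(size=5000)
s_noisy = s_clean + 0.5 * rng.normal(size=5000)
k = float(s_noisy @ s_clean / (s_noisy @ s_noisy))

# 3) Regenerate only the selected channels; leave the rest unchanged.
regen = acts_adv.copy()
regen[top] = k * acts_adv[top]

err_before = float(np.mean((acts_adv[top] - acts_clean[top]) ** 2))
err_after = float(np.mean((regen[top] - acts_clean[top]) ** 2))
```

Even this crude shrinkage reduces the perturbation's footprint on the selected channels, while the untouched channels illustrate why the defense is cheap: most of the network's activations are simply passed through.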