Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks
Deep Convolutional Networks (DCNs) have been shown to be vulnerable to
adversarial examples---perturbed inputs specifically designed to produce
intentional errors in the learning algorithms at test time. Existing
input-agnostic adversarial perturbations exhibit interesting visual patterns
that are currently unexplained. In this paper, we introduce a structured
approach for generating Universal Adversarial Perturbations (UAPs) with
procedural noise functions. Our approach unveils the systemic vulnerability of
popular DCN models like Inception v3 and YOLO v3, with single noise patterns
able to fool a model on up to 90% of the dataset. Procedural noise allows us to
generate a distribution of UAPs with high universal evasion rates using only a
few parameters. Additionally, we propose Bayesian optimization to efficiently
learn procedural noise parameters to construct inexpensive untargeted black-box
attacks. We demonstrate that it can achieve an average of less than 10 queries
per successful attack, a 100-fold improvement on existing methods. We further
motivate the use of input-agnostic defences to increase the stability of models
to adversarial perturbations. The universality of our attacks suggests that DCN
models may be sensitive to aggregations of low-level class-agnostic features.
These findings give insight into the nature of some universal adversarial
perturbations and how they could be generated in other applications.
Comment: 16 pages, 10 figures. In Proceedings of the 2019 ACM SIGSAC
Conference on Computer and Communications Security (CCS '19).
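
To make the approach concrete, here is a minimal sketch of the idea: a Perlin-like procedural noise pattern is controlled by only a few parameters (wavelength, octaves, sine frequency), and a black-box search over those parameters maximizes the universal evasion rate. The `predict` oracle, the parameter ranges, and the plain random search (standing in for the paper's Bayesian optimization) are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def value_noise(size, wavelength, octaves, seed=0):
    """Perlin-like procedural noise: bilinearly interpolated random grids
    summed over several octaves (an illustrative stand-in for the Perlin/Gabor
    noise functions used in the paper)."""
    rng = np.random.default_rng(seed)
    noise, amplitude, total = np.zeros((size, size)), 1.0, 0.0
    for o in range(int(octaves)):
        period = max(int(wavelength / 2 ** o), 1)
        grid = rng.uniform(-1, 1, (size // period + 2, size // period + 2))
        ys, xs = np.mgrid[0:size, 0:size] / period
        y0, x0 = ys.astype(int), xs.astype(int)
        ty, tx = ys - y0, xs - x0
        # bilinear interpolation between the four surrounding grid values
        top = grid[y0, x0] * (1 - tx) + grid[y0, x0 + 1] * tx
        bot = grid[y0 + 1, x0] * (1 - tx) + grid[y0 + 1, x0 + 1] * tx
        noise += amplitude * (top * (1 - ty) + bot * ty)
        total += amplitude
        amplitude *= 0.5
    return noise / total

def procedural_uap(size, wavelength, octaves, sine_freq, eps, seed=0):
    """Map the noise through a sine colour function and scale to an L_inf
    budget eps; the result is a single input-agnostic perturbation."""
    n = value_noise(size, wavelength, octaves, seed)
    return np.sin(2 * np.pi * sine_freq * n) * eps

def universal_evasion_rate(predict, images, labels, delta):
    """Fraction of inputs whose predicted label changes under one UAP.
    images: (N, H, W, C) in [0, 1]; predict: black-box labels-only oracle."""
    return np.mean(predict(np.clip(images + delta, 0.0, 1.0)) != labels)

def search_uap(predict, images, labels, eps=8 / 255, trials=30, seed=0):
    """Black-box search over the few noise parameters. The paper uses
    Bayesian optimisation; a plain random search stands in here."""
    rng = np.random.default_rng(seed)
    best_params, best_delta, best_rate = None, None, -1.0
    for _ in range(trials):
        params = dict(wavelength=rng.uniform(8, 64),
                      octaves=int(rng.integers(1, 5)),
                      sine_freq=rng.uniform(4, 32))
        delta = procedural_uap(images.shape[1], eps=eps,
                               seed=int(rng.integers(1_000_000)), **params)
        rate = universal_evasion_rate(predict, images, labels, delta[..., None])
        if rate > best_rate:
            best_params, best_delta, best_rate = params, delta, rate
    return best_params, best_delta, best_rate
```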
Vulnerability of deep neural networks for detecting COVID-19 cases from chest X-ray images to universal adversarial attacks
During the epidemic of the novel coronavirus disease 2019 (COVID-19), chest
X-ray and computed tomography (CT) imaging are being used to screen COVID-19
patients effectively. Computer-aided systems based on deep neural networks
(DNNs) have been developed to rapidly and accurately detect COVID-19 cases,
because the limited number of expert radiologists forms a bottleneck for the
screening. However, so far, the
vulnerability of DNN-based systems has been poorly evaluated, although DNNs are
vulnerable to a single perturbation, called a universal adversarial perturbation
(UAP), which can induce DNN failure in most classification tasks. Thus, we
focus on representative DNN models for detecting COVID-19 cases from chest
X-ray images and evaluate their vulnerability to UAPs generated using simple
iterative algorithms. We consider nontargeted UAPs, which cause a task failure
resulting in an input being assigned an incorrect label, and targeted UAPs,
which cause the DNN to classify an input into a specific class. The results
demonstrate that the models are vulnerable to both nontargeted and targeted
UAPs, even when the UAPs are small. In particular, a UAP whose norm is only 2%
of the average image norm in the dataset achieves >85% and >90% success rates
for the nontargeted and targeted attacks, respectively. Under the nontargeted
UAPs, the DNN models misclassify most chest X-ray images as COVID-19 cases. The
targeted UAPs make the DNN models classify most chest X-ray images into a given
target class. The results indicate that careful consideration is required in
practical applications of DNNs to COVID-19 diagnosis; in particular, they
emphasize the need for strategies to address security concerns. As an example,
we show that iterative fine-tuning of the DNN models using UAPs improves their
robustness against UAPs.
Comment: 17 pages, 5 figures, 3 tables
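
As a rough illustration of the kind of "simple iterative algorithm" referred to above (a sign-gradient variant, not necessarily the authors' exact procedure), the sketch below builds a nontargeted or targeted UAP for a PyTorch classifier, assuming a DataLoader of (image, label) batches scaled to [0, 1]; the norm-ratio helper corresponds to the ~2% figure quoted in the abstract.

```python
import torch
import torch.nn.functional as F

def iterative_uap(model, loader, eps, steps=5, step_size=None, target=None,
                  device="cpu"):
    """Iteratively accumulate a single perturbation delta, projected onto an
    L_inf ball of radius eps. target=None gives a nontargeted UAP (push any
    input off its label); target=c gives a targeted UAP (push inputs to c)."""
    step_size = step_size if step_size is not None else eps / steps
    x0, _ = next(iter(loader))
    delta = torch.zeros_like(x0[0]).to(device)
    model.eval()
    for _ in range(steps):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            d = delta.detach().clone().requires_grad_(True)
            logits = model(torch.clamp(x + d, 0, 1))
            if target is None:
                loss, sign = F.cross_entropy(logits, y), 1.0       # ascend: any wrong label
            else:
                t = torch.full_like(y, target)
                loss, sign = F.cross_entropy(logits, t), -1.0      # descend: hit the target class
            loss.backward()
            with torch.no_grad():
                delta = torch.clamp(delta + sign * step_size * d.grad.sign(),
                                    -eps, eps)
    return delta

def norm_ratio(delta, images):
    """L2 norm of the UAP relative to the mean L2 norm of the images
    (the abstract's ~2% figure refers to a ratio of this kind)."""
    return (delta.norm() / images.flatten(start_dim=1).norm(dim=1).mean()).item()
```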
Natural images allow universal adversarial attacks on medical image classification using deep neural networks with transfer learning
Transfer learning from natural images is used in deep neural networks (DNNs) for medical image classification to achieve computer-aided clinical diagnosis. Although the adversarial vulnerability of DNNs hinders practical application because of the high stakes of diagnosis, adversarial attacks are expected to be limited, because the training datasets (medical images) that such attacks often require are generally unavailable for security and privacy reasons. Nevertheless, in this study, we demonstrate that adversarial attacks on medical DNN models with transfer learning are possible using natural images, even when the medical images are unavailable; in particular, we show that universal adversarial perturbations (UAPs) can be generated from natural images. UAPs from natural images are useful for both nontargeted and targeted attacks, and their performance is significantly higher than that of random controls. The use of transfer learning thus opens a security hole that decreases the reliability and safety of computer-based disease diagnosis. Training the models from random initialization reduced the performance of UAPs from natural images, but did not completely remove the vulnerability. The vulnerability to UAPs generated from natural images is expected to become a significant security threat.
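
Under the same assumptions as the previous sketch, the threat described here can be illustrated as follows: the UAP is crafted only from natural images (e.g. an ImageNet-style loader) and then evaluated on medical test data the attacker never saw. It reuses the `iterative_uap` sketch above; `medical_model` and both loaders are hypothetical placeholders.

```python
import torch

def natural_image_uap_attack(medical_model, natural_loader, medical_loader,
                             eps=8 / 255):
    """Craft a UAP using natural images only, then measure how often it flips
    the predictions of a transfer-learned medical classifier on medical data.
    iterative_uap is the sketch shown earlier; all arguments are placeholders."""
    uap = iterative_uap(medical_model, natural_loader, eps=eps, target=None)
    flipped = total = 0
    with torch.no_grad():
        for x, y in medical_loader:
            pred = medical_model(torch.clamp(x + uap, 0, 1)).argmax(dim=1)
            flipped += (pred != y).sum().item()
            total += y.numel()
    return flipped / total  # universal evasion rate on unseen medical images
```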
Universal Adversarial Perturbations for Malware
Machine learning classification models are vulnerable to adversarial examples
-- effective input-specific perturbations that can manipulate the model's
output. Universal Adversarial Perturbations (UAPs), which identify noisy
patterns that generalize across the input space, allow the attacker to greatly
scale up the generation of these adversarial examples. Although UAPs have been
explored in application domains beyond computer vision, little is known about
their properties and implications in the specific context of realizable
attacks, such as malware, where attackers must reason about satisfying
challenging problem-space constraints.
In this paper, we explore the challenges and strengths of UAPs in the context
of malware classification. We generate sequences of problem-space
transformations that induce UAPs in the corresponding feature-space embedding
and evaluate their effectiveness across threat models that consider a varying
degree of realistic attacker knowledge. Additionally, we propose adversarial
training-based mitigations using knowledge derived from the problem-space
transformations, and compare against alternative feature-space defenses. In
our experiments, these mitigations limit the effectiveness of a white-box
Android evasion attack to ~20% at the cost of 3% TPR at 1% FPR. We additionally show how our method
can be adapted to more restrictive application domains such as Windows malware.
We observe that while adversarial training in the feature space must deal
with large and often unconstrained regions, UAPs in the problem space identify
specific vulnerabilities that allow us to harden a classifier more effectively,
shifting the challenges and associated cost of identifying new universal
adversarial transformations back to the attacker.
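
A toy sketch of the two ideas in this abstract, under strong simplifying assumptions (binary Drebin-style features, problem-space transformations that only add features, and a greedy search in place of the paper's actual generation procedure); `predict`, the transformation masks, and the labels are illustrative placeholders.

```python
import numpy as np

def greedy_uap_transformations(predict, X_malware, transforms, max_len=5):
    """Greedily build a 'universal' sequence of problem-space transformations.
    X_malware: (N, D) boolean feature vectors; each transform is a boolean mask
    of features it adds (bitwise OR); predict(X) returns 1 for 'malware'."""
    sequence = []
    delta = np.zeros(X_malware.shape[1], dtype=bool)
    for _ in range(max_len):
        evasion_now = (predict(X_malware | delta) == 0).mean()
        scores = [(predict(X_malware | delta | m) == 0).mean() for m in transforms]
        best = int(np.argmax(scores))
        if scores[best] <= evasion_now:   # no transformation helps any further
            break
        delta |= transforms[best]
        sequence.append(best)
    return sequence, delta

def uap_adversarial_training_set(X, y, delta, malware_label=1):
    """UAP-based hardening: augment the training set with malware samples
    rewritten by the universal transformation mask, keeping the malware label
    so the classifier learns to ignore the added features."""
    mal = X[y == malware_label]
    X_aug = np.vstack([X, mal | delta])
    y_aug = np.concatenate([y, np.full(len(mal), malware_label)])
    return X_aug, y_aug
```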