Minimum-Norm Adversarial Examples on KNN and KNN-Based Models
We study the adversarial robustness of kNN classifiers and of classifiers that
combine kNN with neural networks. The main difficulty lies in
the fact that finding an optimal attack on kNN is intractable for typical
datasets. In this work, we propose a gradient-based attack on kNN and kNN-based
defenses, inspired by the previous work by Sitawarin & Wagner [1]. We
demonstrate that our attack outperforms their method on all of the models we
tested with only a minimal increase in the computation time. The attack also
beats the state-of-the-art attack [2] on kNN when k > 1 using less than 1% of
its running time. We hope that this attack can be used as a new baseline for
evaluating the robustness of kNN and its variants.
Comment: 3rd Deep Learning and Security Workshop (co-located with the 41st IEEE Symposium on Security and Privacy)
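As a rough illustration of the kind of gradient-based attack described above, the sketch below attacks a soft (softmax-over-distances) surrogate of kNN with finite-difference gradients and stops at the first label flip. It is a hypothetical simplification, not the authors' algorithm; all function names and hyperparameters are our own.

```python
# Hypothetical sketch: minimum-norm style attack on a soft-kNN surrogate.
import numpy as np

def soft_knn_scores(x, X_train, y_train, n_classes, temp=1.0):
    """Class scores for a 1-D feature vector x: softmax over negative distances."""
    d = np.linalg.norm(X_train - x, axis=1)      # distance to every training point
    w = np.exp(-d / temp)
    w /= w.sum()                                 # soft neighbor weights
    return np.array([w[y_train == c].sum() for c in range(n_classes)])

def min_norm_attack(x, y, X_train, y_train, n_classes, steps=200, lr=0.05, temp=1.0):
    """Grow a small perturbation until the surrogate's prediction flips away from y."""
    delta = np.zeros_like(x)
    eps = 1e-3
    for _ in range(steps):
        adv = x + delta
        scores = soft_knn_scores(adv, X_train, y_train, n_classes, temp)
        if scores.argmax() != y:                 # stop at the first label flip
            break
        # finite-difference gradient of the true-class score (slow but dependency-free)
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            s_plus = soft_knn_scores(adv + e, X_train, y_train, n_classes, temp)[y]
            grad[i] = (s_plus - scores[y]) / eps
        delta -= lr * grad                       # push away from the true class
    return x + delta
```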
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Many machine learning models are vulnerable to adversarial examples: inputs
that are specially crafted to cause a machine learning model to produce an
incorrect output. Adversarial examples that affect one model often affect
another model, even if the two models have different architectures or were
trained on different training sets, so long as both models were trained to
perform the same task. An attacker may therefore train their own substitute
model, craft adversarial examples against the substitute, and transfer them to
a victim model, with very little information about the victim. Recent work has
further developed a technique that uses the victim model as an oracle to label
a synthetic training set for the substitute, so the attacker need not even
collect a training set to mount the attack. We extend these recent techniques
using reservoir sampling to greatly enhance the efficiency of the training
procedure for the substitute model. We introduce new transferability attacks
between previously unexplored (substitute, victim) pairs of machine learning
model classes, most notably SVMs and decision trees. We demonstrate our attacks
on two commercial machine learning classification systems from Amazon (96.19%
misclassification rate) and Google (88.94%) using only 800 queries of the
victim model, thereby showing that existing machine learning approaches are in
general vulnerable to systematic black-box attacks regardless of their
structure.
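For context, reservoir sampling keeps a fixed-size uniform sample from a stream of unknown length; a minimal sketch of how it could cap the substitute model's synthetic training set is shown below. The names and the usage example are hypothetical, not the paper's code.

```python
# Illustrative sketch: reservoir sampling to bound the number of (input, oracle label)
# pairs collected while querying the victim model.
import random

def reservoir_sample(stream, k, seed=0):
    """Uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)       # item i survives with probability k / (i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Usage (hypothetical): cap the synthetic training set for the substitute model.
# pairs = ((x, victim_label(x)) for x in synthetic_inputs)
# substitute_train_set = reservoir_sample(pairs, k=800)
```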
AdvKnn: Adversarial Attacks On K-Nearest Neighbor Classifiers With Approximate Gradients
Deep neural networks have been shown to be vulnerable to adversarial examples: maliciously crafted inputs that add imperceptible perturbations to trigger the target model to misbehave. Existing attack methods for k-nearest neighbor (kNN) based algorithms either require large perturbations or are not applicable for large k. To address this problem, this paper proposes a
new method called AdvKNN for evaluating the adversarial robustness of kNN-based
models. First, we propose a deep kNN block that approximates the output of kNN methods; because it is differentiable, it can provide gradients that let attacks cross the decision boundary with small distortions. Second, we propose a new consistency learning objective over distributions, rather than hard classifications, to improve effectiveness against distribution-based methods. Extensive experimental results indicate that the proposed method significantly outperforms the state of the art in terms of attack success rate and the size of the added perturbations.
Comment: Submitted to ICASSP 2020. Implementation:
https://github.com/fiona-lxd/AdvKn
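A minimal sketch of what a differentiable soft-kNN block could look like is given below (written in PyTorch, our choice): kNN voting is approximated with a softmax over negative distances to stored training embeddings. This is only an illustration of the idea, not the paper's implementation.

```python
# Minimal sketch of a differentiable soft-kNN block.
import torch
import torch.nn.functional as F

def soft_knn_block(features, anchors, anchor_labels, n_classes, temp=0.1):
    """features: (B, D) query embeddings; anchors: (N, D) stored training embeddings;
    anchor_labels: (N,) integer labels. Returns (B, C) differentiable class scores."""
    d = torch.cdist(features, anchors)                    # (B, N) pairwise distances
    w = torch.softmax(-d / temp, dim=1)                   # soft neighbor weights per query
    onehot = F.one_hot(anchor_labels, n_classes).float()  # (N, C)
    return w @ onehot                                     # weighted class "votes"
```

Because every step is differentiable, an attacker can backpropagate through these scores to craft small perturbations, which is the role the deep kNN block plays in the abstract above.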
Detecting Anomalous Inputs to DNN Classifiers By Joint Statistical Testing at the Layers
Detecting anomalous inputs, such as adversarial and out-of-distribution (OOD)
inputs, is critical for classifiers deployed in real-world applications,
especially deep neural network (DNN) classifiers that are known to be brittle
on such inputs. We propose an unsupervised statistical testing framework for
detecting such anomalous inputs to a trained DNN classifier based on its
internal layer representations. By calculating test statistics at the input and
intermediate-layer representations of the DNN, conditioned individually on the
predicted class and on the true class of labeled training data, the method
characterizes their class-conditional distributions on natural inputs. Given a
test input, its extent of non-conformity with respect to the training
distribution is captured using p-values of the class-conditional test
statistics across the layers, which are then combined using a scoring function
designed to score high on anomalous inputs. We focus on adversarial inputs,
which are an important class of anomalous inputs, and also demonstrate the
effectiveness of our method on general OOD inputs. The proposed framework also
provides an alternative class prediction that can be used to correct the DNN's
prediction on (detected) adversarial inputs. Experiments on well-known image
classification datasets with strong adversarial attacks, including a custom
attack method that uses the internal layer representations of the DNN,
demonstrate that our method outperforms or performs comparably with five
state-of-the-art detection methods.
Comment: 32 pages, 13 figures
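As an illustration of combining per-layer p-values into a single anomaly score, the sketch below uses Fisher's method; the paper's scoring function may differ.

```python
# Sketch: combine per-layer p-values with Fisher's method; higher = more anomalous.
import numpy as np
from scipy import stats

def fisher_anomaly_score(p_values):
    p = np.clip(np.asarray(p_values, dtype=float), 1e-12, 1.0)  # guard against log(0)
    statistic = -2.0 * np.log(p).sum()                          # Fisher's combined statistic
    combined_p = stats.chi2.sf(statistic, df=2 * len(p))        # chi^2 with 2L degrees of freedom
    return -np.log(max(combined_p, 1e-300))                     # score high on anomalous inputs

# Example: p-values of a test input at four layers, conditioned on the predicted class.
print(fisher_anomaly_score([0.30, 0.02, 0.01, 0.45]))
```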
Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains
While modern day web applications aim to create impact at the civilization
level, they have become vulnerable to adversarial activity, where the next
cyber-attack can take any shape and can originate from anywhere. The increasing
scale and sophistication of attacks, has prompted the need for a data driven
solution, with machine learning forming the core of many cybersecurity systems.
Machine learning was not designed with security in mind, and the essential
assumption of stationarity, requiring that the training and testing data follow
similar distributions, is violated in an adversarial domain. In this paper, an
adversary's view point of a classification based system, is presented. Based on
a formal adversarial model, the Seed-Explore-Exploit framework is presented,
for simulating the generation of data driven and reverse engineering attacks on
classifiers. Experimental evaluation, on 10 real world datasets and using the
Google Cloud Prediction Platform, demonstrates the innate vulnerability of
classifiers and the ease with which evasion can be carried out, without any
explicit information about the classifier type, the training data or the
application domain. The proposed framework, algorithms and empirical
evaluation, serve as a white hat analysis of the vulnerabilities, and aim to
foster the development of secure machine learning frameworks
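A very rough sketch of a seed/explore/exploit style black-box evasion loop is shown below; it is a hypothetical simplification in which query_label stands in for the only access an attacker has to the deployed classifier, and the paper's framework is more structured.

```python
# Hypothetical sketch of a seed/explore/exploit black-box evasion loop.
import numpy as np

def explore_exploit(seed_x, query_label, target_label, radius=0.5, n_explore=200, seed=0):
    """Randomly perturb a benign seed sample and keep the closest perturbation
    that the black-box classifier assigns to target_label."""
    rng = np.random.default_rng(seed)
    best, best_dist = None, np.inf
    for _ in range(n_explore):
        candidate = seed_x + rng.normal(scale=radius, size=seed_x.shape)  # explore
        if query_label(candidate) == target_label:                        # exploit feedback
            dist = np.linalg.norm(candidate - seed_x)
            if dist < best_dist:
                best, best_dist = candidate, dist
    return best
```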
Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance
Large amounts of labeled data are typically required to train deep learning
models. For many real-world problems, however, acquiring additional data can be
expensive or even impossible. We present semi-supervised deep kernel learning
(SSDKL), a semi-supervised regression model based on minimizing predictive
variance in the posterior regularization framework. SSDKL combines the
hierarchical representation learning of neural networks with the probabilistic
modeling capabilities of Gaussian processes. By leveraging unlabeled data, we
show improvements on a diverse set of real-world regression tasks over
supervised deep kernel learning and semi-supervised methods such as VAT and
mean teacher adapted for regression.
Comment: In Proceedings of Neural Information Processing Systems (NeurIPS) 2018
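To make the objective concrete, the sketch below combines a Gaussian-process marginal-likelihood term on labeled data with the mean posterior variance at unlabeled points, using fixed features and an RBF kernel. This is a simplified illustration of the idea, not the SSDKL implementation, which learns the neural features jointly with the kernel.

```python
# Simplified SSDKL-style objective: labeled GP fit + variance penalty on unlabeled points.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def ssdkl_style_loss(feat_l, y_l, feat_u, noise=0.1, alpha=0.5):
    """Negative log marginal likelihood on labeled data + alpha * mean predictive
    variance at unlabeled points, given fixed feature representations."""
    K = rbf_kernel(feat_l, feat_l) + noise * np.eye(len(feat_l))
    K_inv = np.linalg.inv(K)
    nll = 0.5 * (y_l @ K_inv @ y_l + np.linalg.slogdet(K)[1])    # GP marginal likelihood term
    K_uu = rbf_kernel(feat_u, feat_u)
    K_ul = rbf_kernel(feat_u, feat_l)
    var_u = np.diag(K_uu - K_ul @ K_inv @ K_ul.T) + noise        # posterior variance, unlabeled
    return nll + alpha * var_u.mean()
```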
Detection of Face Recognition Adversarial Attacks
Deep Learning methods have become state-of-the-art for solving tasks such as
Face Recognition (FR). Unfortunately, despite their success, it has been
pointed out that these learning models are exposed to adversarial inputs: images to which an amount of noise imperceptible to humans is added to maliciously fool a neural network, thus limiting their adoption in real-world applications. While an enormous effort has been spent on training models that are robust against this type of threat, adversarial detection techniques have recently started to draw attention within the scientific community. A detection approach has the advantage that it does not require re-training any model, so it can be added on top of any system. In this context, we present our work on adversarial sample detection in forensics, mainly focused on detecting attacks against FR systems in which the learning model is typically used only as a feature extractor. Thus, in these cases, training a more robust classifier might not be enough to defend an FR system. In this setting, the contribution of our work is four-fold: i) we tested our recently proposed adversarial detection approach against classifier attacks, i.e. adversarial samples crafted to fool an FR neural network acting as a classifier; ii) using a k-Nearest Neighbor (kNN) algorithm as guidance, we generated deep-feature attacks against an FR system based on a DL model acting as a feature extractor, followed by a kNN that returns the query identity based on feature similarity; iii) we used the deep-feature attacks to fool an FR system on the 1:1 Face Verification task and showed that they are more effective than classifier attacks at fooling this type of system; iv) we used the detectors trained on classifier attacks to detect deep-feature attacks, thus showing that such an approach generalizes to different types of attacks.
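A minimal sketch of a deep-feature attack of the kind described in point ii) is shown below: the image is perturbed so that its embedding moves toward a guide embedding from another identity, fooling a kNN identity lookup. Here embed and the hyperparameters are placeholders, and the authors' attack may differ in detail.

```python
# Hypothetical PGD-style attack in feature space instead of label space.
import torch

def deep_feature_attack(image, guide_feat, embed, eps=8 / 255, steps=40, lr=1 / 255):
    """Perturb `image` so embed(image) approaches `guide_feat` within an L_inf ball."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = torch.norm(embed(adv) - guide_feat)        # distance to the guide identity
        loss.backward()
        with torch.no_grad():
            adv = adv - lr * adv.grad.sign()              # step toward the guide in feature space
            adv = image + (adv - image).clamp(-eps, eps)  # stay within the perturbation budget
            adv = adv.clamp(0, 1)                         # keep a valid image
    return adv.detach()
```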
Generalization through Memorization: Nearest Neighbor Language Models
We introduce kNN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a k-nearest neighbors (kNN) model. The
nearest neighbors are computed according to distance in the pre-trained LM
embedding space, and can be drawn from any text collection, including the
original LM training data. Applying this augmentation to a strong Wikitext-103
LM, with neighbors drawn from the original training set, our kNN-LM achieves
a new state-of-the-art perplexity of 15.79 - a 2.9 point improvement with no
additional training. We also show that this approach has implications for
efficiently scaling up to larger training sets and allows for effective domain
adaptation, by simply varying the nearest neighbor datastore, again without
further training. Qualitatively, the model is particularly helpful in
predicting rare patterns, such as factual knowledge. Together, these results
strongly suggest that learning similarity between sequences of text is easier
than predicting the next word, and that nearest neighbor search is an effective
approach for language modeling in the long tail.
Comment: ICLR 2020
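The interpolation at the core of kNN-LM can be sketched in a few lines: mix the LM's next-token distribution with a distribution built from the k nearest datastore entries, where keys are context embeddings and values are the tokens that followed them. The snippet below is a simplified illustration without an approximate-nearest-neighbor index.

```python
# Sketch of kNN-LM interpolation: p = (1 - lam) * p_lm + lam * p_knn.
import numpy as np

def knn_lm_probs(p_lm, context_emb, keys, values, vocab_size, k=8, lam=0.25, temp=1.0):
    """p_lm: (V,) LM next-token distribution; keys: (N, D) datastore context embeddings;
    values: (N,) integer next-token ids stored with each key."""
    d = np.linalg.norm(keys - context_emb, axis=1)   # distance to every datastore key
    idx = np.argsort(d)[:k]                          # k nearest contexts
    w = np.exp(-d[idx] / temp)
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[idx], w)                 # aggregate weight per next token
    return (1.0 - lam) * p_lm + lam * p_knn
```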
SPADE: A Spectral Method for Black-Box Adversarial Robustness Evaluation
A black-box spectral method is introduced for evaluating the adversarial
robustness of a given machine learning (ML) model. Our approach, named SPADE,
exploits bijective distance mapping between the input/output graphs constructed
for approximating the manifolds corresponding to the input/output data. By
leveraging the generalized Courant-Fischer theorem, we propose a SPADE score
for evaluating the adversarial robustness of a given model, which is proved to
be an upper bound of the best Lipschitz constant under the manifold setting. To
reveal the most non-robust data samples highly vulnerable to adversarial
attacks, we develop a spectral graph embedding procedure leveraging dominant
generalized eigenvectors. This embedding step allows assigning each data sample
a robustness score that can be further harnessed for more effective adversarial
training. Our experiments show the proposed SPADE method leads to promising
empirical results for neural network models that are adversarially trained with
the MNIST and CIFAR-10 data sets.
Comment: The 2021 International Conference on Machine Learning (ICML)
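As a simplified illustration of a SPADE-style score, the sketch below builds kNN-graph Laplacians over the input and output representations and takes the largest generalized eigenvalue of L_out v = lambda L_in v, a proxy for the best Lipschitz constant. It is dense and unoptimized, and the paper's spectral procedure is more careful.

```python
# Simplified SPADE-style robustness score from input/output kNN graph Laplacians.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X, k=10):
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T)                      # symmetrize the kNN graph
    W = W.toarray()
    return np.diag(W.sum(1)) - W             # unnormalized graph Laplacian

def spade_score(X_in, X_out, k=10, reg=1e-6):
    L_in = knn_laplacian(X_in, k) + reg * np.eye(len(X_in))   # keep the B matrix positive definite
    L_out = knn_laplacian(X_out, k)
    eigvals = eigh(L_out, L_in, eigvals_only=True)
    return eigvals[-1]                        # largest generalized eigenvalue
```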
Do not trust the neighbors! Adversarial Metric Learning for Self-Supervised Scene Flow Estimation
Scene flow is the task of estimating 3D motion vectors to individual points
of a dynamic 3D scene. Motion vectors have been shown to be beneficial for
downstream tasks such as action classification and collision avoidance.
However, data collected via LiDAR sensors and stereo cameras are computation- and labor-intensive to annotate precisely for scene flow. We address this
annotation bottleneck on two ends. We propose a 3D scene flow benchmark and a
novel self-supervised setup for training flow models. The benchmark consists of
datasets designed to study individual aspects of flow estimation in progressive
order of complexity, from a single object in motion to real-world scenes.
Furthermore, we introduce Adversarial Metric Learning for self-supervised flow
estimation. The flow model is fed with sequences of point clouds to perform
flow estimation. A second model learns a latent metric to distinguish between
the points translated by the flow estimations and the target point cloud. This
latent metric is learned via a Multi-Scale Triplet loss, which uses
intermediary feature vectors for the loss calculation. We use our proposed
benchmark to draw insights about the performance of the baselines and of
different models when trained using our setup. We find that our setup is able
to maintain motion coherence and preserve local geometries, which many self-supervised baselines fail to capture. Dealing with occlusions, on the other hand, is still an open challenge.
Comment: Master Thesis
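A minimal sketch of a multi-scale triplet loss is given below: a standard triplet margin loss applied to feature vectors taken at several intermediate scales and summed. It illustrates the general idea rather than the thesis' exact loss.

```python
# Sketch of a multi-scale triplet loss over per-scale feature vectors.
import torch
import torch.nn.functional as F

def multi_scale_triplet_loss(anchors, positives, negatives, margin=0.3):
    """Each argument is a list of (B, D_s) feature tensors, one entry per scale."""
    total = torch.tensor(0.0)
    for a, p, n in zip(anchors, positives, negatives):
        total = total + F.triplet_margin_loss(a, p, n, margin=margin)  # per-scale term
    return total
```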