Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation
Recently, techniques have been developed to provably guarantee the robustness
of a classifier to adversarial perturbations of bounded L_1 and L_2 magnitudes
by using randomized smoothing: the robust classification is a consensus of base
classifications on randomly noised samples where the noise is additive. In this
paper, we extend this technique to the L_0 threat model. We propose an
efficient and certifiably robust defense against sparse adversarial attacks by
randomly ablating input features, rather than using additive noise.
Experimentally, on MNIST, we can certify the classifications of over 50% of
images to be robust to any distortion of at most 8 pixels. This is comparable
to the observed empirical robustness of unprotected classifiers on MNIST to
modern L_0 attacks, demonstrating the tightness of the proposed robustness
certificate. We also evaluate our certificate on ImageNet and CIFAR-10. Our
certificates represent an improvement on those provided in a concurrent work
(Lee et al. 2019), which uses random noise rather than ablation (median
certificates of 8 pixels versus 4 pixels on MNIST; 16 pixels versus 1 pixel on
ImageNet). Additionally, we empirically demonstrate that our classifier is
highly robust to modern sparse adversarial attacks on MNIST. Our
classifications are robust, in median, to adversarial perturbations of up to 31
pixels, compared to 22 pixels reported for the state-of-the-art defense, at the
cost of a slight decrease (around 2.3%) in the classification accuracy. Code is
available at https://github.com/alevine0/randomizedAblation/
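To make the ablate-and-vote mechanism concrete, here is a minimal Python sketch under stated assumptions: the function names are illustrative, the base classifier is assumed to accept a boolean mask marking ablated positions (so it can distinguish an ablated feature from a genuinely zero pixel), and only the smoothed prediction is shown, not the certificate.

import numpy as np

def ablate(x, k, rng):
    # Keep k randomly chosen features of the flattened input x and
    # zero out the rest; the mask is returned so the base classifier
    # can tell "ablated" apart from a real zero-valued pixel.
    mask = np.zeros(x.size, dtype=bool)
    mask[rng.choice(x.size, size=k, replace=False)] = True
    return np.where(mask, x.ravel(), 0.0), mask

def smoothed_predict(x, base_classifier, k, n_samples=1000, seed=0):
    # Majority vote of the base classifier over random ablations.
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        x_abl, mask = ablate(x, k, rng)
        label = base_classifier(x_abl, mask)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

Roughly speaking, the certificate builds on the observation that a base classification can only change if at least one adversarially modified pixel survives ablation, combining that probability with a confidence lower bound on the top vote count; that analysis is omitted here.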
RS-Del: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion
Randomized smoothing is a leading approach for constructing classifiers that
are certifiably robust against adversarial examples. Existing work on
randomized smoothing has focused on classifiers with continuous inputs, such as
images, where L_p-norm bounded adversaries are commonly studied. However,
there has been limited work for classifiers with discrete or variable-size
inputs, such as for source code, which require different threat models and
smoothing mechanisms. In this work, we adapt randomized smoothing for discrete
sequence classifiers to provide certified robustness against edit
distance-bounded adversaries. Our proposed smoothing mechanism, randomized
deletion (RS-Del), applies random deletion edits, which are (perhaps
surprisingly) sufficient to confer robustness against adversarial deletion,
insertion and substitution edits. Our proof of certification deviates from the
established Neyman-Pearson approach, which is intractable in our setting, and
is instead organized around longest common subsequences. We present a case
study on malware detection--a binary classification problem on byte sequences
where classifier evasion is a well-established threat model. When applied to
the popular MalConv malware detection model, our smoothing mechanism RS-Del
achieves a certified accuracy of 91% at an edit distance radius of 128 bytes.

Comment: To be published in NeurIPS 2023. 36 pages, 7 figures, 12 tables. Includes 20 pages of appendices.
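A minimal sketch of the deletion-smoothing idea, assuming a base classifier over raw byte strings; the deletion probability 0.97 and the function names are illustrative placeholders rather than the paper's configuration, and the certification step (which the paper derives via longest common subsequences rather than Neyman-Pearson) is omitted.

import random
from collections import Counter

def random_delete(seq, p_del, rng):
    # Independently delete each byte with probability p_del.
    return bytes(b for b in seq if rng.random() >= p_del)

def rs_del_predict(seq, base_classifier, p_del=0.97,
                   n_samples=1000, seed=0):
    # Majority vote over randomly deleted versions of the sequence.
    rng = random.Random(seed)
    votes = Counter(base_classifier(random_delete(seq, p_del, rng))
                    for _ in range(n_samples))
    return votes.most_common(1)[0][0]

Note that only deletion edits are applied at smoothing time; per the abstract, this alone suffices to certify against adversarial deletions, insertions, and substitutions.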
Efficient Robustness Certificates for Discrete Data: Sparsity-Aware Randomized Smoothing for Graphs, Images and More
Existing techniques for certifying the robustness of models for discrete data
either work only for a small class of models or are general at the expense of
efficiency or tightness. Moreover, they do not account for sparsity in the
input, which, as our findings show, is often essential for obtaining non-trivial
guarantees. We propose a model-agnostic certificate based on the randomized
smoothing framework which subsumes earlier work and is tight, efficient, and
sparsity-aware. Its computational complexity does not depend on the number of
discrete categories or the dimension of the input (e.g. the graph size), making
it highly scalable. We show the effectiveness of our approach on a wide variety
of models, datasets, and tasks -- specifically highlighting its use for Graph
Neural Networks. So far, obtaining provable guarantees for GNNs has been
difficult due to the discrete and non-i.i.d. nature of graph data. Our method
can certify any GNN and handles perturbations to both the graph structure and
the node attributes.

Comment: Proceedings of the 37th International Conference on Machine Learning (ICML 2020).
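As an illustration of what a sparsity-aware smoothing distribution over discrete data can look like, here is a hedged Python sketch using asymmetric flip probabilities; the names and this particular parameterization are assumptions for exposition, not the paper's API.

import numpy as np

def sparse_flip(x, p_plus, p_minus, rng):
    # Flip binary entries asymmetrically: 0 -> 1 with probability
    # p_plus and 1 -> 0 with probability p_minus. Choosing p_plus
    # much smaller than p_minus keeps a sparse input (e.g. a graph's
    # adjacency matrix) sparse after smoothing.
    u = rng.random(x.shape)
    flip = np.where(x == 1, u < p_minus, u < p_plus)
    return np.where(flip, 1 - x, x)

Because zeros are perturbed far less often than ones, a sparse input stays sparse under smoothing, which the abstract identifies as often essential for non-trivial guarantees.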
Randomized Message-Interception Smoothing: Gray-box Certificates for Graph Neural Networks
Randomized smoothing is one of the most promising frameworks for certifying the adversarial robustness of machine learning models, including Graph Neural Networks (GNNs). Yet, existing randomized smoothing certificates for GNNs are overly pessimistic since they treat the model as a black box, ignoring the underlying architecture. To remedy this, we propose novel gray-box certificates that exploit the message-passing principle of GNNs: We randomly intercept messages and carefully analyze the probability that messages from adversarially controlled nodes reach their target nodes. Compared to existing certificates, we certify robustness to much stronger adversaries that control entire nodes in the graph and can arbitrarily manipulate node features. Our certificates provide stronger guarantees for attacks at larger distances, as messages from farther-away nodes are more likely to get intercepted. We demonstrate the effectiveness of our method on various models and datasets. Since our gray-box certificates consider the underlying graph structure, we can significantly improve certifiable robustness by applying graph sparsification.
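A minimal sketch of the interception idea, assuming edges are stored as a 2 x E index array (as in common GNN libraries); the names are illustrative assumptions, and the probabilistic analysis behind the certificate is omitted.

import numpy as np

def intercept_messages(edge_index, p_int, rng):
    # Drop each directed edge independently with probability p_int.
    # A message relayed over an L-hop path survives only if every
    # edge on the path is kept, i.e. with probability
    # (1 - p_int) ** L, so messages from far-away (adversarial)
    # nodes are more likely to be intercepted.
    keep = rng.random(edge_index.shape[1]) >= p_int
    return edge_index[:, keep]

The distance dependence visible in the comment is what makes the certificate gray-box: it uses the graph structure to give stronger guarantees against attackers far from the target node.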
Almost Tight L0-norm Certified Robustness of Top-k Predictions against Adversarial Perturbations
Top-k predictions are used in many real-world applications such as machine
learning as a service, recommender systems, and web searches. An L_0-norm
adversarial perturbation characterizes an attack that arbitrarily modifies some
features of an input such that a classifier makes an incorrect prediction for
the perturbed input. L_0-norm adversarial perturbations are easy to
interpret and can be implemented in the physical world. Therefore, certifying
robustness of top-k predictions against L_0-norm adversarial
perturbation is important. However, existing studies either focused on
certifying L_0-norm robustness of top-1 predictions or L_2-norm
robustness of top-k predictions. In this work, we aim to bridge the gap. Our
approach is based on randomized smoothing, which builds a provably robust
classifier from an arbitrary classifier via randomizing an input. Our major
theoretical contribution is an almost tight L_0-norm certified robustness
guarantee for top-k predictions. We empirically evaluate our method on
CIFAR10 and ImageNet. For instance, our method can build a classifier that
achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can
arbitrarily perturb 5 pixels of a testing image.
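For intuition, here is a generic vote-counting sketch of a smoothed top-k prediction; randomize stands for whatever input randomization the smoothing scheme uses and is an assumption here, and the L_0 certificate itself (derived from confidence bounds on these vote counts) is not shown.

from collections import Counter

def smoothed_topk(x, base_classifier, randomize, k=3, n_samples=1000):
    # Count base-classifier votes over randomized copies of the input
    # and return the k most frequent labels as the smoothed top-k set.
    votes = Counter(base_classifier(randomize(x))
                    for _ in range(n_samples))
    return [label for label, _ in votes.most_common(k)]

The certified guarantee then asks whether a true label's lower-bounded vote count still places it among the top k against every allowed L_0-bounded perturbation.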