56 research outputs found
RAB: Provable Robustness Against Backdoor Attacks
Recent studies have shown that deep neural networks (DNNs) are vulnerable to
adversarial attacks, including evasion and backdoor (poisoning) attacks. On the
defense side, there have been intensive efforts on improving both empirical and
provable robustness against evasion attacks; however, provable robustness
against backdoor attacks still remains largely unexplored. In this paper, we
focus on certifying the machine learning model robustness against general
threat models, especially backdoor attacks. We first provide a unified
framework via randomized smoothing techniques and show how it can be
instantiated to certify the robustness against both evasion and backdoor
attacks. We then propose the first robust training process, RAB, to smooth the
trained model and certify its robustness against backdoor attacks. We derive
the robustness bound for machine learning models trained with RAB, and prove
that our robustness bound is tight. In addition, we show that it is possible to
train the robust smoothed models efficiently for simple models such as
K-nearest neighbor classifiers, and we propose an exact smooth-training
algorithm which eliminates the need to sample from a noise distribution for
such models. Empirically, we conduct comprehensive experiments for different
machine learning (ML) models such as DNNs, differentially private DNNs, and
K-NN models on MNIST, CIFAR-10 and ImageNet datasets, and provide the first
benchmark for certified robustness against backdoor attacks. In addition, we
evaluate K-NN models on a spambase tabular dataset to demonstrate the
advantages of the proposed exact algorithm. Both the theoretic analysis and the
comprehensive evaluation on diverse ML models and datasets shed lights on
further robust learning strategies against general training time attacks.Comment: 31 pages, 5 figures, 7 table
Certifiers make neural networks vulnerable to availability attacks
To achieve reliable, robust, and safe AI systems, it is vital to implement fallback strategies when AI predictions cannot be trusted. Certifiers for neural networks are a reliable way to check the robustness of these predictions. They guarantee for some predictions that a certain class of manipulations or attacks could not have changed the outcome. For the remaining predictions without guarantees, the method abstains from making a prediction, and a fallback strategy needs to be invoked, which typically incurs additional costs, can require a human operator, or even fail to provide any prediction. While this is a key concept towards safe and secure AI, we show for the first time that this approach comes with its own security risks, as such fallback strategies can be deliberately triggered by an adversary. In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback with high probability. This transfers the main system load onto the fallback, reducing the overall system’s integrity and/or availability. We design two novel availability attacks which show the practical relevance of these threats. For example, adding 1% poisoned data during training is sufficient to trigger the fallback and hence make the model unavailable for up to 100% of all inputs by inserting the trigger. Our extensive experiments across multiple datasets, model architectures, and certifiers demonstrate the broad applicability of these attacks. A first investigation into potential defenses shows that current approaches are insufficient to mitigate the issue, highlighting the need for new, specific solutions
PECAN: A Deterministic Certified Defense Against Backdoor Attacks
Neural networks are vulnerable to backdoor poisoning attacks, where the
attackers maliciously poison the training set and insert triggers into the test
input to change the prediction of the victim model. Existing defenses for
backdoor attacks either provide no formal guarantees or come with
expensive-to-compute and ineffective probabilistic guarantees. We present
PECAN, an efficient and certified approach for defending against backdoor
attacks. The key insight powering PECAN is to apply off-the-shelf test-time
evasion certification techniques on a set of neural networks trained on
disjoint partitions of the data. We evaluate PECAN on image classification and
malware detection datasets. Our results demonstrate that PECAN can (1)
significantly outperform the state-of-the-art certified backdoor defense, both
in defense strength and efficiency, and (2) on real back-door attacks, PECAN
can reduce attack success rate by order of magnitude when compared to a range
of baselines from the literature
Node-aware Bi-smoothing: Certified Robustness against Graph Injection Attacks
Deep Graph Learning (DGL) has emerged as a crucial technique across various
domains. However, recent studies have exposed vulnerabilities in DGL models,
such as susceptibility to evasion and poisoning attacks. While empirical and
provable robustness techniques have been developed to defend against graph
modification attacks (GMAs), the problem of certified robustness against graph
injection attacks (GIAs) remains largely unexplored. To bridge this gap, we
introduce the node-aware bi-smoothing framework, which is the first certifiably
robust approach for general node classification tasks against GIAs. Notably,
the proposed node-aware bi-smoothing scheme is model-agnostic and is applicable
for both evasion and poisoning attacks. Through rigorous theoretical analysis,
we establish the certifiable conditions of our smoothing scheme. We also
explore the practical implications of our node-aware bi-smoothing schemes in
two contexts: as an empirical defense approach against real-world GIAs and in
the context of recommendation systems. Furthermore, we extend two
state-of-the-art certified robustness frameworks to address node injection
attacks and compare our approach against them. Extensive evaluations
demonstrate the effectiveness of our proposed certificates
Certified Robustness of Nearest Neighbors against Data Poisoning and Backdoor Attacks
Data poisoning attacks and backdoor attacks aim to corrupt a machine learning
classifier via modifying, adding, and/or removing some carefully selected
training examples, such that the corrupted classifier makes incorrect
predictions as the attacker desires. The key idea of state-of-the-art certified
defenses against data poisoning attacks and backdoor attacks is to create a
majority vote mechanism to predict the label of a testing example. Moreover,
each voter is a base classifier trained on a subset of the training dataset.
Classical simple learning algorithms such as k nearest neighbors (kNN) and
radius nearest neighbors (rNN) have intrinsic majority vote mechanisms. In this
work, we show that the intrinsic majority vote mechanisms in kNN and rNN
already provide certified robustness guarantees against data poisoning attacks
and backdoor attacks. Moreover, our evaluation results on MNIST and CIFAR10
show that the intrinsic certified robustness guarantees of kNN and rNN
outperform those provided by state-of-the-art certified defenses. Our results
serve as standard baselines for future certified defenses against data
poisoning attacks and backdoor attacks.Comment: To appear in AAAI Conference on Artificial Intelligence, 202
XRand: Differentially Private Defense against Explanation-Guided Attacks
Recent development in the field of explainable artificial intelligence (XAI)
has helped improve trust in Machine-Learning-as-a-Service (MLaaS) systems, in
which an explanation is provided together with the model prediction in response
to each query. However, XAI also opens a door for adversaries to gain insights
into the black-box models in MLaaS, thereby making the models more vulnerable
to several attacks. For example, feature-based explanations (e.g., SHAP) could
expose the top important features that a black-box model focuses on. Such
disclosure has been exploited to craft effective backdoor triggers against
malware classifiers. To address this trade-off, we introduce a new concept of
achieving local differential privacy (LDP) in the explanations, and from that
we establish a defense, called XRand, against such attacks. We show that our
mechanism restricts the information that the adversary can learn about the top
important features, while maintaining the faithfulness of the explanations.Comment: To be published at AAAI 202
- …