Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks
Despite the broad application of Machine Learning models as a Service
(MLaaS), they are vulnerable to model stealing attacks. These attacks can
replicate the model functionality by using the black-box query process without
any prior knowledge of the target victim model. Existing stealing defenses add
deceptive perturbations to the victim's posterior probabilities to mislead the
attackers. However, these defenses suffer from high inference-time
computational overhead and unfavorable trade-offs between benign accuracy and
stealing robustness, which undermines the feasibility of deploying such models in
practice. To address these problems, this paper proposes Isolation and Induction
(InI), a novel and effective training framework for model stealing defenses.
Instead of deploying auxiliary defense modules that introduce redundant
inference time, InI directly trains a defensive model by isolating the
adversary's training gradient from the expected gradient, which can effectively
reduce the inference computational cost. In contrast to adding perturbations
over model predictions that harm the benign accuracy, we train models to
produce uninformative outputs against stealing queries, which can induce the
adversary to extract little useful knowledge from victim models with minimal
impact on the benign performance. Extensive experiments on several visual
classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior
robustness (up to 48% reduction on stealing accuracy) and speed (up to 25.4x
faster) of our InI over other state-of-the-art methods. Our code is available
at https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.
Comment: Accepted by ACM Multimedia 202
Perturbing Inputs to Prevent Model Stealing
We show how perturbing inputs to machine learning services (ML-service)
deployed in the cloud can protect against model stealing attacks. In our
formulation, there is an ML-service that receives inputs from users and returns
the output of the model. There is an attacker that is interested in learning
the parameters of the ML-service. We use the linear and logistic regression
models to illustrate how strategically adding noise to the inputs fundamentally
alters the attacker's estimation problem. We show that even with infinite
samples, the attacker would not be able to recover the true model parameters.
We focus on characterizing the trade-off between the error in the attacker's
estimate of the parameters and the error in the ML-service's output.
Efficient Defense Against Model Stealing Attacks on Convolutional Neural Networks
Model stealing attacks have become a serious concern for deep learning
models, where an attacker can steal a trained model by querying its black-box
API. This can lead to intellectual property theft and other security and
privacy risks. The current state-of-the-art defenses against model stealing
attacks suggest adding perturbations to the prediction probabilities. However,
they suffer from heavy computations and make impractical assumptions about
the adversary. They often require the training of auxiliary models. This can be
time-consuming and resource-intensive, which hinders the deployment of these
defenses in real-world applications. In this paper, we propose a simple yet
effective and efficient defense alternative. We introduce a heuristic approach
to perturb the output probabilities. The proposed defense can be easily
integrated into models without additional training. We show that our defense is
effective in defending against three state-of-the-art stealing attacks. We
evaluate our approach on large and quantized (i.e., compressed) Convolutional
Neural Networks (CNNs) trained on several vision datasets. Our technique
outperforms the state-of-the-art defenses, offering lower inference
latency without requiring any additional model and with a low impact on the
model's performance. We validate that our defense is also effective for
quantized CNNs targeting edge devices.
Comment: Accepted for publication at 2023 International Conference on Machine
Learning and Applications (ICMLA
Privacy Risks of Securing Machine Learning Models against Adversarial Examples
The arms race between attacks and defenses for machine learning models has
come to the forefront in recent years, in both the security community and the
privacy community. However, one big limitation of previous research is that the
security domain and the privacy domain have typically been considered
separately. It is thus unclear whether the defense methods in one domain will
have any unexpected impact on the other domain.
In this paper, we take a step towards resolving this limitation by combining
the two domains. In particular, we measure the success of membership inference
attacks against six state-of-the-art defense methods that mitigate the risk of
adversarial examples (i.e., evasion attacks). Membership inference attacks
determine whether or not an individual data record has been part of a model's
training set. The accuracy of such attacks reflects the information leakage of
training algorithms about individual members of the training set. Defense
methods against adversarial examples shape the model's decision
boundaries such that model predictions remain unchanged for a small area around
each input. However, this objective is optimized on training data. Thus,
individual data records in the training set have a significant influence on
robust models. This makes the models more vulnerable to inference attacks.
To perform the membership inference attacks, we leverage the existing
inference methods that exploit model predictions. We also propose two new
inference methods that exploit structural properties of robust models on
adversarially perturbed data. Our experimental evaluation demonstrates that
compared with the natural training (undefended) approach, adversarial defense
methods can indeed increase the target model's risk against membership
inference attacks.
Comment: ACM CCS 2019, code is available at
https://github.com/inspire-group/privacy-vs-robustnes
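The abstract above mentions existing membership inference methods that exploit model predictions. One common baseline of that kind, assumed here purely for illustration and not taken from the paper, guesses "member" whenever the model's confidence in the true label exceeds a threshold. The function names and the toy confidence values below are hypothetical.

```python
def confidence_attack(confidences, threshold):
    """Guess 'member' when confidence in the true label reaches the threshold."""
    return [c >= threshold for c in confidences]


def attack_accuracy(member_conf, nonmember_conf, threshold):
    """Fraction of correct member / non-member guesses at a given threshold."""
    hits = sum(confidence_attack(member_conf, threshold))
    rejections = sum(not g for g in confidence_attack(nonmember_conf, threshold))
    return (hits + rejections) / (len(member_conf) + len(nonmember_conf))


def best_threshold(member_conf, nonmember_conf):
    """Pick the observed confidence value that maximises attack accuracy."""
    candidates = sorted(set(member_conf) | set(nonmember_conf))
    return max(
        candidates,
        key=lambda t: attack_accuracy(member_conf, nonmember_conf, t),
    )


# Toy data (made up): confidence on the true label for known members
# (training records) and known non-members (held-out records).
members = [0.99, 0.97, 0.95, 0.80]
nonmembers = [0.90, 0.70, 0.60, 0.98]

threshold = best_threshold(members, nonmembers)
accuracy = attack_accuracy(members, nonmembers, threshold)
```

An attack accuracy well above 0.5 on balanced member and non-member sets would indicate that the predictions leak membership information, which is the quantity the abstract compares between robust and undefended models.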
FDINet: Protecting against DNN Model Extraction via Feature Distortion Index
Machine Learning as a Service (MLaaS) platforms have gained popularity due to
their accessibility, cost-efficiency, scalability, and rapid development
capabilities. However, recent research has highlighted the vulnerability of
cloud-based models in MLaaS to model extraction attacks. In this paper, we
introduce FDINET, a novel defense mechanism that leverages the feature
distribution of deep neural network (DNN) models. Concretely, by analyzing the
feature distribution from the adversary's queries, we reveal that the feature
distribution of these queries deviates from that of the model's training set.
Based on this key observation, we propose Feature Distortion Index (FDI), a
metric designed to quantitatively measure the feature distribution deviation of
received queries. The proposed FDINET utilizes FDI to train a binary detector
and exploits FDI similarity to identify colluding adversaries from distributed
extraction attacks. We conduct extensive experiments to evaluate FDINET against
six state-of-the-art extraction attacks on four benchmark datasets and four
popular model architectures. Empirical results demonstrate the following
findings. FDINET proves to be highly effective in detecting model extraction,
achieving a 100% detection accuracy on DFME and DaST. FDINET is highly
efficient, using just 50 queries to raise an extraction alarm with an average
confidence of 96.08% for GTSRB. FDINET exhibits the capability to identify
colluding adversaries with an accuracy exceeding 91%. Additionally, it
demonstrates the ability to detect two types of adaptive attacks.
Comment: 13 pages, 7 figure
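The abstract above does not give the FDI formula, so the following is only a hypothetical stand-in for the general idea of scoring how far a batch of query features deviates from training-set feature statistics. The metric (mean absolute z-score), the function names, and the toy feature vectors are all assumptions, not the paper's definition.

```python
import math


def feature_stats(train_features):
    """Per-dimension mean and standard deviation of training-set features."""
    n = len(train_features)
    dims = len(train_features[0])
    means = [sum(row[j] for row in train_features) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((row[j] - means[j]) ** 2 for row in train_features) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against zero variance
    return means, stds


def deviation_score(query_features, means, stds):
    """Mean absolute z-score of a query batch (hypothetical metric, not FDI)."""
    total = 0.0
    count = 0
    for row in query_features:
        for value, mean, std in zip(row, means, stds):
            total += abs(value - mean) / std
            count += 1
    return total / count


# Toy 2-D "features" (made up) standing in for intermediate DNN activations.
train = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]
benign_queries = [[0.4, 0.6], [0.9, 0.2]]
suspicious_queries = [[3.0, -2.0], [4.0, 5.0]]

means, stds = feature_stats(train)
benign_score = deviation_score(benign_queries, means, stds)
suspicious_score = deviation_score(suspicious_queries, means, stds)
```

A detector in this spirit would raise an alarm when the score of a client's recent queries stays above a calibrated threshold; the abstract describes a trained binary detector rather than a fixed threshold.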
GrOVe: Ownership Verification of Graph Neural Networks using Embeddings
Graph neural networks (GNNs) have emerged as a state-of-the-art approach to
model and draw inferences from large scale graph-structured data in various
application settings such as social networking. The primary goal of a GNN is to
learn an embedding for each graph node in a dataset that encodes both the node
features and the local graph structure around the node. Embeddings generated by
a GNN for a graph node are unique to that GNN. Prior work has shown that GNNs
are prone to model extraction attacks. Model extraction attacks and defenses
have been explored extensively in other non-graph settings. While detecting or
preventing model extraction appears to be difficult, deterring it via
effective ownership verification techniques offers a potential defense. In
non-graph settings, fingerprinting models, or the data used to build them, has
been shown to be a promising approach toward ownership verification. We present
GrOVe, a state-of-the-art GNN model fingerprinting scheme that, given a target
model and a suspect model, can reliably determine if the suspect model was
trained independently of the target model or if it is a surrogate of the target
model obtained via model extraction. We show that GrOVe can distinguish between
surrogate and independent models even when the independent model uses the same
training dataset and architecture as the original target model. Using six
benchmark datasets and three model architectures, we show that GrOVe consistently
achieves low false-positive and false-negative rates. We demonstrate that GrOVe is
robust against known fingerprint evasion techniques while remaining
computationally efficient.
Comment: 11 pages, 5 figure
MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation
Model Stealing (MS) attacks allow an adversary with black-box access to a
Machine Learning model to replicate its functionality, compromising the
confidentiality of the model. Such attacks train a clone model by using the
predictions of the target model for different inputs. The effectiveness of such
attacks relies heavily on the availability of data necessary to query the
target model. Existing attacks either assume partial access to the dataset of
the target model or availability of an alternate dataset with semantic
similarities.
This paper proposes MAZE -- a data-free model stealing attack using
zeroth-order gradient estimation. In contrast to prior works, MAZE does not
require any data and instead creates synthetic data using a generative model.
Inspired by recent works in data-free Knowledge Distillation (KD), we train the
generative model using a disagreement objective to produce inputs that maximize
disagreement between the clone and the target model. However, unlike the
white-box setting of KD, where the gradient information is available, training
a generator for model stealing requires performing black-box optimization, as
it involves accessing the target model under attack. MAZE relies on
zeroth-order gradient estimation to perform this optimization and enables a
highly accurate MS attack.
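As a rough illustration of zeroth-order gradient estimation in general, the sketch below approximates a gradient from function values only, using forward differences along random unit directions. It is a minimal sketch under stated assumptions, not MAZE's implementation: `black_box_loss` is a hypothetical stand-in for a loss that can only be evaluated by querying the target model, and the parameter names and defaults are illustrative.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def zeroth_order_gradient(f, x, num_directions=2000, eps=1e-3, seed=0):
    """Estimate grad f(x) using only evaluations of f (forward differences)."""
    rng = random.Random(seed)
    dim = len(x)
    fx = f(x)  # one baseline query
    grad = [0.0] * dim
    for _ in range(num_directions):
        u = random_unit_vector(dim, rng)
        x_shift = [xi + eps * ui for xi, ui in zip(x, u)]
        slope = (f(x_shift) - fx) / eps  # directional derivative along u
        for j in range(dim):
            # Scaling by dim makes the average an estimate of the gradient.
            grad[j] += dim * slope * u[j] / num_directions
    return grad


def black_box_loss(x):
    # Hypothetical stand-in for a loss computed from black-box model outputs.
    return sum((xi - 1.0) ** 2 for xi in x)


# The true gradient of this toy loss at the point below is [2.0, -2.0, -4.0].
estimate = zeroth_order_gradient(
    black_box_loss, [2.0, 0.0, -1.0], num_directions=20000
)
```

Each random direction costs one extra query to the black box, so the number of directions trades estimate quality against query budget.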
Our evaluation with four datasets shows that MAZE provides a normalized clone
accuracy in the range of 0.91x to 0.99x, and outperforms even the recent
attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and
surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an
extension of MAZE in the partial-data setting and develop MAZE-PD, which
generates synthetic data closer to the target distribution. MAZE-PD further
improves the clone accuracy (0.97x to 1.0x) and reduces the number of queries
required for the attack by 2x-24x.