18 research outputs found
Extracted BERT Model Leaks More Information than You Think!
The collection and availability of big data, combined with advances in pre-trained models (e.g., BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulating fine-tuned BERT-based models as APIs. Due to significant commercial interest, there has been a surge of attempts to steal remote services via model extraction. Although previous works have made progress in defending against model extraction attacks, there has been little discussion of their performance in preventing privacy leakage. This work bridges this gap by launching an attribute inference attack against the extracted BERT model. Our extensive experiments reveal that model extraction can cause severe privacy leakage even when victim models are equipped with advanced defensive strategies.
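A minimal sketch of what such an attribute inference attack could look like, assuming the adversary already holds a substitute encoder obtained via model extraction and a small auxiliary set of texts labeled with the private attribute. The mocked embeddings, the auxiliary set, and the logistic-regression attack model are illustrative assumptions, not the paper's actual setup:

```python
# Attribute inference against an extracted encoder (illustrative sketch).
# Assumption: `extracted_encoder` stands in for the stolen substitute model's
# sentence embeddings; here it returns random vectors so the script runs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def extracted_encoder(texts):
    """Placeholder for the substitute BERT model's [CLS] embeddings."""
    return rng.normal(size=(len(texts), 768))

aux_texts = [f"auxiliary sentence {i}" for i in range(1000)]
aux_attr = rng.integers(0, 2, size=len(aux_texts))  # private attribute labels

X = extracted_encoder(aux_texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, aux_attr, test_size=0.2, random_state=0)

# The inference model maps embeddings from the stolen encoder to the private attribute.
attack_clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("attribute inference accuracy:", attack_clf.score(X_te, y_te))
```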
Model Stealing Attack against Multi-Exit Networks
Compared to traditional neural networks with a single exit, a multi-exit
network has multiple exits that allow for early output from intermediate layers
of the model, thus bringing significant improvement in computational efficiency
while maintaining similar recognition accuracy. When attempting to steal such
valuable models using traditional model stealing attacks, we found that
conventional methods can only steal the model's classification function while
failing to capture its output strategy. This results in a significant decrease
in computational efficiency for the stolen substitute model, thereby losing the
advantages of multi-exit networks. In this paper, we propose the first model
stealing attack to extract both the model function and output strategy. We
employ Bayesian changepoint detection to analyze the target model's output
strategy and use performance loss and strategy loss to guide the training of
the substitute model. Furthermore, we design a novel output strategy search
algorithm that can find the optimal output strategy to maximize the consistency
between the victim model and the substitute model's outputs. Through
experiments on multiple mainstream multi-exit networks and benchmark datasets,
we thoroughly demonstrate the effectiveness of our method.
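A minimal sketch of the combined objective described above: a performance loss that matches the victim's predictions plus a strategy loss that matches which exit the victim used. The two-exit substitute, the mocked victim outputs and exit indices, and the loss weight are illustrative assumptions, not the paper's architecture or hyperparameters:

```python
# Substitute training with performance loss + strategy loss (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoExitSubstitute(nn.Module):
    def __init__(self, in_dim=32, num_classes=10):
        super().__init__()
        self.block1 = nn.Linear(in_dim, 64)
        self.exit1 = nn.Linear(64, num_classes)   # early exit head
        self.block2 = nn.Linear(64, 64)
        self.exit2 = nn.Linear(64, num_classes)   # final exit head
        self.exit_gate = nn.Linear(64, 1)         # predicts whether to exit early

    def forward(self, x):
        h1 = torch.relu(self.block1(x))
        gate = torch.sigmoid(self.exit_gate(h1))  # probability of early exit
        h2 = torch.relu(self.block2(h1))
        return self.exit1(h1), self.exit2(h2), gate

model = TwoExitSubstitute()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Mocked query results: victim soft labels and which exit the victim used
# (in the paper the exit behavior is inferred via changepoint analysis).
x = torch.randn(128, 32)
victim_probs = F.softmax(torch.randn(128, 10), dim=-1)
victim_exit = torch.randint(0, 2, (128, 1)).float()   # 1 = victim exited early

logits1, logits2, gate = model(x)
perf_loss = F.kl_div(F.log_softmax(logits2, dim=-1), victim_probs, reduction="batchmean")
strategy_loss = F.binary_cross_entropy(gate, victim_exit)
loss = perf_loss + 0.5 * strategy_loss   # 0.5 weight is an arbitrary choice
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```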
BERT & Family Eat Word Salad: Experiments with Text Understanding
In this paper, we study the response of large models from the BERT family to
incoherent inputs that should confuse any model that claims to understand
natural language. We define simple heuristics to construct such examples. Our
experiments show that state-of-the-art models consistently fail to recognize
them as ill-formed, and instead produce high confidence predictions on them. As
a consequence of this phenomenon, models trained on sentences with randomly
permuted word order perform close to state-of-the-art models. To alleviate
these issues, we show that if models are explicitly trained to recognize
invalid inputs, they can be robust to such attacks without a drop in
performance. Comment: Accepted at AAAI 2021, Camera Ready Version
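A minimal sketch of the kind of simple heuristic described above: turning a well-formed sentence into an ill-formed "word salad" by randomly permuting its tokens. The whitespace tokenization and fixed seed are assumptions for illustration; the paper's heuristics may differ in detail:

```python
# Construct an incoherent "word salad" input by shuffling word order.
import random

def word_salad(sentence: str, seed: int = 0) -> str:
    """Return the sentence with its words randomly shuffled."""
    words = sentence.split()
    rng = random.Random(seed)
    rng.shuffle(words)
    return " ".join(words)

original = "the movie was surprisingly good despite its slow start"
print(word_salad(original))
# A model that truly understands language should flag this as ill-formed
# rather than producing a high-confidence prediction on it.
```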
FDINet: Protecting against DNN Model Extraction via Feature Distortion Index
Machine Learning as a Service (MLaaS) platforms have gained popularity due to
their accessibility, cost-efficiency, scalability, and rapid development
capabilities. However, recent research has highlighted the vulnerability of
cloud-based models in MLaaS to model extraction attacks. In this paper, we
introduce FDINET, a novel defense mechanism that leverages the feature
distribution of deep neural network (DNN) models. Concretely, by analyzing the
feature distribution from the adversary's queries, we reveal that the feature
distribution of these queries deviates from that of the model's training set.
Based on this key observation, we propose Feature Distortion Index (FDI), a
metric designed to quantitatively measure the feature distribution deviation of
received queries. The proposed FDINET utilizes FDI to train a binary detector
and exploits FDI similarity to identify colluding adversaries from distributed
extraction attacks. We conduct extensive experiments to evaluate FDINET against
six state-of-the-art extraction attacks on four benchmark datasets and four
popular model architectures. Empirical results demonstrate the following
findings: FDINET proves to be highly effective in detecting model extraction,
achieving a 100% detection accuracy on DFME and DaST. FDINET is highly
efficient, using just 50 queries to raise an extraction alarm with an average
confidence of 96.08% for GTSRB. FDINET exhibits the capability to identify
colluding adversaries with an accuracy exceeding 91%. Additionally, it
demonstrates the ability to detect two types of adaptive attacks. Comment: 13 pages, 7 figures
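A minimal sketch of a feature-distortion style score in the spirit of FDI: measure how far the feature distribution of incoming queries deviates from the training-set feature distribution. The Mahalanobis-style distance, the synthetic features, and the thresholding idea are illustrative assumptions, not the paper's exact FDI definition:

```python
# Score query batches by their deviation from the benign feature distribution.
import numpy as np

def fit_reference(train_features: np.ndarray):
    """Estimate mean and inverse covariance of benign (training-set) features."""
    mu = train_features.mean(axis=0)
    cov = np.cov(train_features, rowvar=False) + 1e-6 * np.eye(train_features.shape[1])
    return mu, np.linalg.inv(cov)

def distortion_index(query_features: np.ndarray, mu, cov_inv) -> float:
    """Average Mahalanobis distance of a batch of query features."""
    diff = query_features - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return float(np.sqrt(np.maximum(d2, 0)).mean())

rng = np.random.default_rng(0)
train_feats = rng.normal(0.0, 1.0, size=(5000, 16))   # benign feature samples
benign_query = rng.normal(0.0, 1.0, size=(50, 16))    # in-distribution queries
attack_query = rng.normal(2.5, 1.5, size=(50, 16))    # distribution-shifted queries

mu, cov_inv = fit_reference(train_feats)
print("benign score:", distortion_index(benign_query, mu, cov_inv))
print("attack score:", distortion_index(attack_query, mu, cov_inv))
# A binary detector (or a simple threshold) on this score raises the extraction alarm.
```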
MEA-Defender: A Robust Watermark against Model Extraction Attack
Recently, numerous highly-valuable Deep Neural Networks (DNNs) have been
trained using deep learning algorithms. To protect the Intellectual Property
(IP) of the original owners over such DNN models, backdoor-based watermarks
have been extensively studied. However, most such watermarks fail under model
extraction attacks, which query the target model with input samples, obtain the
corresponding outputs, and then train a substitute model on such input-output
pairs. In this paper, we propose a novel watermark to protect the IP
of DNN models against model extraction, named MEA-Defender. In particular, we
obtain the watermark by combining two samples from two source classes in the
input domain, and design a watermark loss function that keeps the output domain
of the watermark within that of the main task samples. Since both the input
domain and the output domain of our watermark are indispensable parts of those
of the main task samples, the watermark will be extracted into the stolen model
along with the main task during model extraction. We conduct extensive
experiments on four model extraction attacks, using five datasets and six
models trained based on supervised learning and self-supervised learning
algorithms. The experimental results demonstrate that MEA-Defender is highly
robust against different model extraction attacks and various watermark
removal/detection approaches. Comment: To Appear in IEEE Symposium on Security and Privacy 2024 (IEEE S&P
2024), MAY 20-23, 2024, SAN FRANCISCO, CA, US
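A minimal sketch of the watermark construction idea described above: a watermark input is composed from two samples drawn from two source classes, and a loss term keeps the watermark's outputs inside the output domain of the main task. The half-and-half spatial combination, the loss weights, and the random stand-in tensors are illustrative assumptions, not MEA-Defender's exact construction:

```python
# Compose watermark samples from two source classes and score a watermark loss.
import torch
import torch.nn.functional as F

def compose_watermark(x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
    """Combine two source-class images by concatenating their left/right halves."""
    w = x_a.shape[-1]
    return torch.cat([x_a[..., : w // 2], x_b[..., w // 2 :]], dim=-1)

def watermark_loss(logits_wm: torch.Tensor, target_class: int,
                   logits_main: torch.Tensor) -> torch.Tensor:
    """Push watermark samples toward the target label while keeping their
    output distribution close to that of ordinary task samples."""
    ce = F.cross_entropy(logits_wm, torch.full((logits_wm.shape[0],), target_class))
    align = F.kl_div(F.log_softmax(logits_wm, dim=-1),
                     F.softmax(logits_main, dim=-1).mean(0, keepdim=True)
                          .expand_as(logits_wm),
                     reduction="batchmean")
    return ce + 0.1 * align   # 0.1 weight is an arbitrary illustrative choice

# Toy usage with random tensors standing in for CIFAR-like images and logits.
x_a, x_b = torch.rand(8, 3, 32, 32), torch.rand(8, 3, 32, 32)
x_wm = compose_watermark(x_a, x_b)
logits_wm, logits_main = torch.randn(8, 10), torch.randn(64, 10)
print(x_wm.shape, float(watermark_loss(logits_wm, 2, logits_main)))
```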