Intellectual Property Protection for Deep Learning Models: Taxonomy, Methods, Attacks, and Evaluations
The training and creation of a deep learning model is usually costly; the model
can therefore be regarded as the intellectual property (IP) of its creator. However,
malicious users who obtain high-performance models may illegally copy,
redistribute, or abuse them without permission. To deal with such
security threats, a number of deep neural network (DNN) IP protection methods have
been proposed in recent years. This paper provides a review of the
existing DNN IP protection works, along with an outlook. First, we propose the
first taxonomy for DNN IP protection methods in terms of six attributes:
scenario, mechanism, capacity, type, function, and target models. Then, we
present a survey on existing DNN IP protection works in terms of the above six
attributes, especially focusing on the challenges these methods face, whether
these methods can provide proactive protection, and their resistance to
different levels of attacks. After that, we analyze the potential attacks on
DNN IP protection methods from the aspects of model modifications, evasion
attacks, and active attacks. In addition, a systematic evaluation method for DNN IP
protection methods with respect to basic functional metrics, attack-resistance
metrics, and customized metrics for different application scenarios is given.
Lastly, future research opportunities and challenges in DNN IP protection are
presented.
Robust Backdoor Attacks against Deep Neural Networks in Real Physical World
Deep neural networks (DNNs) have been widely deployed in various applications.
However, many studies have shown that DNNs are vulnerable to backdoor attacks:
an attacker can create a hidden backdoor in a target DNN model and trigger
malicious behavior by submitting specific backdoor instances. Almost all
existing backdoor works focus on the digital domain, while few studies
investigate backdoor attacks in the real physical world. Due to a variety of
physical constraints, the performance of backdoor attacks in the real physical
world is severely degraded. In this paper, we propose a
robust physical backdoor attack method, PTB (physical transformations for
backdoors), to implement backdoor attacks against deep learning models in
the real physical world. Specifically, in the training phase, we perform a
series of physical transformations on the injected backdoor instances at each
round of model training, so as to simulate the various transformations that a
backdoor trigger may experience in the real world, thus improving its physical robustness.
Experimental results on a state-of-the-art face recognition model show that,
compared with backdoor methods without PTB, the proposed attack method
can significantly improve the performance of backdoor attacks in the real physical
world. Under various complex physical conditions, by injecting only a very
small ratio (0.5%) of backdoor instances, the attack success rate of physical
backdoor attacks with the PTB method on VGGFace reaches 82%, while the attack
success rate of backdoor attacks without the proposed PTB method is lower than
11%. Meanwhile, the normal performance of the target DNN model is not
affected.
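To illustrate the training-time augmentation idea described above, the following is a minimal sketch (not the authors' released code) of applying randomized physical-style transformations to trigger-carrying images at each training round; the specific transformation set and parameter values are illustrative assumptions:

```python
# Illustrative sketch: simulate physical-world distortions on poisoned
# (trigger-carrying) images before each training round.
import torch
from torchvision import transforms

# Hypothetical set of "physical" transformations: viewpoint, rotation,
# lighting, and blur changes a printed or worn trigger might undergo.
physical_transforms = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.3, p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.4, contrast=0.3),
    transforms.GaussianBlur(kernel_size=3),
])

def augment_backdoor_batch(poisoned_images: torch.Tensor) -> torch.Tensor:
    """Apply a fresh random physical transformation to each poisoned image,
    so the model sees many physical variants of the trigger during training."""
    return torch.stack([physical_transforms(img) for img in poisoned_images])
```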
Use the Spear as a Shield: A Novel Adversarial Example based Privacy-Preserving Technique against Membership Inference Attacks
Recently, membership inference attacks have posed a serious threat to the
privacy of the confidential training data of machine learning models. This paper
proposes a novel adversarial example based privacy-preserving technique
(AEPPT), which adds crafted adversarial perturbations to the predictions of
the target model to mislead the adversary's membership inference model. The
added adversarial perturbations do not affect the accuracy of the target model, but
can prevent the adversary from inferring whether a specific sample is in the
training set of the target model. Since AEPPT only modifies the original output
of the target model, the proposed method is general and does not require
modifying or retraining the target model. Experimental results show that the
proposed method can reduce the inference accuracy and precision of the
membership inference model to 50%, which is close to a random guess. Further,
for those adaptive attacks where the adversary knows the defense mechanism, the
proposed AEPPT is also demonstrated to be effective. Compared with the
state-of-the-art defense methods, the proposed defense can significantly
degrade the accuracy and precision of membership inference attacks to 50%
(i.e., the same as a random guess), while the performance and utility of the
target model are not affected.
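The core mechanism can be sketched as follows, assuming a differentiable surrogate membership inference model (here called attack_model) that maps a prediction vector to a membership probability; this is an illustrative approximation, not the authors' exact algorithm:

```python
# Illustrative sketch: perturb the target model's output probabilities so a
# surrogate membership-inference model is pushed toward a 50/50 guess,
# while the predicted label (and thus accuracy) is preserved.
import torch
import torch.nn.functional as F

def perturb_prediction(pred: torch.Tensor, attack_model, eps: float = 0.05):
    """pred: softmax output of the target model; attack_model: assumed
    differentiable surrogate returning a membership probability in (0, 1)."""
    pred = pred.clone().detach().requires_grad_(True)
    member_score = attack_model(pred)                    # adversary's guess
    loss = F.binary_cross_entropy(member_score,
                                  torch.full_like(member_score, 0.5))
    loss.backward()
    perturbed = (pred - eps * pred.grad.sign()).detach() # step toward 0.5
    perturbed = perturbed.clamp(min=1e-6)
    perturbed = perturbed / perturbed.sum(dim=-1, keepdim=True)  # re-normalize
    # keep only perturbations that preserve the predicted label
    same_label = perturbed.argmax(dim=-1) == pred.argmax(dim=-1)
    return torch.where(same_label.unsqueeze(-1), perturbed, pred.detach())
```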
Detect and remove watermark in deep neural networks via generative adversarial networks
Deep neural networks (DNNs) have achieved remarkable performance in various
fields. However, training a DNN model from scratch requires substantial computing
resources and training data, which are difficult for most individual users to
obtain. Model copyright infringement has become an emerging problem in recent
years. For instance, pre-trained models may be stolen or abused by illegal users
without the authorization of the model owner.
Recently, many works on protecting the intellectual property of DNN models have
been proposed. Among them, embedding backdoor-based watermarks into DNNs is one
of the most widely used methods. However, when a DNN model is stolen, the
backdoor-based watermark may be detected and removed by an
adversary. In this paper, we propose a scheme to detect and remove watermarks in
deep neural networks via generative adversarial networks (GAN). We demonstrate
that the backdoor-based DNN watermarks are vulnerable to the proposed GAN-based
watermark removal attack. The proposed attack method includes two phases. In
the first phase, we use a GAN and a few clean images to detect and reverse the
watermark in the DNN model. In the second phase, we fine-tune the watermarked
DNN based on the reversed backdoor images. Experimental evaluations on the
MNIST and CIFAR10 datasets demonstrate that the proposed method can
effectively remove about 98% of the watermark in DNN models, as the watermark
retention rate drops from 100% to less than 2% after applying the proposed
attack. In the meantime, the proposed attack hardly affects the model's
performance: the test accuracy of the watermarked DNN on the MNIST and
CIFAR10 datasets drops by less than 1% and 3%, respectively.
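The second (fine-tuning) phase can be sketched roughly as below, assuming the GAN phase has already produced a reversed trigger pattern (reversed_trigger) and that a small set of clean, labelled images is available; this is an illustrative approximation rather than the paper's exact procedure:

```python
# Illustrative sketch of phase two: fine-tune the watermarked model so that
# images stamped with the reversed trigger map back to their true labels,
# erasing the backdoor-based watermark while preserving normal accuracy.
import torch
import torch.nn.functional as F

def unlearn_watermark(model, clean_loader, reversed_trigger,
                      epochs: int = 5, lr: float = 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in clean_loader:
            stamped = torch.clamp(images + reversed_trigger, 0.0, 1.0)
            # train on clean and trigger-stamped images with true labels
            inputs = torch.cat([images, stamped])
            targets = torch.cat([labels, labels])
            loss = F.cross_entropy(model(inputs), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```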
SocialGuard: An Adversarial Example Based Privacy-Preserving Technique for Social Images
The popularity of various social platforms has prompted more people to share
their everyday photos online. However, such online photo sharing leads to
undesirable privacy leakage. Advanced deep neural network (DNN) based
object detectors can easily steal users' personal information exposed in shared
photos. In this paper, we propose a novel adversarial example based
privacy-preserving technique for social images against object-detector-based
privacy stealing. Specifically, we develop an Object Disappearance Algorithm to
craft two kinds of adversarial social images. One hides all objects in a
social image from being detected by an object detector, and the other causes
customized sensitive objects to be incorrectly classified by the
detector. The Object Disappearance Algorithm constructs a perturbation on a clean
social image. After the perturbation is injected, the social image can
easily fool the object detector, while its visual quality is not degraded.
We use two metrics, privacy-preserving success rate and privacy leakage rate,
to evaluate the effectiveness of the proposed method. Experimental results show
that the proposed method can effectively protect the privacy of social images.
The privacy-preserving success rates of the proposed method on the MS-COCO and
PASCAL VOC 2007 datasets reach 96.1% and 99.3%, respectively, and the
privacy leakage rates on these two datasets are as low as 0.57% and 0.07%,
respectively. In addition, compared with existing image processing methods (low
brightness, noise, blur, mosaic, and JPEG compression), the proposed method
achieves much better performance in both privacy protection and image visual
quality maintenance.
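The object-hiding variant can be sketched as an iterative gradient attack that suppresses all detection confidences under a small perturbation budget; the detector_confidences interface below is a hypothetical stand-in for a differentiable object detector, not the paper's actual Object Disappearance Algorithm:

```python
# Illustrative sketch: craft a small perturbation that drives the detector's
# object confidence scores toward zero, so no objects are detected in the photo.
import torch

def hide_objects(image, detector_confidences,
                 eps=8 / 255, steps=40, alpha=1 / 255):
    """image: tensor in [0, 1]; detector_confidences: assumed differentiable
    function returning the confidence scores of all candidate boxes."""
    image = image.detach()
    adv = image.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        scores = detector_confidences(adv)        # confidences of all boxes
        loss = scores.sum()                       # shrink every score
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() - alpha * grad.sign()  # descend on total confidence
        adv = image + torch.clamp(adv - image, -eps, eps)  # keep perturbation small
        adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```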
ActiveGuard: An Active DNN IP Protection Technique via Adversarial Examples
The training of deep neural networks (DNNs) is costly; thus a DNN can be
considered the intellectual property (IP) of its owner. To date, most
existing protection works focus on verifying ownership after the DNN
model is stolen, and thus cannot resist piracy in advance. To this end, we propose
an active DNN IP protection method based on adversarial examples against DNN
piracy, named ActiveGuard. ActiveGuard aims to achieve authorization control
and user fingerprint management through adversarial examples, and can
provide ownership verification. Specifically, ActiveGuard exploits
elaborately crafted adversarial examples as users' fingerprints to distinguish authorized
users from unauthorized users. Legitimate users can enter their fingerprints into the DNN
for identity authentication and authorized usage, while unauthorized users will
obtain poor model performance due to an additional control layer. In addition,
ActiveGuard enables the model owner to embed a watermark into the weights of
the DNN. When the DNN is illegally pirated, the model owner can extract the
embedded watermark and perform ownership verification. Experimental results
show that, for authorized users, the test accuracies of the LeNet-5 and Wide Residual
Network (WRN) models are 99.15% and 91.46%, respectively, while for
unauthorized users, the test accuracies of the two DNNs are only 8.92% (LeNet-5)
and 10% (WRN), respectively. Besides, each authorized user can pass the
fingerprint authentication with a high success rate (up to 100%). For ownership
verification, the embedded watermark can be successfully extracted, while the
normal performance of the DNN model will not be affected. Further, ActiveGuard
is demonstrated to be robust against fingerprint forgery attacks, model
fine-tuning attacks, and pruning attacks.
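The authorization-control idea can be sketched as a wrapper that only returns faithful predictions once a valid adversarial fingerprint has been presented; the reserved class index, confidence threshold, and session handling below are illustrative assumptions rather than ActiveGuard's exact control layer:

```python
# Illustrative sketch: a control layer that scrambles outputs until a query
# carrying a valid adversarial fingerprint (assumed to be classified into a
# reserved class with high confidence) authorizes the session.
import torch

RESERVED_CLASS = 0        # hypothetical class reserved for fingerprint inputs
CONF_THRESHOLD = 0.95     # hypothetical confidence threshold

class ControlledModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.authorized = False

    def forward(self, x):
        logits = self.model(x)
        probs = torch.softmax(logits, dim=-1)
        is_fingerprint = (probs.argmax(dim=-1) == RESERVED_CLASS) & \
                         (probs.max(dim=-1).values > CONF_THRESHOLD)
        if is_fingerprint.any():
            self.authorized = True    # valid fingerprint accepted
        if self.authorized:
            return logits             # authorized user: faithful predictions
        # unauthorized user: permute logits so accuracy collapses to ~chance
        perm = torch.randperm(logits.shape[-1], device=logits.device)
        return logits[..., perm]
```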