23 research outputs found
FLARE: Fingerprinting Deep Reinforcement Learning Agents using Universal Adversarial Masks
We propose FLARE, the first fingerprinting mechanism to verify whether a
suspected Deep Reinforcement Learning (DRL) policy is an illegitimate copy of
another (victim) policy. We first show that it is possible to find
non-transferable, universal adversarial masks, i.e., perturbations, to generate
adversarial examples that can successfully transfer from a victim policy to its
modified versions but not to independently trained policies. FLARE employs
these masks as fingerprints to verify the true ownership of stolen DRL policies
by measuring an action agreement value over states perturbed via such masks.
Our empirical evaluations show that FLARE is effective (100% action agreement
on stolen copies) and does not falsely accuse independent policies (no false
positives). FLARE is also robust to model modification attacks and cannot be
easily evaded by more informed adversaries without negatively impacting agent
performance. We also show that not all universal adversarial masks are suitable
candidates for fingerprints due to the inherent characteristics of DRL
policies. The spatio-temporal dynamics of DRL problems and their sequential
decision-making process make it harder both to characterize the decision
boundary of DRL policies and to search for universal masks that capture its
geometry.
Comment: Will appear in the proceedings of ACSAC 2023; 13 pages, 5 figures, 7 tables
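A minimal sketch of the action-agreement check described in this abstract, assuming victim_policy and suspect_policy are callables that map a state to a discrete action and mask is a precomputed universal adversarial perturbation; all names and the epsilon bound are illustrative placeholders, not FLARE's actual implementation:

```python
import numpy as np

def action_agreement(victim_policy, suspect_policy, states, mask, epsilon=0.05):
    """Fraction of perturbed states on which the two policies choose the same action.

    victim_policy / suspect_policy: callables mapping a state to a discrete action.
    states: array of shape (N, *state_shape) collected from the victim's environment.
    mask: universal adversarial perturbation with the shape of a single state.
    epsilon: L-infinity budget enforced on the mask before it is applied.
    """
    bounded_mask = np.clip(mask, -epsilon, epsilon)
    perturbed = np.clip(states + bounded_mask, 0.0, 1.0)  # keep states in a valid range
    agreements = [victim_policy(s) == suspect_policy(s) for s in perturbed]
    return float(np.mean(agreements))
```

An agreement value near 1.0 (FLARE reports 100% on stolen copies) supports the ownership claim, while independently trained policies agree far less often on the perturbed states.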
Deep Intellectual Property: A Survey
With the widespread application in industrial manufacturing and commercial
services, well-trained deep neural networks (DNNs) are becoming increasingly
valuable and crucial assets due to the tremendous training cost and excellent
generalization performance. These trained models can be utilized by users
without much expert knowledge, benefiting from the emerging "Machine Learning
as a Service" (MLaaS) paradigm. However, this paradigm also exposes the
expensive models to various potential threats like model stealing and abuse. As
an urgent response to these threats, Deep Intellectual Property (DeepIP)
protection, which safeguards private training data, painstakingly tuned
hyperparameters, and costly learned model weights, has become a consensus goal
of both industry and academia. To this end, numerous approaches have been proposed
to achieve this goal in recent years, especially to prevent or discover model
stealing and unauthorized redistribution. Given this period of rapid evolution,
the goal of this paper is to provide a comprehensive survey of the recent
achievements in this field. More than 190 research contributions are included
in this survey, covering many aspects of Deep IP Protection:
challenges/threats, invasive solutions (watermarking), non-invasive solutions
(fingerprinting), evaluation metrics, and performance. We finish the survey by
identifying promising directions for future research.
Comment: 38 pages, 12 figures
Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks
An off-the-shelf model deployed as a commercial service can be stolen by model
stealing attacks, posing a great threat to the rights of the model owner. Model
fingerprinting aims to verify whether a suspect model is stolen from the victim
model, and has gained increasing attention in recent years. Previous methods
typically leverage transferable adversarial examples as the model fingerprint,
which is sensitive to adversarial defenses or transfer-learning scenarios. To address
this issue, we consider the pairwise relationship between samples instead and
propose a novel yet simple model stealing detection method based on SAmple
Correlation (SAC). Specifically, we present SAC-w that selects wrongly
classified normal samples as model inputs and calculates the mean correlation
among their model outputs. To reduce the training time, we further develop
SAC-m that selects CutMix Augmented samples as model inputs, without the need
for training the surrogate models or generating adversarial examples. Extensive
results validate that SAC successfully defends against various model stealing
attacks, even including adversarial training or transfer learning, and detects
the stolen models with the best performance in terms of AUC across different
datasets and model architectures. The code is available at
https://github.com/guanjiyang/SAC
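The core idea behind SAC, as described above, can be sketched roughly as follows; cosine similarity is used here as a stand-in for the paper's correlation measure, and all function names are illustrative rather than taken from the released code:

```python
import numpy as np

def correlation_matrix(outputs):
    """Pairwise cosine similarity between a model's outputs on a fixed probe set.

    outputs: array of shape (N, C), e.g. softmax vectors for N probe samples.
    """
    normed = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    return normed @ normed.T

def sac_distance(victim_outputs, suspect_outputs):
    """Mean absolute difference between the two models' sample-correlation matrices.

    A small distance means the suspect reproduces the victim's pairwise sample
    relationships, which SAC takes as evidence of a stolen model.
    """
    return float(np.abs(correlation_matrix(victim_outputs)
                        - correlation_matrix(suspect_outputs)).mean())
```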
How to choose your best allies for a transferable attack?
The transferability of adversarial examples is a key issue in the security of
deep neural networks. The possibility of an adversarial example crafted for a
source model fooling another targeted model makes the threat of adversarial
attacks more realistic. Measuring transferability is a crucial problem, but the
Attack Success Rate alone does not provide a sound evaluation. This paper
proposes a new methodology for evaluating transferability by putting distortion
in a central position. This new tool shows that transferable attacks may
perform far worse than a black box attack if the attacker randomly picks the
source model. To address this issue, we propose a new selection mechanism,
called FiT, which aims at choosing the best source model with only a few
preliminary queries to the target. Our experimental results show that FiT is
highly effective at selecting the best source model for multiple scenarios such
as single-model attacks, ensemble-model attacks and multiple attacks (Code
available at: https://github.com/t-maho/transferability_measure_fit)
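A rough sketch of the source-model selection problem that FiT addresses, using a simple agreement-on-probes criterion; FiT's actual selection criterion is more elaborate, and candidate_models, query_target, and probe_inputs are hypothetical placeholders:

```python
import numpy as np

def pick_source_model(candidate_models, query_target, probe_inputs):
    """Return the candidate whose predictions best match the target on a few probes.

    candidate_models: list of callables mapping an input batch to predicted labels.
    query_target: black-box access to the target model (labels only).
    probe_inputs: small batch of inputs, i.e. the "few preliminary queries".
    """
    target_labels = query_target(probe_inputs)
    scores = [np.mean(model(probe_inputs) == target_labels)
              for model in candidate_models]
    return candidate_models[int(np.argmax(scores))]
```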
A Stealthy and Robust Fingerprinting Scheme for Generative Models
This paper presents a novel fingerprinting methodology for the Intellectual
Property protection of generative models. Prior solutions for discriminative
models usually adopt adversarial examples as the fingerprints, which induce
anomalous inference behaviors and prediction results. Hence, these methods are
not stealthy and can easily be recognized by the adversary. Our approach
leverages the invisible backdoor technique to overcome the above limitation.
Specifically, we design verification samples, whose model outputs look normal
but can trigger a backdoor classifier to make abnormal predictions. We propose
a new backdoor embedding approach with Unique-Triplet Loss and fine-grained
categorization to enhance the effectiveness of our fingerprints. Extensive
evaluations show that this solution can outperform other strategies with higher
robustness, uniqueness and stealthiness for various GAN models
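For reference, the vanilla triplet-loss objective that the paper's Unique-Triplet Loss presumably builds on; the exact formulation is not given in the abstract, so this PyTorch snippet is only an illustrative baseline, not the paper's loss:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor toward the positive embedding and
    push it at least `margin` away from the negative embedding."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```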
GrOVe: Ownership Verification of Graph Neural Networks using Embeddings
Graph neural networks (GNNs) have emerged as a state-of-the-art approach to
model and draw inferences from large scale graph-structured data in various
application settings such as social networking. The primary goal of a GNN is to
learn an embedding for each graph node in a dataset that encodes both the node
features and the local graph structure around the node. Embeddings generated by
a GNN for a graph node are unique to that GNN. Prior work has shown that GNNs
are prone to model extraction attacks. Model extraction attacks and defenses
have been explored extensively in other non-graph settings. While detecting or
preventing model extraction appears to be difficult, deterring it via
effective ownership verification techniques offers a potential defense. In
non-graph settings, fingerprinting models, or the data used to build them, has
been shown to be a promising approach toward ownership verification. We present
GrOVe, a state-of-the-art GNN model fingerprinting scheme that, given a target
model and a suspect model, can reliably determine if the suspect model was
trained independently of the target model or if it is a surrogate of the target
model obtained via model extraction. We show that GrOVe can distinguish between
surrogate and independent models even when the independent model uses the same
training dataset and architecture as the original target model. Using six
benchmark datasets and three model architectures, we show that GrOVe
consistently achieves low false-positive and false-negative rates. We
demonstrate that GrOVe is robust against known fingerprint evasion techniques
while remaining computationally efficient.
Comment: 11 pages, 5 figures
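A simplified sketch of embedding-based verification in the spirit of GrOVe, assuming the target and suspect GNNs produce same-dimensional embeddings for the nodes of a shared verification graph; the fixed threshold is only a placeholder for the similarity classifier GrOVe actually trains:

```python
import numpy as np

def embedding_distances(target_embeds, suspect_embeds):
    """Per-node cosine distance between the two models' embeddings.

    target_embeds / suspect_embeds: arrays of shape (num_nodes, d) obtained by
    running the target and suspect GNNs on the same verification graph.
    """
    t = target_embeds / np.linalg.norm(target_embeds, axis=1, keepdims=True)
    s = suspect_embeds / np.linalg.norm(suspect_embeds, axis=1, keepdims=True)
    return 1.0 - np.sum(t * s, axis=1)

def looks_like_surrogate(distances, threshold=0.1):
    """Flag the suspect if its embeddings stay unusually close to the target's.

    The fixed threshold is illustrative: GrOVe instead trains a classifier on
    such distance vectors to separate surrogates from independent models.
    """
    return float(np.mean(distances)) < threshold
```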
NaturalFinger: Generating Natural Fingerprint with Generative Adversarial Networks
Deep neural network (DNN) models have become a critical asset of the model
owner, as training them requires a large amount of resources (e.g., labeled data).
Therefore, many fingerprinting schemes have been proposed to safeguard the
intellectual property (IP) of the model owner against model extraction and
illegal redistribution. However, previous schemes adopt unnatural images as the
fingerprint, such as adversarial examples and noisy images, which can be easily
perceived and rejected by the adversary. In this paper, we propose
NaturalFinger, which generates natural fingerprints with generative adversarial
networks (GANs). Moreover, NaturalFinger fingerprints the decision-difference
areas rather than the decision boundary, which is more robust. The
application of GAN not only allows us to generate more imperceptible samples,
but also enables us to generate unrestricted samples to explore the decision
boundary. To demonstrate the effectiveness of our fingerprinting approach, we
evaluate our approach against four model modification attacks including
adversarial training and two model extraction attacks. Experiments show that
our approach achieves an ARUC value of 0.91 on the FingerBench dataset (154
models), exceeding the best-performing baseline (MetaV) by over 17%
Security and Ownership Verification in Deep Reinforcement Learning
Deep reinforcement learning (DRL) has seen many successes in complex tasks such as robot manipulation, autonomous driving, and competitive games. However, there are few studies on the security threats against DRL systems. In this thesis, we focus on two security concerns in DRL.
The first security concern is adversarial perturbation attacks against DRL agents.
Adversarial perturbation attacks mislead DRL agents into taking sub-optimal actions. These attacks apply small imperceptible perturbations to the agent's observations of the environment.
Prior work shows that DRL agents are vulnerable to adversarial perturbation attacks.
However, prior attacks are difficult to deploy in real-time settings.
We show that universal adversarial perturbations (UAPs) are effective in reducing a DRL agent's performance in its task and are fast enough to be mounted in real time. We propose three variants of UAPs.
We evaluate the effectiveness of UAPs against different DRL agents (DQN, A2C, and PPO) in three different Atari 2600 games (Pong, Freeway, and Breakout).
We show that UAPs can degrade agent performance by 100%, in some cases even for a perturbation bound as small as .
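As a rough illustration of why UAPs are cheap enough to mount in real time (the mask is computed once offline and only added to each observation at run time), here is a hedged sketch using a classic Gym-style step API; agent, uap, and the clipping bounds are illustrative assumptions rather than details from the thesis:

```python
import numpy as np

def run_episode_with_uap(env, agent, uap, epsilon=0.01):
    """Roll out one episode while adding a precomputed UAP to every observation.

    env: Gym-style environment with the classic reset()/step() 4-tuple API.
    agent: callable mapping an observation to an action.
    uap: universal perturbation with the same shape as one observation.
    epsilon: L-infinity bound on the perturbation.
    Because the mask is fixed, the per-step cost is a single addition, which is
    what makes the attack fast enough for real-time settings.
    """
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        adv_obs = np.clip(obs + np.clip(uap, -epsilon, epsilon), 0.0, 1.0)
        action = agent(adv_obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward
```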
We also propose a technique for detecting adversarial perturbation attacks. An effective detection technique can be used in DRL tasks with potentially negative outcomes (such as an agent failing its task or accumulating negative rewards) to suspend the task before the negative result of an adversarial perturbation attack manifests. Our experiments show that this detection method works best for Pong, achieving perfect precision and recall against all adversarial perturbation attacks, but is less robust for Breakout and Freeway.
The second security concern is theft and unauthorized distribution of DRL agents.
As DRL agents gain success in complex tasks, there is a growing interest in monetizing them. However, the possibility of theft could jeopardize the profitability of deploying these agents. Robust ownership verification techniques can deter malicious parties from stealing these agents, and in the event that theft cannot be prevented, ownership verification can be used to track down and prosecute the perpetrators.
There are two prior works on ownership verification of DRL agents using watermarks. However, both techniques require the verifier to deploy the suspected stolen agent in an environment whose states the verifier fully controls. We propose a new fingerprinting technique in which the verifier compares the percentage of action agreement between the suspect agent and the owner's agent in environments where UAPs are applied.
Our experimental results show that there is a significant difference in the percentage of action agreement (up to in some cases) when the suspect agent is a copy of the owner's agent versus when the suspect agent is an independently trained agent