
    FLARE: Fingerprinting Deep Reinforcement Learning Agents using Universal Adversarial Masks

    We propose FLARE, the first fingerprinting mechanism to verify whether a suspected Deep Reinforcement Learning (DRL) policy is an illegitimate copy of another (victim) policy. We first show that it is possible to find non-transferable, universal adversarial masks, i.e., perturbations, that generate adversarial examples which transfer from a victim policy to its modified versions but not to independently trained policies. FLARE employs these masks as fingerprints to verify the true ownership of stolen DRL policies by measuring an action agreement value over states perturbed with such masks. Our empirical evaluations show that FLARE is effective (100% action agreement on stolen copies) and does not falsely accuse independent policies (no false positives). FLARE is also robust to model modification attacks and cannot easily be evaded by more informed adversaries without negatively impacting agent performance. We also show that not all universal adversarial masks are suitable candidates for fingerprints due to the inherent characteristics of DRL policies: the spatio-temporal dynamics of DRL problems and their sequential decision-making process make it harder both to characterize the decision boundary of a DRL policy and to find universal masks that capture its geometry. Comment: Will appear in the proceedings of ACSAC 2023; 13 pages, 5 figures, 7 tables.
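    The core verification step is straightforward to picture: perturb a batch of states with the candidate mask and count how often the suspect policy agrees with the victim. The sketch below is only an illustration under assumed interfaces (an .act(state) method on each policy; states and mask as NumPy arrays in [0, 1]), not the authors' implementation.

    # Minimal sketch (not FLARE's code): ownership check via action agreement
    # on states perturbed with a universal adversarial mask.
    import numpy as np

    def action_agreement(victim_policy, suspect_policy, states, mask, eps=0.05):
        """Fraction of masked states on which both policies pick the same action."""
        agree = 0
        for s in states:
            # Apply the universal mask within a small l_inf budget, keep pixels valid.
            s_adv = np.clip(s + np.clip(mask, -eps, eps), 0.0, 1.0)
            if victim_policy.act(s_adv) == suspect_policy.act(s_adv):
                agree += 1
        return agree / len(states)

    A high agreement score supports an ownership claim (FLARE reports 100% on stolen copies), while independently trained policies are expected to score far lower.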

    Deep Intellectual Property: A Survey

    With their widespread application in industrial manufacturing and commercial services, well-trained deep neural networks (DNNs) are becoming increasingly valuable and crucial assets due to their tremendous training cost and excellent generalization performance. Thanks to the emerging "Machine Learning as a Service" (MLaaS) paradigm, these trained models can be used by customers without much expert knowledge. However, this paradigm also exposes the expensive models to various threats, such as model stealing and abuse. Defending against these threats has become an urgent requirement, and Deep Intellectual Property (DeepIP) protection, which safeguards private training data, painstakingly tuned hyperparameters, and costly learned model weights, is now a consensus of both industry and academia. To this end, numerous approaches have been proposed in recent years, especially to prevent or detect model stealing and unauthorized redistribution. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of recent achievements in the field. More than 190 research contributions are included, covering many aspects of Deep IP protection: challenges/threats, invasive solutions (watermarking), non-invasive solutions (fingerprinting), evaluation metrics, and performance. We finish the survey by identifying promising directions for future research. Comment: 38 pages, 12 figures.

    Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks

    An off-the-shelf model offered as a commercial service can be stolen by model stealing attacks, posing a serious threat to the rights of the model owner. Model fingerprinting aims to verify whether a suspect model was stolen from the victim model, and has been gaining increasing attention. Previous methods typically use transferable adversarial examples as the model fingerprint, which is sensitive to adversarial defenses and transfer learning scenarios. To address this issue, we instead consider the pairwise relationship between samples and propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC). Specifically, we present SAC-w, which selects wrongly classified normal samples as model inputs and calculates the mean correlation among their model outputs. To reduce the training time, we further develop SAC-m, which selects CutMix-augmented samples as model inputs, without the need to train surrogate models or generate adversarial examples. Extensive results validate that SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning, and detects stolen models with the best AUC across different datasets and model architectures. The code is available at https://github.com/guanjiyang/SAC.
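    One way to picture the sample-correlation idea (details differ from the paper): each model is summarized by the correlation structure of its outputs over a fixed probe set, and a suspect whose correlation matrix sits unusually close to the victim's is flagged as a likely copy. In the sketch below, model(x) returning a per-class probability vector is an assumption.

    # Rough sketch of a sample-correlation test, not the SAC implementation.
    import numpy as np

    def output_correlation_matrix(model, samples):
        outputs = np.stack([model(x) for x in samples])   # (n_samples, n_classes)
        return np.corrcoef(outputs)                       # (n_samples, n_samples)

    def correlation_distance(victim_model, suspect_model, samples):
        c_v = output_correlation_matrix(victim_model, samples)
        c_s = output_correlation_matrix(suspect_model, samples)
        # A small distance, relative to independently trained reference models,
        # suggests the suspect is derived from the victim.
        return np.abs(c_v - c_s).mean()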

    How to choose your best allies for a transferable attack?

    The transferability of adversarial examples is a key issue in the security of deep neural networks. The possibility that an adversarial example crafted for a source model fools another, targeted model makes the threat of adversarial attacks more realistic. Measuring transferability is therefore a crucial problem, but the Attack Success Rate alone does not provide a sound evaluation. This paper proposes a new methodology for evaluating transferability that puts distortion in a central position. This new tool shows that transferable attacks may perform far worse than a black-box attack if the attacker picks the source model at random. To address this issue, we propose a new selection mechanism, called FiT, which aims at choosing the best source model with only a few preliminary queries to the target. Our experimental results show that FiT is highly effective at selecting the best source model in multiple scenarios, such as single-model attacks, ensemble-model attacks, and multiple attacks. Code available at: https://github.com/t-maho/transferability_measure_fit.
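    As a rough illustration of putting distortion at the center of the evaluation (this is not the FiT algorithm itself), one can score each candidate source model by the distortion at which adversarial examples crafted on it start to fool the target, estimated with only a few probe queries, and pick the source with the lowest score. The craft_adversarial helper, target.predict, and the probe budget below are hypothetical.

    # Illustrative sketch only: rank candidate source models by the smallest
    # distortion at which their adversarial examples fool the target.
    import numpy as np

    def pick_source_model(candidates, target, probes, craft_adversarial,
                          distortions=(0.5, 1.0, 2.0, 4.0)):
        best_model, best_score = None, np.inf
        for source in candidates:
            score = 0.0
            for x, y in probes:                      # a handful of labeled probe inputs
                fooled_at = 2 * distortions[-1]      # penalty if the target is never fooled
                for eps in distortions:              # increasing distortion budgets
                    x_adv = craft_adversarial(source, x, eps)
                    if target.predict(x_adv) != y:   # one target query per budget
                        fooled_at = eps
                        break
                score += fooled_at
            if score < best_score:
                best_model, best_score = source, score
        return best_model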

    A Stealthy and Robust Fingerprinting Scheme for Generative Models

    This paper presents a novel fingerprinting methodology for the Intellectual Property protection of generative models. Prior solutions for discriminative models usually adopt adversarial examples as the fingerprints, which exhibit anomalous inference behaviors and prediction results; hence, these methods are not stealthy and can easily be recognized by the adversary. Our approach leverages the invisible backdoor technique to overcome this limitation. Specifically, we design verification samples whose model outputs look normal but can trigger a backdoor classifier to make abnormal predictions. We propose a new backdoor embedding approach with Unique-Triplet Loss and fine-grained categorization to enhance the effectiveness of our fingerprints. Extensive evaluations show that this solution outperforms other strategies, with higher robustness, uniqueness, and stealthiness across various GAN models.

    GrOVe: Ownership Verification of Graph Neural Networks using Embeddings

    Graph neural networks (GNNs) have emerged as a state-of-the-art approach to model and draw inferences from large-scale graph-structured data in application settings such as social networking. The primary goal of a GNN is to learn an embedding for each graph node in a dataset that encodes both the node features and the local graph structure around the node. Embeddings generated by a GNN for a graph node are unique to that GNN. Prior work has shown that GNNs are prone to model extraction attacks. Model extraction attacks and defenses have been explored extensively in other, non-graph settings. While detecting or preventing model extraction appears to be difficult, deterring it via effective ownership verification techniques offers a potential defense. In non-graph settings, fingerprinting models, or the data used to build them, has been shown to be a promising approach toward ownership verification. We present GrOVe, a state-of-the-art GNN model fingerprinting scheme that, given a target model and a suspect model, can reliably determine whether the suspect model was trained independently of the target model or is a surrogate of the target model obtained via model extraction. We show that GrOVe can distinguish between surrogate and independent models even when the independent model uses the same training dataset and architecture as the original target model. Using six benchmark datasets and three model architectures, we show that GrOVe consistently achieves low false-positive and false-negative rates. We demonstrate that GrOVe is robust against known fingerprint evasion techniques while remaining computationally efficient. Comment: 11 pages, 5 figures.
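    The verification idea can be sketched as follows (a paraphrase, not the GrOVe implementation): both models embed the same verification nodes, and the per-node distances between the two embedding sets feed a decision rule. GrOVe trains a similarity classifier on such distance vectors, whereas the placeholder below just thresholds a mean cosine distance; the embed(model, graph, nodes) helper returning an (n_nodes, d) array is an assumption.

    # Hedged sketch of embedding-based fingerprinting for GNNs.
    import numpy as np

    def embedding_distances(target_model, suspect_model, graph, nodes, embed):
        e_t = embed(target_model, graph, nodes)      # (n_nodes, d)
        e_s = embed(suspect_model, graph, nodes)     # (n_nodes, d)
        cos = (e_t * e_s).sum(axis=1) / (
            np.linalg.norm(e_t, axis=1) * np.linalg.norm(e_s, axis=1) + 1e-12)
        return 1.0 - cos                             # per-node cosine distance

    def looks_like_surrogate(distances, threshold=0.3):
        # Placeholder rule: a surrogate's embeddings stay close to the target's.
        return distances.mean() < threshold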

    NaturalFinger: Generating Natural Fingerprint with Generative Adversarial Networks

    Deep neural network (DNN) models have become a critical asset of the model owner, as training them requires a large amount of resources (e.g., labeled data). Therefore, many fingerprinting schemes have been proposed to safeguard the intellectual property (IP) of the model owner against model extraction and illegal redistribution. However, previous schemes adopt unnatural images as the fingerprint, such as adversarial examples and noisy images, which can be easily perceived and rejected by the adversary. In this paper, we propose NaturalFinger, which generates natural fingerprints with generative adversarial networks (GANs). Moreover, NaturalFinger fingerprints the decision-difference areas rather than the decision boundary itself, which is more robust. The use of GANs not only allows us to generate more imperceptible samples but also enables us to generate unrestricted samples to explore the decision boundary. To demonstrate the effectiveness of our fingerprinting approach, we evaluate it against four model modification attacks, including adversarial training, and two model extraction attacks. Experiments show that our approach achieves an ARUC value of 0.91 on the FingerBench dataset (154 models), exceeding the optimal baseline (MetaV) by over 17%.

    Security and Ownership Verification in Deep Reinforcement Learning

    Deep reinforcement learning (DRL) has seen many successes in complex tasks such as robot manipulation, autonomous driving, and competitive games. However, there are few studies on the security threats against DRL systems. In this thesis, we focus on two security concerns in DRL. The first is adversarial perturbation attacks against DRL agents. These attacks mislead DRL agents into taking sub-optimal actions by applying small, imperceptible perturbations to the agent's observations of the environment. Prior work shows that DRL agents are vulnerable to adversarial perturbation attacks, but the proposed attacks are difficult to deploy in real-time settings. We show that universal adversarial perturbations (UAPs) are effective in reducing a DRL agent's task performance and are fast enough to be mounted in real time. We propose three variants of UAPs and evaluate their effectiveness against different DRL agents (DQN, A2C, and PPO) in three Atari 2600 games (Pong, Freeway, and Breakout). We show that UAPs can degrade agent performance by 100%, in some cases even for a perturbation bound as small as l∞ = 0.01. We also propose a technique for detecting adversarial perturbation attacks. An effective detection technique can be used in DRL tasks with potentially negative outcomes (such as the agent failing a task or accumulating negative rewards) by suspending the task before the negative result manifests. Our experiments found that this detection method works best for Pong, with perfect precision and recall against all adversarial perturbation attacks, but is less robust for Breakout and Freeway.

    The second security concern is theft and unauthorized distribution of DRL agents. As DRL agents gain success in complex tasks, there is growing interest in monetizing them; however, the possibility of theft could jeopardize the profitability of deploying these agents. Robust ownership verification techniques can deter malicious parties from stealing these agents, and in cases where theft cannot be prevented, they can be used to track down and prosecute perpetrators. Two prior works address ownership verification of DRL agents using watermarks; however, both techniques require the verifier to deploy the suspected stolen agent in an environment whose states the verifier fully controls. We propose a new fingerprinting technique in which the verifier compares the percentage of action agreement between the suspect agent and the owner's agent in environments where UAPs are applied. Our experimental results show a significant difference in the percentage of action agreement (up to 50% in some cases) between the case where the suspect agent is a copy of the owner's agent and the case where it is an independently trained agent.
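    To make the real-time claim concrete: because a UAP is computed offline, mounting the attack at run time costs only one addition and clip per frame. The sketch below is illustrative only; the environment and agent follow a classic Gym-style API (reset()/step(), agent.act()), which is an assumption here.

    # Illustrative sketch: applying a precomputed universal perturbation to every
    # observation before the agent sees it.
    import numpy as np

    def run_episode_under_uap(env, agent, uap, eps=0.01):
        obs, done, total_reward = env.reset(), False, 0.0
        while not done:
            # Perturb the observation within an l_inf budget of eps.
            obs_adv = np.clip(obs + np.clip(uap, -eps, eps), 0.0, 1.0)
            action = agent.act(obs_adv)
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        return total_reward

    The fingerprinting check described above then runs the suspect agent and the owner's agent in such UAP-perturbed environments and compares how often their chosen actions agree.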