BlackMarks: Blackbox Multibit Watermarking for Deep Neural Networks
Deep Neural Networks have created a paradigm shift in our ability to
comprehend raw data in various important fields ranging from computer vision
and natural language processing to intelligence warfare and healthcare. While
DNNs are increasingly deployed either in a white-box setting, where the model
internals are publicly known, or a black-box setting, where only the model outputs
are known, a practical concern is protecting the models against Intellectual
Property (IP) infringement. We propose BlackMarks, the first end-to-end
multi-bit watermarking framework that is applicable in the black-box scenario.
BlackMarks takes the pre-trained unmarked model and the owner's binary
signature as inputs and outputs the corresponding marked model with a set of
watermark keys. To do so, BlackMarks first designs a model-dependent encoding
scheme that maps all possible classes in the task to bit '0' and bit '1' by
clustering the output activations into two groups. Given the owner's watermark
signature (a binary string), a set of key image and label pairs are designed
using targeted adversarial attacks. The watermark (WM) is then embedded in the
prediction behavior of the target DNN by fine-tuning the model with the generated
WM key set. To extract the WM, the remote model is queried with the WM key images
and the owner's signature is decoded from the corresponding predictions
according to the designed encoding scheme. We perform a comprehensive
evaluation of BlackMarks's performance on the MNIST, CIFAR-10, and ImageNet datasets and
corroborate its effectiveness and robustness. BlackMarks preserves the
functionality of the original DNN and incurs a negligible WM embedding runtime
overhead, as low as 2.054%.
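To make the decoding step concrete, the following sketch shows how an owner could recover and check the signature from a remote black-box model. The names (`query_model`, `class_to_bit`) and the BER-threshold check are our assumptions for illustration, not the paper's API:

```python
import numpy as np

def extract_watermark(query_model, key_images, class_to_bit, signature):
    """Decode the owner's binary signature from a remote model.

    query_model  -- callable returning the predicted class id for one image
                    (black-box access: only outputs are observable)
    class_to_bit -- encoding scheme mapping each class id to '0' or '1',
                    obtained by clustering output activations into two groups
    signature    -- the owner's binary string, e.g. '10110...'
    """
    decoded = "".join(class_to_bit[query_model(img)] for img in key_images)
    # Bit error rate between decoded and true signature; ownership is
    # claimed when the BER falls below a chosen threshold.
    ber = np.mean([a != b for a, b in zip(decoded, signature)])
    return decoded, ber
```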
Robust Watermarking of Neural Network with Exponential Weighting
Deep learning has been achieving top performance in many tasks. Since
training a deep learning model entails a great deal of cost, neural network
models need to be treated as valuable intellectual property. One concern in
such a situation is that some malicious user might redistribute the model or
provide a prediction service using the model without permission. One promising
solution is digital watermarking, to embed a mechanism into the model so that
the owner of the model can verify the ownership of the model externally. In
this study, we present a novel attack method against watermarks, query
modification, and demonstrate that all of the existing watermarking methods are
vulnerable to either query modification or an existing attack method (model
modification). To overcome this vulnerability, we present a novel watermarking
method, exponential weighting. We experimentally show that our watermarking
method achieves high watermark-verification performance even under malicious
attempts by unauthorized service providers, such as model modification and
query modification, without sacrificing the predictive performance of the
neural network model.
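The abstract does not spell out the transform, but a minimal sketch of one plausible exponential-weighting operation looks like this; the temperature `T` and the max-normalization are our assumptions, not the paper's exact formulation:

```python
import torch

def exponential_weighting(weight: torch.Tensor, T: float = 2.0) -> torch.Tensor:
    """Rescale a layer's weights so large-magnitude weights dominate.

    Small weights, which are easily pruned or perturbed by model
    modification, are suppressed, so the embedded watermark relies only
    on weights that are likely to survive such attacks.
    """
    scale = torch.exp(T * weight.abs())
    return weight * scale / scale.max()
```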
Robust Spatial-spread Deep Neural Image Watermarking
Watermarking is the operation of embedding information into an image in a
way that allows the image's ownership to be identified even after distortions
are applied to it. In this paper, we present a novel end-to-end solution for
embedding and recovering the watermark in the digital image using convolutional
neural networks. The method is based on spreading the message over the spatial
domain of the image, hence reducing the "local bits per pixel" capacity. To
train the model, we used adversarial training and applied noise layers between
the encoder and the decoder. Moreover, we broadened the spectrum of attacks
typically considered against the watermark and, by grouping the attacks
according to their scope, achieved high general robustness, most notably
against JPEG compression, Gaussian blurring, subsampling, and resizing. To aid
model training, we also propose a precise differentiable approximation of JPEG
compression.
Comment: Accepted at TrustCom 2020: The 19th IEEE International Conference on
Trust, Security and Privacy in Computing and Communications.
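The only non-differentiable step in JPEG is coefficient rounding, so a differentiable surrogate typically replaces it with a soft rounding. Below is a minimal sketch using a generic cubic approximation; the paper's exact formulation may differ:

```python
import torch

def diff_round(x: torch.Tensor) -> torch.Tensor:
    # Close to round(x) in value, but with gradient 3 * (x - round(x))**2
    # instead of zero, so gradients can flow through quantization.
    return torch.round(x) + (x - torch.round(x)) ** 3

def diff_jpeg_quantize(dct_blocks: torch.Tensor,
                       quant_table: torch.Tensor) -> torch.Tensor:
    """Quantize/dequantize 8x8 DCT coefficient blocks as JPEG does, using
    the soft rounding above so the encoder can be trained end to end."""
    return diff_round(dct_blocks / quant_table) * quant_table
```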
Performance Comparison of Contemporary DNN Watermarking Techniques
DNNs should be considered the intellectual property (IP) of the model
builder, given the high cost of designing and training a highly accurate model.
Research attempts have been made to protect the authorship of the trained model
and prevent IP infringement using DNN watermarking techniques. In this paper,
we provide a comprehensive performance comparison of the state-of-the-art DNN
watermarking methodologies according to the essential requisites for an
effective watermarking technique. We identify the pros and cons of each scheme
and provide insights into the underlying rationale. Empirical results
corroborate that the DeepSigns framework proposed in [4] has the best overall
performance in terms of the evaluation metrics. Our comparison facilitates the
development of future watermarking approaches and enables the model owner to
deploy the watermarking scheme that satisfies her requirements.
Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
Data poisoning is an attack on machine learning models wherein the attacker
adds examples to the training set to manipulate the behavior of the model at
test time. This paper explores poisoning attacks on neural nets. The proposed
attacks use "clean-labels"; they don't require the attacker to have any control
over the labeling of training data. They are also targeted; they control the
behavior of the classifier on a test instance without
degrading overall classifier performance. For example, an attacker could add a
seemingly innocuous image (that is properly labeled) to a training set for a
face recognition engine, and control the identity of a chosen person at test
time. Because the attacker does not need to control the labeling function,
poisons could be entered into the training set simply by leaving them on the
web and waiting for them to be scraped by a data collection bot.
We present an optimization-based method for crafting poisons, and show that
just a single poison image can control classifier behavior when transfer
learning is used. For full end-to-end training, we present a "watermarking"
strategy that makes poisoning reliable using multiple (50) poisoned
training instances. We demonstrate our method by generating poisoned frog
images from the CIFAR dataset and using them to manipulate image classifiers.
Comment: Presented at the NIPS 2018 conference. 11 pages, 4 figures, with a
supplementary section of 7 pages, 7 figures. First two authors contributed
equally.
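The crafting objective is a feature collision: the poison should match the target in feature space while staying close to a correctly labeled base image in input space. Here is a simplified gradient-descent sketch of that optimization (the paper itself uses a forward-backward splitting procedure; `feat_net` and the constants are illustrative):

```python
import torch

def craft_poison(feat_net, base_img, target_img,
                 beta=0.1, lr=0.01, steps=1000):
    """Solve  min_x ||f(x) - f(t)||^2 + beta * ||x - b||^2  by gradient
    descent: x collides with the target t in feature space while staying
    visually close to the (correctly labeled) base image b."""
    target_feat = feat_net(target_img).detach()
    x = base_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((feat_net(x) - target_feat) ** 2).sum() \
               + beta * ((x - base_img) ** 2).sum()
        loss.backward()
        opt.step()
        x.data.clamp_(0.0, 1.0)  # keep x a valid image
    return x.detach()
```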
Local Gradients Smoothing: Defense against localized adversarial attacks
Deep neural networks (DNNs) have shown vulnerability to adversarial attacks,
i.e., carefully perturbed inputs designed to mislead the network at inference
time. Recently introduced localized attacks, Localized and Visible Adversarial
Noise (LaVAN) and Adversarial patch, pose a new challenge to deep learning
security by adding adversarial noise only within a specific region without
affecting the salient objects in an image. Driven by the observation that such
attacks introduce concentrated high-frequency changes at a particular image
location, we have developed an effective method that estimates the noise
location in the gradient domain and transforms the high-activation regions
caused by adversarial noise in the image domain, while having minimal effect
on the salient object that is important for correct classification. Our proposed Local
Gradients Smoothing (LGS) scheme achieves this by regularizing gradients in the
estimated noisy region before feeding the image to the DNN for inference. We have
shown the effectiveness of our method in comparison to other defense methods
including digital watermarking, JPEG compression, Total Variance Minimization
(TVM), and feature squeezing on the ImageNet dataset. In addition, we systematically
study the robustness of the proposed defense mechanism against Backward Pass
Differentiable Approximation (BPDA), a state-of-the-art attack recently
developed to break defenses that transform an input sample to minimize the
adversarial effect. Compared to other defense mechanisms, LGS is by far the
most resistant to BPDA in the localized adversarial attack setting.
Comment: Accepted at WACV 2019.
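A minimal sketch of the LGS idea follows; note that the paper operates on overlapping windows with tuned constants, so `lam` and `thresh` here are illustrative:

```python
import torch
import torch.nn.functional as F

def local_gradients_smoothing(img, lam=2.3, thresh=0.1):
    """Estimate regions of concentrated high-frequency noise from
    first-order image gradients and scale those pixels toward zero
    before the image is fed to the classifier.
    img: (N, C, H, W) tensor with values in [0, 1]."""
    gray = img.mean(dim=1, keepdim=True)
    gx = F.pad(gray[..., :, 1:] - gray[..., :, :-1], (0, 1, 0, 0))
    gy = F.pad(gray[..., 1:, :] - gray[..., :-1, :], (0, 0, 0, 1))
    mag = (gx ** 2 + gy ** 2).sqrt()
    mag = mag / (mag.amax(dim=(2, 3), keepdim=True) + 1e-12)
    mask = (lam * mag).clamp(0.0, 1.0)
    mask = torch.where(mask > thresh, mask, torch.zeros_like(mask))
    return img * (1.0 - mask)  # suppress the estimated noisy regions
```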
Fraternal Twins: Unifying Attacks on Machine Learning and Digital Watermarking
Machine learning is increasingly used in security-critical applications, such
as autonomous driving, face recognition and malware detection. Most learning
methods, however, have not been designed with security in mind and thus are
vulnerable to different types of attacks. This problem has motivated the
research field of adversarial machine learning that is concerned with attacking
and defending learning methods. Concurrently, a different line of research has
tackled a very similar problem: in digital watermarking, information is
embedded in a signal in the presence of an adversary. As a consequence, this
research field has also extensively studied techniques for attacking and
defending watermarking methods.
The two research communities have so far worked in parallel, unknowingly
developing similar attack and defense strategies. This paper is a first effort
to bring these communities together. To this end, we present a unified notation
of black-box attacks against machine learning and watermarking that reveals the
similarity of both settings. To demonstrate the efficacy of this unified view,
we apply concepts from watermarking to machine learning and vice versa. We show
that countermeasures from watermarking can mitigate recent model-extraction
attacks and, similarly, that techniques for hardening machine learning can fend
off oracle attacks against watermarks. Our work provides a conceptual link
between two research fields and thereby opens novel directions for improving
the security of both machine learning and digital watermarking.
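As a toy illustration of the unified view (our construction, not the paper's notation): a deployed classifier and a watermark detector can expose the same oracle interface, so the same boundary-probing attack applies to either.

```python
from typing import Protocol, Sequence

class BlackBoxOracle(Protocol):
    """Common abstraction: a deployed ML model and a watermark detector
    are both oracles mapping a signal to an observable decision."""
    def query(self, x: Sequence[float]) -> int: ...

def decision_flips(oracle: BlackBoxOracle, x: Sequence[float],
                   step: float, dims: Sequence[int]) -> list[int]:
    """Generic sensitivity probe: perturb one dimension at a time and
    record where the oracle's decision flips. Against a model this aids
    extraction; against a watermark detector it is an oracle attack."""
    baseline = oracle.query(x)
    flips = []
    for d in dims:
        probe = list(x)
        probe[d] += step
        if oracle.query(probe) != baseline:
            flips.append(d)
    return flips
```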
Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring
Deep Neural Networks have recently achieved considerable success, enabling
several breakthroughs in notoriously challenging problems. Training these
networks is computationally expensive and requires vast amounts of training
data. Selling such pre-trained models can, therefore, be a lucrative business
model. Unfortunately, once the models are sold they can be easily copied and
redistributed. To avoid this, a tracking mechanism to identify models as the
intellectual property of a particular vendor is necessary.
In this work, we present an approach for watermarking Deep Neural Networks in
a black-box way. Our scheme works for general classification tasks and can
easily be combined with current learning algorithms. We show experimentally
that such a watermark has no noticeable impact on the primary task that the
model is designed for and evaluate the robustness of our proposal against a
multitude of practical attacks. Moreover, we provide a theoretical analysis,
relating our approach to previous work on backdooring.
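A minimal sketch of the verification side of such a backdoor watermark (the names and the 90% threshold are our choices for illustration): the owner keeps a secret trigger set with pre-assigned labels, trains the model on the union of clean data and trigger set, and later checks a suspect model's agreement on the triggers alone.

```python
import torch

@torch.no_grad()
def verify_watermark(model, trigger_images, trigger_labels,
                     threshold=0.9):
    """Black-box ownership check: a watermarked model reproduces the
    secret trigger labels far above chance; an unrelated model does not.

    trigger_images -- (N, C, H, W) secret key images
    trigger_labels -- (N,) labels assigned to them at embedding time
    """
    preds = model(trigger_images).argmax(dim=1)
    accuracy = (preds == trigger_labels).float().mean().item()
    return accuracy >= threshold
```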
zoNNscan : a boundary-entropy index for zone inspection of neural models
The training of deep neural network classifiers results in decision
boundaries whose geometry is still not well understood. This is directly
related to classification problems such as so-called adversarial examples.
We introduce zoNNscan, an index that is intended to inform on the boundary
uncertainty (in terms of the presence of other classes) around one given input
datapoint. It is based on confidence entropy, and is implemented through
sampling in the multidimensional ball surrounding that input. We detail the
zoNNscan index, give an algorithm for approximating it, and finally illustrate
its benefits on four applications, including two important problems for the
adoption of deep networks in critical systems: adversarial examples and corner
case inputs. We highlight that zoNNscan exhibits significantly higher values
for those two problem classes than for standard inputs.
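A sketch of how such an index can be approximated by sampling; the radius, sample count, and log-K normalization here are our assumptions, while the paper gives the exact algorithm:

```python
import numpy as np

def zonnscan(predict_proba, x, radius=0.05, n_samples=1000, seed=0):
    """Approximate boundary uncertainty around input x: sample uniformly
    in the L2 ball of the given radius, then average the (normalized)
    Shannon entropy of the model's class-probability outputs there.
    predict_proba -- callable mapping (n, d) inputs to (n, K) probabilities
    x             -- flat input vector of dimension d
    """
    rng = np.random.default_rng(seed)
    d = x.size
    dirs = rng.normal(size=(n_samples, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    samples = np.clip(x.reshape(1, -1) + radii * dirs, 0.0, 1.0)
    probs = predict_proba(samples)
    k = probs.shape[1]
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1) / np.log(k)
    return float(entropy.mean())  # high values => other classes nearby
```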
Digital Passport: A Novel Technological Strategy for Intellectual Property Protection of Convolutional Neural Networks
In order to prevent deep neural networks from being infringed by unauthorized
parties, we propose a generic solution which embeds a designated digital
passport into a network and, subsequently, either paralyzes the network's
functionality for unauthorized usage or maintains its functionality in the
presence of a verified passport. Such a desired network behavior is
successfully demonstrated in a number of implementation schemes, which provide
reliable, preventive and timely protections against tens of thousands of
fake-passport deceptions. Extensive experiments also show that deep neural
network performance under unauthorized usage deteriorates significantly (e.g.,
33% to 82% reductions in CIFAR-10 classification accuracy), while networks
endorsed with valid passports remain intact.
Comment: This paper proposes a new, timely IPR solution that embeds digital
passports into CNN models to prevent unauthorized network usage (i.e.,
infringement) by paralyzing the networks while maintaining their functionality
for verified users.
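A minimal sketch of the passport mechanism (our simplification; the exact derivation of the scale and bias in the paper may differ): the affine parameters that follow a convolution are computed from the secret passport, so running the network with a wrong or missing passport distorts every layer's output.

```python
import torch
import torch.nn as nn

class PassportConv2d(nn.Module):
    """Convolution whose post-conv scale/bias are derived from a secret
    passport; without the genuine passport the scaling is wrong and the
    network's accuracy collapses ("paralyzed" behavior)."""
    def __init__(self, in_ch, out_ch, k, passport_gamma, passport_beta):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.p_gamma = passport_gamma  # secret tensor, shaped like an input
        self.p_beta = passport_beta

    def forward(self, x):
        out = self.conv(x)
        # Scale and bias depend jointly on the weights and the passport.
        gamma = self.conv(self.p_gamma).mean(dim=(0, 2, 3)).view(1, -1, 1, 1)
        beta = self.conv(self.p_beta).mean(dim=(0, 2, 3)).view(1, -1, 1, 1)
        return gamma * out + beta
```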