BlackMarks: Blackbox Multibit Watermarking for Deep Neural Networks
Deep Neural Networks have created a paradigm shift in our ability to
comprehend raw data in various important fields ranging from computer vision
and natural language processing to intelligence warfare and healthcare. While
DNNs are increasingly deployed either in a white-box setting, where the model
internals are publicly known, or a black-box setting, where only the model outputs
are known, a practical concern is protecting the models against Intellectual
Property (IP) infringement. We propose BlackMarks, the first end-to-end
multi-bit watermarking framework that is applicable in the black-box scenario.
BlackMarks takes the pre-trained unmarked model and the owner's binary
signature as inputs and outputs the corresponding marked model with a set of
watermark keys. To do so, BlackMarks first designs a model-dependent encoding
scheme that maps all possible classes in the task to bit '0' and bit '1' by
clustering the output activations into two groups. Given the owner's watermark
signature (a binary string), a set of key image and label pairs are designed
using targeted adversarial attacks. The watermark (WM) is then embedded in the
prediction behavior of the target DNN by fine-tuning the model with the generated
WM key set. To extract the WM, the remote model is queried with the WM key images
and the owner's signature is decoded from the corresponding predictions
according to the designed encoding scheme. We perform a comprehensive
evaluation of BlackMarks's performance on the MNIST, CIFAR-10, and ImageNet datasets and
corroborate its effectiveness and robustness. BlackMarks preserves the
functionality of the original DNN and incurs a negligible WM embedding runtime
overhead, as low as 2.054%.
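To make the decoding step concrete, the following sketch shows how an owner could recover and check the signature from a remote black-box model. The names (`query_model`, `class_to_bit`) and the BER-threshold check are our assumptions for illustration, not the paper's API:

```python
import numpy as np

def extract_watermark(query_model, key_images, class_to_bit, signature):
    """Decode the owner's binary signature from a remote model.

    query_model  -- callable returning the predicted class id for one image
                    (black-box access: only outputs are observable)
    class_to_bit -- encoding scheme mapping each class id to '0' or '1',
                    obtained by clustering output activations into two groups
    signature    -- the owner's binary string, e.g. '10110...'
    """
    decoded = "".join(class_to_bit[query_model(img)] for img in key_images)
    # Bit error rate between decoded and true signature; ownership is
    # claimed when the BER falls below a chosen threshold.
    ber = np.mean([a != b for a, b in zip(decoded, signature)])
    return decoded, ber
```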
Robust Watermarking of Neural Network with Exponential Weighting
Deep learning has been achieving top performance in many tasks. Since
training a deep learning model entails a great deal of cost, neural network
models need to be treated as valuable intellectual property. One concern in
such a situation is that some malicious user might redistribute the model or
provide a prediction service using the model without permission. One promising
solution is digital watermarking, to embed a mechanism into the model so that
the owner of the model can verify the ownership of the model externally. In
this study, we present a novel attack method against watermarks, query
modification, and demonstrate that all of the existing watermarking methods are
vulnerable to either query modification or an existing attack method (model
modification). To overcome this vulnerability, we present a novel watermarking
method, exponential weighting. We experimentally show that our watermarking
method achieves high watermark-verification performance even under malicious
attempts by unauthorized service providers, such as model modification and
query modification, without sacrificing the predictive performance of the
neural network model.
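The abstract does not spell out the transform, but a minimal sketch of one plausible exponential-weighting operation looks like this; the temperature `T` and the max-normalization are our assumptions, not the paper's exact formulation:

```python
import torch

def exponential_weighting(weight: torch.Tensor, T: float = 2.0) -> torch.Tensor:
    """Rescale a layer's weights so large-magnitude weights dominate.

    Small weights, which are easily pruned or perturbed by model
    modification, are suppressed, so the embedded watermark relies only
    on weights that are likely to survive such attacks.
    """
    scale = torch.exp(T * weight.abs())
    return weight * scale / scale.max()
```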
Robust Spatial-spread Deep Neural Image Watermarking
Watermarking is the operation of embedding information into an image in a
way that allows the image's ownership to be identified even after distortions
are applied to it. In this paper, we present a novel end-to-end solution for
embedding and recovering the watermark in the digital image using convolutional
neural networks. The method is based on spreading the message over the spatial
domain of the image, hence reducing the "local bits per pixel" capacity. To
train the model, we used adversarial training and applied noise layers between
the encoder and the decoder. Moreover, we broadened the spectrum of attacks
typically considered against the watermark and, by grouping the attacks
according to their scope, achieved high general robustness, most notably
against JPEG compression, Gaussian blurring, subsampling, and resizing. To aid
model training, we also propose a precise differentiable approximation of JPEG
compression.
Comment: Accepted at TrustCom 2020: The 19th IEEE International Conference on
Trust, Security and Privacy in Computing and Communications.
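The only non-differentiable step in JPEG is coefficient rounding, so a differentiable surrogate typically replaces it with a soft rounding. Below is a minimal sketch using a generic cubic approximation; the paper's exact formulation may differ:

```python
import torch

def diff_round(x: torch.Tensor) -> torch.Tensor:
    # Close to round(x) in value, but with gradient 3 * (x - round(x))**2
    # instead of zero, so gradients can flow through quantization.
    return torch.round(x) + (x - torch.round(x)) ** 3

def diff_jpeg_quantize(dct_blocks: torch.Tensor,
                       quant_table: torch.Tensor) -> torch.Tensor:
    """Quantize/dequantize 8x8 DCT coefficient blocks as JPEG does, using
    the soft rounding above so the encoder can be trained end to end."""
    return diff_round(dct_blocks / quant_table) * quant_table
```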
Performance Comparison of Contemporary DNN Watermarking Techniques
DNNs should be considered the intellectual property (IP) of the model
builder, given the high cost of designing and training a highly accurate model.
Research attempts have been made to protect the authorship of the trained model
and prevent IP infringement using DNN watermarking techniques. In this paper,
we provide a comprehensive performance comparison of the state-of-the-art DNN
watermarking methodologies according to the essential requisites for an
effective watermarking technique. We identify the pros and cons of each scheme
and provide insights into the underlying rationale. Empirical results
corroborate that the DeepSigns framework proposed in [4] has the best overall
performance in terms of the evaluation metrics. Our comparison facilitates the
development of future watermarking approaches and enables the model owner to
deploy the watermarking scheme that satisfies her requirements.
Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
Data poisoning is an attack on machine learning models wherein the attacker
adds examples to the training set to manipulate the behavior of the model at
test time. This paper explores poisoning attacks on neural nets. The proposed
attacks use "clean-labels"; they don't require the attacker to have any control
over the labeling of training data. They are also targeted; they control the
behavior of the classifier on a test instance without
degrading overall classifier performance. For example, an attacker could add a
seemingly innocuous image (that is properly labeled) to a training set for a
face recognition engine, and control the identity of a chosen person at test
time. Because the attacker does not need to control the labeling function,
poisons could be entered into the training set simply by leaving them on the
web and waiting for them to be scraped by a data collection bot.
We present an optimization-based method for crafting poisons, and show that
just a single poison image can control classifier behavior when transfer
learning is used. For full end-to-end training, we present a "watermarking"
strategy that makes poisoning reliable using multiple (50) poisoned
training instances. We demonstrate our method by generating poisoned frog
images from the CIFAR dataset and using them to manipulate image classifiers.
Comment: Presented at the NIPS 2018 conference. 11 pages, 4 figures, with a
supplementary section of 7 pages, 7 figures. First two authors contributed
equally.
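The crafting objective is a feature collision: the poison should match the target in feature space while staying close to a correctly labeled base image in input space. Here is a simplified gradient-descent sketch of that optimization (the paper itself uses a forward-backward splitting procedure; `feat_net` and the constants are illustrative):

```python
import torch

def craft_poison(feat_net, base_img, target_img,
                 beta=0.1, lr=0.01, steps=1000):
    """Solve  min_x ||f(x) - f(t)||^2 + beta * ||x - b||^2  by gradient
    descent: x collides with the target t in feature space while staying
    visually close to the (correctly labeled) base image b."""
    target_feat = feat_net(target_img).detach()
    x = base_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((feat_net(x) - target_feat) ** 2).sum() \
               + beta * ((x - base_img) ** 2).sum()
        loss.backward()
        opt.step()
        x.data.clamp_(0.0, 1.0)  # keep x a valid image
    return x.detach()
```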
Local Gradients Smoothing: Defense against localized adversarial attacks
Deep neural networks (DNNs) have shown vulnerability to adversarial attacks,
i.e., carefully perturbed inputs designed to mislead the network at inference
time. Recently introduced localized attacks, Localized and Visible Adversarial
Noise (LaVAN) and Adversarial patch, pose a new challenge to deep learning
security by adding adversarial noise only within a specific region without
affecting the salient objects in an image. Driven by the observation that such
attacks introduce concentrated high-frequency changes at a particular image
location, we have developed an effective method that estimates the noise
location in the gradient domain and transforms the high-activation regions
caused by adversarial noise in the image domain, while having minimal effect
on the salient object that is important for correct classification. Our proposed Local
Gradients Smoothing (LGS) scheme achieves this by regularizing gradients in the
estimated noisy region before feeding the image to the DNN for inference. We have
shown the effectiveness of our method in comparison to other defense methods
including digital watermarking, JPEG compression, Total Variance Minimization
(TVM), and feature squeezing on the ImageNet dataset. In addition, we systematically
study the robustness of the proposed defense mechanism against Backward Pass
Differentiable Approximation (BPDA), a state-of-the-art attack recently
developed to break defenses that transform an input sample to minimize the
adversarial effect. Compared to other defense mechanisms, LGS is by far the
most resistant to BPDA in the localized adversarial attack setting.
Comment: Accepted at WACV 2019.
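A minimal sketch of the LGS idea follows; note that the paper operates on overlapping windows with tuned constants, so `lam` and `thresh` here are illustrative:

```python
import torch
import torch.nn.functional as F

def local_gradients_smoothing(img, lam=2.3, thresh=0.1):
    """Estimate regions of concentrated high-frequency noise from
    first-order image gradients and scale those pixels toward zero
    before the image is fed to the classifier.
    img: (N, C, H, W) tensor with values in [0, 1]."""
    gray = img.mean(dim=1, keepdim=True)
    gx = F.pad(gray[..., :, 1:] - gray[..., :, :-1], (0, 1, 0, 0))
    gy = F.pad(gray[..., 1:, :] - gray[..., :-1, :], (0, 0, 0, 1))
    mag = (gx ** 2 + gy ** 2).sqrt()
    mag = mag / (mag.amax(dim=(2, 3), keepdim=True) + 1e-12)
    mask = (lam * mag).clamp(0.0, 1.0)
    mask = torch.where(mask > thresh, mask, torch.zeros_like(mask))
    return img * (1.0 - mask)  # suppress the estimated noisy regions
```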
Fraternal Twins: Unifying Attacks on Machine Learning and Digital Watermarking
Machine learning is increasingly used in security-critical applications, such
as autonomous driving, face recognition and malware detection. Most learning
methods, however, have not been designed with security in mind and thus are
vulnerable to different types of attacks. This problem has motivated the
research field of adversarial machine learning that is concerned with attacking
and defending learning methods. Concurrently, a different line of research has
tackled a very similar problem: in digital watermarking, information is
embedded in a signal in the presence of an adversary. As a consequence, this
research field has also extensively studied techniques for attacking and
defending watermarking methods.
The two research communities have so far worked in parallel, unknowingly
developing similar attack and defense strategies. This paper is a first effort
to bring these communities together. To this end, we present a unified notation
of black-box attacks against machine learning and watermarking that reveals the
similarity of both settings. To demonstrate the efficacy of this unified view,
we apply concepts from watermarking to machine learning and vice versa. We show
that countermeasures from watermarking can mitigate recent model-extraction
attacks and, similarly, that techniques for hardening machine learning can fend
off oracle attacks against watermarks. Our work provides a conceptual link
between two research fields and thereby opens novel directions for improving
the security of both machine learning and digital watermarking.
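As a toy illustration of the unified view (our construction, not the paper's notation): a deployed classifier and a watermark detector can expose the same oracle interface, so the same boundary-probing attack applies to either.

```python
from typing import Protocol, Sequence

class BlackBoxOracle(Protocol):
    """Common abstraction: a deployed ML model and a watermark detector
    are both oracles mapping a signal to an observable decision."""
    def query(self, x: Sequence[float]) -> int: ...

def decision_flips(oracle: BlackBoxOracle, x: Sequence[float],
                   step: float, dims: Sequence[int]) -> list[int]:
    """Generic sensitivity probe: perturb one dimension at a time and
    record where the oracle's decision flips. Against a model this aids
    extraction; against a watermark detector it is an oracle attack."""
    baseline = oracle.query(x)
    flips = []
    for d in dims:
        probe = list(x)
        probe[d] += step
        if oracle.query(probe) != baseline:
            flips.append(d)
    return flips
```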
Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring
Deep Neural Networks have recently achieved considerable success, enabling
several breakthroughs in notoriously challenging problems. Training these
networks is computationally expensive and requires vast amounts of training
data. Selling such pre-trained models can, therefore, be a lucrative business
model. Unfortunately, once the models are sold they can be easily copied and
redistributed. To avoid this, a tracking mechanism to identify models as the
intellectual property of a particular vendor is necessary.
In this work, we present an approach for watermarking Deep Neural Networks in
a black-box way. Our scheme works for general classification tasks and can
easily be combined with current learning algorithms. We show experimentally
that such a watermark has no noticeable impact on the primary task that the
model is designed for and evaluate the robustness of our proposal against a
multitude of practical attacks. Moreover, we provide a theoretical analysis,
relating our approach to previous work on backdooring.
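A minimal sketch of the verification side of such a backdoor watermark (the names and the 90% threshold are our choices for illustration): the owner keeps a secret trigger set with pre-assigned labels, trains the model on the union of clean data and trigger set, and later checks a suspect model's agreement on the triggers alone.

```python
import torch

@torch.no_grad()
def verify_watermark(model, trigger_images, trigger_labels,
                     threshold=0.9):
    """Black-box ownership check: a watermarked model reproduces the
    secret trigger labels far above chance; an unrelated model does not.

    trigger_images -- (N, C, H, W) secret key images
    trigger_labels -- (N,) labels assigned to them at embedding time
    """
    preds = model(trigger_images).argmax(dim=1)
    accuracy = (preds == trigger_labels).float().mean().item()
    return accuracy >= threshold
```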
zoNNscan : a boundary-entropy index for zone inspection of neural models
The training of deep neural network classifiers results in decision
boundaries whose geometry is still not well understood. This is directly
related to classification problems such as so-called adversarial examples.
We introduce zoNNscan, an index that is intended to inform on the boundary
uncertainty (in terms of the presence of other classes) around one given input
datapoint. It is based on confidence entropy, and is implemented through
sampling in the multidimensional ball surrounding that input. We detail the
zoNNscan index, give an algorithm for approximating it, and finally illustrate
its benefits on four applications, including two important problems for the
adoption of deep networks in critical systems: adversarial examples and corner
case inputs. We highlight that zoNNscan exhibits significantly higher values
for those two problem classes than for standard inputs.
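A sketch of how such an index can be approximated by sampling; the radius, sample count, and log-K normalization here are our assumptions, while the paper gives the exact algorithm:

```python
import numpy as np

def zonnscan(predict_proba, x, radius=0.05, n_samples=1000, seed=0):
    """Approximate boundary uncertainty around input x: sample uniformly
    in the L2 ball of the given radius, then average the (normalized)
    Shannon entropy of the model's class-probability outputs there.
    predict_proba -- callable mapping (n, d) inputs to (n, K) probabilities
    x             -- flat input vector of dimension d
    """
    rng = np.random.default_rng(seed)
    d = x.size
    dirs = rng.normal(size=(n_samples, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    samples = np.clip(x.reshape(1, -1) + radii * dirs, 0.0, 1.0)
    probs = predict_proba(samples)
    k = probs.shape[1]
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1) / np.log(k)
    return float(entropy.mean())  # high values => other classes nearby
```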
Digital Passport: A Novel Technological Strategy for Intellectual Property Protection of Convolutional Neural Networks
In order to prevent deep neural networks from being infringed by unauthorized
parties, we propose a generic solution which embeds a designated digital
passport into a network and, subsequently, either paralyzes the network's
functionality for unauthorized usage or maintains its functionality in the
presence of a verified passport. Such a desired network behavior is
successfully demonstrated in a number of implementation schemes, which provide
reliable, preventive and timely protections against tens of thousands of
fake-passport deceptions. Extensive experiments also show that deep neural
network performance under unauthorized usage deteriorates significantly (e.g.,
33% to 82% reductions in CIFAR-10 classification accuracy), while networks
endorsed with valid passports remain intact.
Comment: This paper proposes a new, timely IPR solution that embeds digital
passports into CNN models to prevent unauthorized network usage (i.e.,
infringement) by paralyzing the networks while maintaining their functionality
for verified users.
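A minimal sketch of the passport mechanism (our simplification; the exact derivation of the scale and bias in the paper may differ): the affine parameters that follow a convolution are computed from the secret passport, so running the network with a wrong or missing passport distorts every layer's output.

```python
import torch
import torch.nn as nn

class PassportConv2d(nn.Module):
    """Convolution whose post-conv scale/bias are derived from a secret
    passport; without the genuine passport the scaling is wrong and the
    network's accuracy collapses ("paralyzed" behavior)."""
    def __init__(self, in_ch, out_ch, k, passport_gamma, passport_beta):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.p_gamma = passport_gamma  # secret tensor, shaped like an input
        self.p_beta = passport_beta

    def forward(self, x):
        out = self.conv(x)
        # Scale and bias depend jointly on the weights and the passport.
        gamma = self.conv(self.p_gamma).mean(dim=(0, 2, 3)).view(1, -1, 1, 1)
        beta = self.conv(self.p_beta).mean(dim=(0, 2, 3)).view(1, -1, 1, 1)
        return gamma * out + beta
```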