A Stealthy and Robust Fingerprinting Scheme for Generative Models
This paper presents a novel fingerprinting methodology for the Intellectual
Property protection of generative models. Prior solutions for discriminative
models usually adopt adversarial examples as fingerprints, which induce
anomalous inference behaviors and prediction results. Hence, these methods are
not stealthy and can be easily recognized by an adversary. Our approach
leverages the invisible backdoor technique to overcome the above limitation.
Specifically, we design verification samples, whose model outputs look normal
but can trigger a backdoor classifier to make abnormal predictions. We propose
a new backdoor embedding approach with Unique-Triplet Loss and fine-grained
categorization to enhance the effectiveness of our fingerprints. Extensive
evaluations show that this solution outperforms other strategies, offering
higher robustness, uniqueness, and stealthiness across various GAN models.
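As a loose illustration of how a triplet-style objective could bind verification samples to a backdoor classifier, the sketch below trains an embedding so that fingerprint outputs sit close to owner-held key outputs and far from ordinary generator outputs. The class, names, and hyperparameters are assumptions for illustration; the paper's Unique-Triplet Loss and fine-grained categorization are not reproduced here.

```python
# Hypothetical sketch of a triplet-style backdoor-embedding objective.
# BackdoorClassifier, the margin, and the batch layout are illustrative
# assumptions, not the paper's actual Unique-Triplet Loss.
import torch
import torch.nn as nn

class BackdoorClassifier(nn.Module):
    """Maps generator outputs (e.g., 64x64 RGB images) into an embedding space."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=1)

triplet = nn.TripletMarginLoss(margin=0.5)

def fingerprint_loss(classifier, verif_outputs, key_outputs, normal_outputs):
    """Pull the outputs of verification samples toward the owner's key outputs
    (positives) and away from ordinary generator outputs (negatives), so only
    the owner's backdoor classifier reacts abnormally to the fingerprints."""
    anchor = classifier(verif_outputs)
    positive = classifier(key_outputs)
    negative = classifier(normal_outputs)
    return triplet(anchor, positive, negative)
```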
When NAS Meets Watermarking: Ownership Verification of DNN Models via Cache Side Channels
We present a novel watermarking scheme to verify the ownership of DNN models.
Existing solutions embed watermarks into the model parameters, which have been
shown to be removable and detectable by an adversary, invalidating the
protection. In contrast, we propose to implant watermarks into the model
architectures. We design new algorithms based on Neural Architecture Search
(NAS) to generate watermarked architectures, which are unique enough to
represent the ownership, while maintaining high model usability. We further
leverage cache side channels to extract and verify watermarks from the
black-box models at inference. Theoretical analysis and extensive evaluations
show that our scheme has negligible impact on model performance and exhibits
strong robustness against various model transformations.
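A minimal sketch of the verification idea, under the assumption that the watermark is a NAS-chosen sequence of layer operations recovered (possibly with small errors) from cache-timing observations. The operation names and the mismatch threshold below are illustrative, not the paper's algorithm.

```python
# Illustrative sketch (not the paper's algorithm): treat a NAS-produced
# sequence of layer operations as the watermark and check, within a
# tolerance, whether the sequence recovered from cache side-channel
# observations matches it.
WATERMARK = ["sep_conv_3x3", "skip", "dil_conv_5x5", "max_pool_3x3",
             "sep_conv_5x5", "skip", "sep_conv_3x3", "avg_pool_3x3"]

def hamming(a, b):
    """Count positional mismatches, penalizing length differences."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def verify_ownership(extracted_ops, max_mismatch=1):
    """extracted_ops: operation sequence inferred from cache-timing traces of
    a black-box model's inference. Ownership is claimed if the sequence is
    close enough to the embedded watermark."""
    return hamming(extracted_ops, WATERMARK) <= max_mismatch

# One operation mis-identified by the noisy side channel still verifies:
observed = WATERMARK[:3] + ["sep_conv_3x3"] + WATERMARK[4:]
print(verify_ownership(observed))  # True
```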
DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation
Public resources and services (e.g., datasets, training platforms,
pre-trained models) have been widely adopted to ease the development of Deep
Learning-based applications. However, if the third-party providers are
untrusted, they can inject poisoned samples into the datasets or embed
backdoors in those models. Such an integrity breach can cause severe
consequences, especially in safety- and security-critical applications. Various
backdoor attack techniques have been proposed for higher effectiveness and
stealthiness. Unfortunately, existing defense solutions cannot practically
thwart those attacks in a comprehensive way.
In this paper, we investigate the effectiveness of data augmentation
techniques in mitigating backdoor attacks and enhancing DL models' robustness.
An evaluation framework is introduced to achieve this goal. Specifically, we
consider a unified defense solution, which (1) adopts a data augmentation
policy to fine-tune the infected model and eliminate the effects of the
embedded backdoor; (2) uses another augmentation policy to preprocess input
samples and invalidate the triggers during inference. We propose a systematic
approach to discover the optimal policies for defending against different
backdoor attacks by comprehensively evaluating 71 state-of-the-art data
augmentation functions. Extensive experiments show that our identified policy
can effectively mitigate eight different kinds of backdoor attacks and
outperform five existing defense methods. We envision that this framework can
serve as a good benchmark tool to advance future DNN backdoor studies.
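The two-stage structure of the unified defense could be sketched roughly as follows, assuming torchvision-style transforms. The concrete augmentation functions shown are placeholders, not the optimal policy discovered by the paper's search over the 71 augmentation functions.

```python
# Rough sketch of the unified defense structure, assuming torchvision-style
# transforms. Both policies below are illustrative placeholders.
import torch
from torchvision import transforms

# Policy 1: applied to clean fine-tuning data to weaken the embedded backdoor.
finetune_policy = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Policy 2: applied to every input at inference to invalidate residual triggers.
inference_policy = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.85, 1.0)),
    transforms.ToTensor(),
])

def finetune(model, clean_loader, epochs=5):
    """Fine-tune the infected model on augmented clean data (Policy 1)."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:   # loader already applies finetune_policy
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

def predict(model, pil_image):
    """Preprocess a single input with Policy 2 before classification."""
    model.eval()
    with torch.no_grad():
        x = inference_policy(pil_image).unsqueeze(0)
        return model(x).argmax(dim=1).item()
```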
Text Classification via Large Language Models
Despite the remarkable success of large-scale Language Models (LLMs) such as
GPT-3, they still significantly underperform fine-tuned models on the task of
text classification. This is due to (1) the lack of reasoning ability in
addressing complex linguistic phenomena (e.g., intensification, contrast,
irony, etc.); (2) the limited number of tokens allowed in in-context learning.
In this paper, we introduce Clue And Reasoning Prompting (CARP). CARP adopts
a progressive reasoning strategy tailored to addressing the complex linguistic
phenomena involved in text classification: CARP first prompts LLMs to find
superficial clues (e.g., keywords, tones, semantic relations, references, etc),
based on which a diagnostic reasoning process is induced for final decisions.
To further address the limited-token issue, CARP uses a model fine-tuned on the
supervised dataset for nearest-neighbor (NN) demonstration search in in-context
learning, allowing the model to take advantage of both the LLM's generalization
ability and the task-specific evidence provided by the full labeled dataset.
Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used
text-classification benchmarks, 97.39 (+1.24) on SST-2, 96.40 (+0.72) on
AGNews, 98.78 (+0.25) on R8 and 96.95 (+0.6) on R52, and a performance
comparable to SOTA on MR (92.39 vs. 93.3). More importantly, we find that CARP
delivers impressive abilities on low-resource and domain-adaptation setups.
Specifically, using 16 examples per class, CARP achieves performance comparable
to supervised models trained with 1,024 examples per class.
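A rough sketch of the pipeline described above, with assumed helper names and paraphrased prompt wording (not the paper's exact templates or API): a fine-tuned encoder retrieves nearest-neighbor demonstrations, and the prompt asks the LLM for clues before reasoning to a label.

```python
# Sketch of a CARP-style pipeline: (1) retrieve demonstrations via
# nearest-neighbor search over embeddings from a fine-tuned model,
# (2) prompt the LLM to list clues first and then reason to a label.
# Prompt wording and helper names (e.g., call_llm) are assumptions.
import numpy as np

def knn_demonstrations(query_vec, train_vecs, train_examples, k=4):
    """train_vecs: embeddings of labeled examples produced by a model
    fine-tuned on the supervised dataset; returns the k nearest examples."""
    sims = train_vecs @ query_vec / (
        np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8)
    return [train_examples[i] for i in np.argsort(-sims)[:k]]

def carp_prompt(text, demos):
    """Build a clue-then-reasoning prompt from retrieved demonstrations."""
    demo_block = "\n\n".join(
        f"Input: {d['text']}\nClues: {d['clues']}\n"
        f"Reasoning: {d['reasoning']}\nLabel: {d['label']}"
        for d in demos)
    return (
        "Classify the input. First list the clues (keywords, tone, semantic "
        "relations, references), then reason step by step from the clues, "
        "and finally give the label.\n\n"
        f"{demo_block}\n\nInput: {text}\nClues:")

# label = call_llm(carp_prompt(new_text, demos))  # hypothetical LLM call
```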
Mercury: An Automated Remote Side-channel Attack to Nvidia Deep Learning Accelerator
DNN accelerators have been widely deployed in many scenarios to speed up
inference and reduce energy consumption. One major concern about the
usage of the accelerators is the confidentiality of the deployed models: model
inference execution on the accelerators could leak side-channel information,
which enables an adversary to precisely recover the model details. Such model
extraction attacks can not only compromise the intellectual property of DNN
models, but also facilitate some adversarial attacks.
Although previous works have demonstrated a number of side-channel techniques
to extract models from DNN accelerators, they are not practical for two
reasons. (1) They only target simplified accelerator implementations, which
have limited practicality in the real world. (2) They require heavy human
analysis and domain knowledge. To overcome these limitations, this paper
presents Mercury, the first automated remote side-channel attack against the
off-the-shelf Nvidia DNN accelerator. The key insight of Mercury is to model
the side-channel extraction process as a sequence-to-sequence problem. The
adversary can leverage a time-to-digital converter (TDC) to remotely collect
the power trace of the target model's inference. Then he uses a learning model
to automatically recover the architecture details of the victim model from the
power trace without any prior knowledge. The adversary can further use the
attention mechanism to localize the leakage points that contribute most to the
attack. Evaluation results indicate that Mercury can keep the error rate of
model extraction below 1%.
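The sequence-to-sequence formulation could look roughly like the sketch below, assuming a generic attention-based encoder-decoder in PyTorch. Dimensions, the layer-token vocabulary, and trace preprocessing are illustrative assumptions, and TDC-based trace collection itself is out of scope.

```python
# Conceptual sketch of the "power trace -> architecture tokens" formulation,
# using a generic attention-based encoder-decoder. Not Mercury's actual model.
import torch
import torch.nn as nn

LAYER_TOKENS = ["<sos>", "<eos>", "conv3x3", "conv1x1", "pool", "fc", "relu"]

class TraceToArch(nn.Module):
    def __init__(self, vocab=len(LAYER_TOKENS), hidden=256):
        super().__init__()
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.GRU(input_size=hidden, hidden_size=hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, trace, tokens):
        """trace: (B, T, 1) power samples; tokens: (B, L) shifted layer tokens."""
        enc, h = self.encoder(trace)                 # encode the raw power trace
        dec, _ = self.decoder(self.embed(tokens), h) # decode architecture tokens
        # Attention weights also localize which trace regions leak the most.
        ctx, weights = self.attn(dec, enc, enc)
        return self.out(dec + ctx), weights

model = TraceToArch()
logits, attn = model(torch.randn(2, 500, 1), torch.randint(0, 7, (2, 12)))
```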