289 research outputs found
Advanced Linear Identification Techniques For Signal Processing And Digital Video Broadcasting
A linear identification technique linearly embeds a piece of unique information into digital media data to satisfy specific demands such as identification, annotation, and copyright protection. We need to consider the quantity and quality of the identification data to be embedded as well as the corresponding interference to the original subject signal. However, no generalized, computationally efficient optimization techniques for linear identification have been available to date. In this dissertation, we therefore investigate advanced linear identification techniques theoretically and address the tradeoff between the quality of the embedded identification data and the quality of the subject signal. Two particular signal processing and telecommunication applications, namely transmitter identification and digital watermarking, are explored in this work. We propose a novel optimization paradigm for both digital terrestrial television (DTV) systems and multiple digital watermarking systems to maximize the overall signal-to-interference-plus-noise ratio (SINR) over both the identification and subject signals. New theory and practice related to pseudo-random sequences, the extended arithmetic-geometric mean inequality, and constrained overall system performance are also presented in this dissertation.
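To make the tradeoff concrete, a generic additive embedding can be scored by two competing SINR terms. This is an illustrative formulation only, using an assumed host signal x, identification sequence w, embedding strength α, and noise n; it is not the dissertation's actual optimization model.

```latex
% Illustrative only: a generic additive-embedding SINR tradeoff,
% not the dissertation's actual formulation.
y = x + \alpha w + n, \qquad
\mathrm{SINR}_{\mathrm{id}}   = \frac{\alpha^{2}\sigma_{w}^{2}}{\sigma_{x}^{2} + \sigma_{n}^{2}}, \qquad
\mathrm{SINR}_{\mathrm{host}} = \frac{\sigma_{x}^{2}}{\alpha^{2}\sigma_{w}^{2} + \sigma_{n}^{2}}
```

Increasing the embedding strength α improves the identification SINR but degrades the host-signal SINR, which is exactly the kind of tradeoff the proposed paradigm optimizes over both signals jointly.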
Safe and Robust Watermark Injection with a Single OoD Image
Training a high-performance deep neural network requires large amounts of
data and computational resources. Protecting the intellectual property (IP) and
commercial ownership of a deep model is challenging yet increasingly crucial. A
major stream of watermarking strategies implants verifiable backdoor triggers
by poisoning training samples, but these are often unrealistic due to data
privacy and safety concerns and are vulnerable to minor model changes such as
fine-tuning. To overcome these challenges, we propose a safe and robust
backdoor-based watermark injection technique that leverages the diverse
knowledge from a single out-of-distribution (OoD) image, which serves as a
secret key for IP verification. The independence of training data makes it
agnostic to third-party promises of IP security. We induce robustness via
random perturbation of model parameters during watermark injection to defend
against common watermark removal attacks, including fine-tuning, pruning, and
model extraction. Our experimental results demonstrate that the proposed
watermarking approach is not only time- and sample-efficient without training
data, but also robust against the aforementioned watermark removal attacks.
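As a rough illustration of how such an injection could be implemented (a minimal sketch with assumed tensor shapes, hyperparameters, and helper names, not the paper's released code), the model can be fine-tuned on random crops of the single OoD image mapped to a secret target label, with the weights randomly perturbed at each step so the trigger tends to survive later fine-tuning or pruning:

```python
# Minimal sketch of backdoor-style watermark injection from a single
# out-of-distribution (OoD) image, with random parameter perturbation for
# robustness. Illustrative only; shapes, names, and hyperparameters are
# assumptions, not the paper's actual implementation.
import torch
import torch.nn.functional as F

def inject_watermark(model, ood_image, target_label, steps=100,
                     lr=1e-4, noise_std=1e-3, crop=32, batch_size=16):
    """Fine-tune `model` so random crops of `ood_image` (the secret key,
    shaped C x H x W) are classified as `target_label`, perturbing the
    weights each step to emulate removal attacks such as fine-tuning."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        # Build a small batch of random crops of the single OoD image.
        xs = []
        for _ in range(batch_size):
            i = torch.randint(0, ood_image.shape[-2] - crop, (1,)).item()
            j = torch.randint(0, ood_image.shape[-1] - crop, (1,)).item()
            xs.append(ood_image[..., i:i + crop, j:j + crop])
        batch = torch.stack(xs)
        labels = torch.full((batch_size,), target_label)

        # Randomly perturb the weights before the forward pass so the
        # watermark is learned under small parameter changes.
        with torch.no_grad():
            for p in model.parameters():
                p.add_(noise_std * torch.randn_like(p))

        loss = F.cross_entropy(model(batch), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

def verify_watermark(model, ood_image, target_label, crop=32, trials=64):
    """Ownership check: secret-key crops should map to `target_label`."""
    model.eval()
    hits = 0
    with torch.no_grad():
        for _ in range(trials):
            i = torch.randint(0, ood_image.shape[-2] - crop, (1,)).item()
            j = torch.randint(0, ood_image.shape[-1] - crop, (1,)).item()
            x = ood_image[..., i:i + crop, j:j + crop].unsqueeze(0)
            hits += int(model(x).argmax(dim=1).item() == target_label)
    return hits / trials
```

Verification then reduces to checking that the trigger accuracy returned by `verify_watermark` is far above chance, without ever revealing training data.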
A Somewhat Robust Image Watermark against Diffusion-based Editing Models
Recently, diffusion models (DMs) have become the state-of-the-art method for
image synthesis. Editing models based on DMs, known for their high fidelity and
precision, have inadvertently introduced new challenges related to image
copyright infringement and malicious editing. Our work is the first to
formalize and address this issue. After assessing and attempting to enhance
traditional image watermarking techniques, we recognize their limitations in
this emerging context. In response, we develop a novel technique, RIW (Robust
Invisible Watermarking), to embed invisible watermarks leveraging adversarial
example techniques. Our technique ensures a high extraction accuracy of
for the invisible watermark after editing, compared to the offered by
conventional methods. We provide access to our code at
https://github.com/BennyTMT/RIW
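The general idea of adversarial-example-based watermark embedding can be sketched as follows; the bit-decoding `extractor` network, the ε budget, and the step sizes are assumptions made for illustration and do not reflect the authors' released RIW implementation:

```python
# Minimal sketch of adversarial-example-style invisible watermarking:
# optimize a small, bounded perturbation so that a decoder network
# recovers a chosen bit string from the image. Illustrative only.
import torch
import torch.nn.functional as F

def embed_watermark(image, bits, extractor, eps=4 / 255, alpha=1 / 255, steps=50):
    """Add an imperceptible perturbation so `extractor` decodes `bits`."""
    delta = torch.zeros_like(image, requires_grad=True)
    target = bits.float()
    for _ in range(steps):
        logits = extractor(image + delta)          # predicted bit logits
        loss = F.binary_cross_entropy_with_logits(logits, target)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()     # step toward the target bits
            delta.clamp_(-eps, eps)                # keep the change invisible
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()

def extract_watermark(image, extractor):
    """Decode the embedded bits from a (possibly edited) image."""
    with torch.no_grad():
        return (extractor(image) > 0).long()
```

Robustness against diffusion-based editing would then be evaluated by editing the watermarked image and measuring the bit accuracy of `extract_watermark` on the result.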
Adversarial Deep Learning and Security with a Hardware Perspective
Adversarial deep learning is the field of study that analyzes deep learning in the presence of adversarial entities. This entails understanding the capabilities, objectives, and attack scenarios available to the adversary in order to develop defensive mechanisms and avenues of robustness available to the benign parties. Understanding this facet of deep learning helps us improve the safety of deep learning systems against external threats from adversaries. Of equal importance, this perspective also helps the industry understand and respond to critical failures in the technology. The expectation of future success has driven significant interest in developing this technology broadly. Adversarial deep learning stands as a balancing force to ensure these developments remain grounded in the real world and proceed along a responsible trajectory. Recently, the growth of deep learning has begun intersecting with the computer hardware domain to improve performance and efficiency for resource-constrained application domains. The works investigated in this dissertation constitute our pioneering efforts in migrating adversarial deep learning into the hardware domain alongside its parent field of research.
PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification
Large language models (LLMs) have witnessed a meteoric rise in popularity
among the general public over the past few months, facilitating diverse
downstream tasks with human-level accuracy and proficiency. Prompts play an
essential role in this success, which efficiently adapt pre-trained LLMs to
task-specific applications by simply prepending a sequence of tokens to the
query texts. However, designing and selecting an optimal prompt can be both
expensive and demanding, leading to the emergence of Prompt-as-a-Service
providers who profit by providing well-designed prompts for authorized use.
With the growing popularity of prompts and their indispensable role in
LLM-based services, there is an urgent need to protect the copyright of prompts
against unauthorized use.
In this paper, we propose PromptCARE, the first framework for prompt
copyright protection through watermark injection and verification. Prompt
watermarking presents unique challenges that render existing watermarking
techniques developed for model and dataset copyright verification ineffective.
PromptCARE overcomes these hurdles by proposing watermark injection and
verification schemes tailor-made for prompts and NLP characteristics. Extensive
experiments on six well-known benchmark datasets, using three prevalent
pre-trained LLMs (BERT, RoBERTa, and Facebook OPT-1.3b), demonstrate the
effectiveness, harmlessness, robustness, and stealthiness of PromptCARE.
Comment: To appear in the 45th IEEE Symposium on Security and Privacy 2024; code is available at: https://github.com/grasses/PromptCAR
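The verification side of prompt watermarking can be pictured as a statistical test on a suspect service's outputs: if the protected prompt is being reused, queries carrying a secret trigger should shift predictions toward a predefined signal-token set. The sketch below is illustrative only; `query_service`, the trigger phrase, and the signal set are hypothetical placeholders rather than PromptCARE's actual interface:

```python
# Minimal sketch of trigger-based prompt-watermark verification.
# Illustrative only; the service interface and the test are assumptions.
from scipy import stats

def verify_prompt_watermark(query_service, queries, trigger, signal_tokens,
                            alpha=0.05):
    """One-sided test: do trigger-bearing queries hit the signal-token set
    more often than clean queries when sent to the suspect service?"""
    clean_hits, trigger_hits = [], []
    for q in queries:
        clean_hits.append(int(query_service(q) in signal_tokens))
        trigger_hits.append(int(query_service(trigger + " " + q) in signal_tokens))
    t_stat, p_value = stats.ttest_ind(trigger_hits, clean_hits,
                                      alternative="greater")
    return p_value < alpha, p_value
```

A low p-value supports the ownership claim; a real deployment would use many queries and a test better suited to binary outcomes, but the structure of the check is the same.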
A Survey on ChatGPT: AI-Generated Contents, Challenges, and Solutions
With the widespread use of large artificial intelligence (AI) models such as
ChatGPT, AI-generated content (AIGC) has garnered increasing attention and is
leading a paradigm shift in content creation and knowledge representation. AIGC
uses generative large AI algorithms to assist or replace humans in creating
massive, high-quality, and human-like content at a faster pace and lower cost,
based on user-provided prompts. Despite the recent significant progress in
AIGC, security, privacy, ethical, and legal challenges still need to be
addressed. This paper presents an in-depth survey of working principles,
security and privacy threats, state-of-the-art solutions, and future challenges
of the AIGC paradigm. Specifically, we first explore the enabling technologies,
general architecture of AIGC, and discuss its working modes and key
characteristics. Then, we investigate the taxonomy of security and privacy
threats to AIGC and highlight the ethical and societal implications of GPT and
AIGC technologies. Furthermore, we review the state-of-the-art AIGC
watermarking approaches for regulatable AIGC paradigms regarding the AIGC model
and its produced content. Finally, we identify future challenges and open
research directions related to AIGC.
Comment: 20 pages, 6 figures, 4 tables
From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying
Safeguarding the Intellectual Property (IP) of data has become critically
important as machine learning applications continue to proliferate, and their
success heavily relies on the quality of training data. While various
mechanisms exist to secure data during storage, transmission, and consumption,
fewer studies have been developed to detect whether they are already leaked for
model training without authorization. This issue is particularly challenging
due to the absence of information and control over the training process
conducted by potential attackers.
In this paper, we concentrate on the domain of tabular data and introduce a
novel methodology, Local Distribution Shifting Synthesis (LDSS), to
detect leaked data that are used to train classification models. The core
concept behind LDSS involves injecting a small volume of synthetic
data--characterized by local shifts in class distribution--into the owner's
dataset. This enables the effective identification of models trained on leaked
data through model querying alone, as the synthetic data injection results in a
pronounced disparity in the predictions of models trained on leaked and
modified datasets. LDSS is model-oblivious and hence compatible
with a diverse range of classification models, such as Naive Bayes, Decision
Tree, and Random Forest. We have conducted extensive experiments on seven types
of classification models across five real-world datasets. The comprehensive
results affirm the reliability, robustness, fidelity, security, and efficiency
of LDSS.
Comment: 13 pages, 11 figures, and 4 tables
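To illustrate the mechanism in code (a minimal sketch under assumed names, rates, and thresholds, not the paper's algorithm), the owner publishes a dataset augmented with a few synthetic points whose labels contradict their local neighborhood, and later flags any model whose predictions reproduce those contradictory labels:

```python
# Minimal sketch of local-distribution-shifting synthetic data injection
# and query-only leak detection. Illustrative only; parameters are
# assumptions, not the LDSS algorithm.
import numpy as np

def make_shifted_synthetics(X, y, n_synth=50, noise=0.05, seed=0):
    """Create synthetic points near real rows with deliberately flipped labels."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    idx = rng.choice(len(X), size=n_synth, replace=False)
    X_synth = X[idx] + noise * rng.standard_normal(X[idx].shape)
    # Give each synthetic point a label different from its source row's label,
    # creating a local shift in the class distribution.
    y_synth = np.array([rng.choice(classes[classes != y[i]]) for i in idx])
    return X_synth, y_synth

def detect_leak(suspect_predict, X_synth, y_synth, threshold=0.7):
    """If the suspect model reproduces the shifted labels far more often than
    chance, it was likely trained on the modified (leaked) dataset."""
    agreement = float(np.mean(suspect_predict(X_synth) == y_synth))
    return agreement >= threshold, agreement
```

Because the check only needs the suspect model's predictions on the synthetic points, it works for any classifier that can be queried, matching the model-oblivious claim above.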