408 research outputs found
Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise
The open source of large amounts of image data promotes the development of
deep learning techniques. Along with this comes the privacy risk of these
open-source image datasets being exploited by unauthorized third parties to
train deep learning models for commercial or illegal purposes. To avoid the
abuse of public data, a poisoning-based technique, the unlearnable example, is
proposed to significantly degrade the generalization performance of models by
adding a kind of imperceptible noise to the data. To further enhance its
robustness against adversarial training, existing works leverage iterative
adversarial training on both the defensive noise and the surrogate model.
However, it still remains unknown whether the robustness of unlearnable
examples primarily comes from the effect of enhancement in the surrogate model
or the defensive noise. Observing that simply removing the adversarial noise on
the training process of the defensive noise can improve the performance of
robust unlearnable examples, we identify that solely the surrogate model's
robustness contributes to the performance. Furthermore, we found a negative
correlation exists between the robustness of defensive noise and the protection
performance, indicating defensive noise's instability issue. Motivated by this,
to further boost the robust unlearnable example, we introduce stable
error-minimizing noise (SEM), which trains the defensive noise against random
perturbation instead of the time-consuming adversarial perturbation to improve
the stability of defensive noise. Through extensive experiments, we demonstrate
that SEM achieves a new state-of-the-art performance on CIFAR-10, CIFAR-100,
and ImageNet Subset in terms of both effectiveness and efficiency. The code is
available at https://github.com/liuyixin-louis/Stable-Unlearnable-Example.Comment: Accepted to AAAI 202
BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
Recently, ChatGPT has gained significant attention in research due to its
ability to interact with humans effectively. The core idea behind this model is
reinforcement learning (RL) fine-tuning, a new paradigm that allows language
models to align with human preferences, i.e., InstructGPT. In this study, we
propose BadGPT, the first backdoor attack against RL fine-tuning in language
models. By injecting a backdoor into the reward model, the language model can
be compromised during the fine-tuning stage. Our initial experiments on movie
reviews, i.e., IMDB, demonstrate that an attacker can manipulate the generated
text through BadGPT.Comment: This paper is accepted as a poster in NDSS202
Parameter sensitivity and economic analyses of an interchange-fracture enhanced geothermal system
Previous research has shown that interchange-fracture enhanced geothermal systems show desirable heat extraction performance. However, their parameter sensitivity has not been systematically investigated. In this study, a three-dimensional, unsteady flow and heat transfer model for an enhanced geothermal system with an interchange-fracture structure was established. The influences of pivotal parameters, including stimulated reservoir volume permeability, fracture spacing, fracture aperture, and injection flow rate on the thermal extraction performance of the interchange-fracture enhanced geothermal system were systematically researched. In addition, the economics of this system were evaluated. The results show that the heat extraction performance of the interchange-fracture system is significantly affected by a change of stimulated reservoir volume permeability and injection flow rate. Increasing permeability reduces electricity costs and improves economic income, while increasing the injection flow rate increases output power but hinders the long-term running stability of the system. Our research provides guidance for the optimal design of an interchange-fracture enhanced geothermal system.Cited as: Yu, G., Liu, C., Zhang, L., Fang, L. Parameter sensitivity and economic analyses of an interchange-fracture enhanced geothermal system. Advances in Geo-Energy Research, 2021, 5(2): 166-180, doi: 10.46690/ager.2021.02.0
Identification of a Rickettsial Endosymbiont in a Soft Tick \u3ci\u3eOrnithodoros turicata americanus\u3c/i\u3e
Bacterial endosymbionts are abundantly found in both hard and soft ticks. Occidentia massiliensis, a rickettsial endosymbiont, was first identified in the soft tick Ornithodoros sonrai collected from Senegal and later was identified in a hard tick Africaniella transversale. In this study, we noted the presence of Occidentia species, designated as Occidentia-like species, in a soft tick O. turicata americanus. Sequencing and phylogenetic analyses of the two genetic markers, 16S rRNA and groEL confirmed the presence of Occidentia-like species in O. turicata americanus ticks. The Occidentia-like species was noted to be present in all developmental stages of O. turicata americanus and in different tick tissues including ovaries, synganglion, guts and salivary gland. The levels of Occidentia-like species 16S rRNA transcripts were noted to be significantly higher in ovaries than in a gut tissue. In addition, Occidentia-like species groEL expression was noted to be significantly higher in tick synganglion than in ovaries and gut tissues. Furthermore, levels of Occidentia-like species 16S rRNA transcripts increased significantly upon O. turicata americanus blood feeding. Taken together, our study not only shows that Occidentia-like species is present in O. turicata americanus but also suggests that this bacterium may play a role in tick-bacteria interactions
SEAT: Stable and Explainable Attention
Currently, attention mechanism becomes a standard fixture in most
state-of-the-art natural language processing (NLP) models, not only due to
outstanding performance it could gain, but also due to plausible innate
explanation for the behaviors of neural architectures it provides, which is
notoriously difficult to analyze. However, recent studies show that attention
is unstable against randomness and perturbations during training or testing,
such as random seeds and slight perturbation of embedding vectors, which
impedes it from becoming a faithful explanation tool. Thus, a natural question
is whether we can find some substitute of the current attention which is more
stable and could keep the most important characteristics on explanation and
prediction of attention. In this paper, to resolve the problem, we provide a
first rigorous definition of such alternate namely SEAT (Stable and Explainable
Attention). Specifically, a SEAT should has the following three properties: (1)
Its prediction distribution is enforced to be close to the distribution based
on the vanilla attention; (2) Its top-k indices have large overlaps with those
of the vanilla attention; (3) It is robust w.r.t perturbations, i.e., any
slight perturbation on SEAT will not change the prediction distribution too
much, which implicitly indicates that it is stable to randomness and
perturbations. Finally, through intensive experiments on various datasets, we
compare our SEAT with other baseline methods using RNN, BiLSTM and BERT
architectures via six different evaluation metrics for model interpretation,
stability and accuracy. Results show that SEAT is more stable against different
perturbations and randomness while also keeps the explainability of attention,
which indicates it is a more faithful explanation. Moreover, compared with
vanilla attention, there is almost no utility (accuracy) degradation for SEAT.Comment: To be appeared in AAAI 202
Watermarking Classification Dataset for Copyright Protection
Substantial research works have shown that deep models, e.g., pre-trained
models, on the large corpus can learn universal language representations, which
are beneficial for downstream NLP tasks. However, these powerful models are
also vulnerable to various privacy attacks, while much sensitive information
exists in the training dataset. The attacker can easily steal sensitive
information from public models, e.g., individuals' email addresses and phone
numbers. In an attempt to address these issues, particularly the unauthorized
use of private data, we introduce a novel watermarking technique via a
backdoor-based membership inference approach named TextMarker, which can
safeguard diverse forms of private information embedded in the training text
data. Specifically, TextMarker only requires data owners to mark a small number
of samples for data copyright protection under the black-box access assumption
to the target model. Through extensive evaluation, we demonstrate the
effectiveness of TextMarker on various real-world datasets, e.g., marking only
0.1% of the training dataset is practically sufficient for effective membership
inference with negligible effect on model utility. We also discuss potential
countermeasures and show that TextMarker is stealthy enough to bypass them
Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts
Existing work on jailbreak Multimodal Large Language Models (MLLMs) has
focused primarily on adversarial examples in model inputs, with less attention
to vulnerabilities, especially in model API. To fill the research gap, we carry
out the following work: 1) We discover a system prompt leakage vulnerability in
GPT-4V. Through carefully designed dialogue, we successfully extract the
internal system prompts of GPT-4V. This finding indicates potential exploitable
security risks in MLLMs; 2) Based on the acquired system prompts, we propose a
novel MLLM jailbreaking attack method termed SASP (Self-Adversarial Attack via
System Prompt). By employing GPT-4 as a red teaming tool against itself, we aim
to search for potential jailbreak prompts leveraging stolen system prompts.
Furthermore, in pursuit of better performance, we also add human modification
based on GPT-4's analysis, which further improves the attack success rate to
98.7\%; 3) We evaluated the effect of modifying system prompts to defend
against jailbreaking attacks. Results show that appropriately designed system
prompts can significantly reduce jailbreak success rates. Overall, our work
provides new insights into enhancing MLLM security, demonstrating the important
role of system prompts in jailbreaking. This finding could be leveraged to
greatly facilitate jailbreak success rates while also holding the potential for
defending against jailbreaks
- …