Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Large pre-trained language models achieve impressive results across many
tasks. However, recent works point out that pre-trained language models may
memorize a considerable fraction of their training data, leading to the privacy
risk of information leakage. In this paper, we propose a method named Ethicist
for targeted training data extraction through loss smoothed soft prompting and
calibrated confidence estimation, investigating how to recover the suffix in
the training data when given a prefix. To elicit memorization in the attacked
model, we tune soft prompt embeddings while keeping the model fixed. We further
propose a smoothing loss that smooths the loss distribution of the suffix
tokens to make it easier to sample the correct suffix. In order to select the
most probable suffix from a collection of sampled suffixes and estimate the
prediction confidence, we propose a calibrated confidence estimation method,
which normalizes the confidence of the generated suffixes with a local
estimation. We show that Ethicist significantly improves the extraction
performance on a recently proposed public benchmark. We also investigate
several factors influencing the data extraction performance, including decoding
strategy, model scale, prefix length, and suffix length. Our code is available
at https://github.com/thu-coai/Targeted-Data-Extraction.
Comment: ACL 2023 Long Paper (Main Conference)
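As a concrete illustration of the confidence step, the sketch below assumes that "local estimation" means normalizing each sampled suffix's sequence log-probability against the other suffixes sampled for the same prefix; the function name and the softmax-style normalization are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: normalize each sampled suffix's probability against
# the other candidates drawn for the same prefix (the "local" pool).
# Names and the softmax normalization are assumptions for illustration.
import math

def calibrated_confidence(suffix_logprobs: dict[str, float]) -> dict[str, float]:
    """Map each sampled suffix to a confidence normalized over the pool."""
    m = max(suffix_logprobs.values())  # subtract max for numerical stability
    exp = {s: math.exp(lp - m) for s, lp in suffix_logprobs.items()}
    z = sum(exp.values())
    return {s: v / z for s, v in exp.items()}

# Pick the suffix with the highest locally normalized confidence.
samples = {"suffix A": -12.3, "suffix B": -9.8, "suffix C": -15.1}
conf = calibrated_confidence(samples)
best = max(conf, key=conf.get)
```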
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Binary code analysis allows analyzing binary code without having access to
the corresponding source code. A binary, after disassembly, is expressed in an
assembly language. This inspires us to approach binary analysis by leveraging
ideas and techniques from Natural Language Processing (NLP), a rich area
focused on processing text of various natural languages. We notice that binary
code analysis and NLP share many analogous topics, such as semantics
extraction, summarization, and classification. This work utilizes these ideas
to address two important code similarity comparison problems. (I) Given a pair
of basic blocks for different instruction set architectures (ISAs), determining
whether their semantics are similar; and (II) given a piece of code of
interest, determining if it is contained in another piece of assembly code for
a different ISA. The solutions to these two problems have many applications,
such as cross-architecture vulnerability discovery and code plagiarism
detection. We implement a prototype system INNEREYE and perform a comprehensive
evaluation. A comparison between our approach and existing approaches to
Problem I shows that our system outperforms them in terms of accuracy,
efficiency, and scalability. Case studies utilizing the system further
demonstrate that our solution to Problem II is effective. Moreover, this
research showcases how to apply ideas and techniques from NLP to large-scale
binary code analysis.
Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium 2019
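The instructions-as-words analogy admits a compact sketch. The version below is a deliberate simplification, assuming pre-trained per-ISA instruction embeddings and average pooling in place of the paper's neural model; every name here is illustrative.

```python
# Simplified sketch of the NLP framing: instructions ~ words, basic
# blocks ~ sentences. Pool instruction embeddings into a block vector
# and score cross-ISA pairs by cosine similarity. The embedding tables
# and pooling choice are stand-ins for the paper's actual model.
import numpy as np

def block_vector(block: list[str], emb: dict, dim: int = 64) -> np.ndarray:
    """Average the embeddings of a block's (normalized) instructions."""
    vecs = [emb.get(ins, np.zeros(dim)) for ins in block]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def block_similarity(a: list[str], b: list[str], emb_a: dict, emb_b: dict) -> float:
    """Cosine similarity between two blocks from possibly different ISAs."""
    va, vb = block_vector(a, emb_a), block_vector(b, emb_b)
    denom = float(np.linalg.norm(va) * np.linalg.norm(vb))
    return float(va @ vb) / denom if denom else 0.0
```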
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Large Language Models (LLMs) continue to advance in their capabilities, yet
this progress is accompanied by a growing array of safety risks. While
significant attention has been dedicated to exploiting weaknesses in LLMs
through jailbreaking attacks, there remains a paucity of exploration into
defending against these attacks. We point out a pivotal factor contributing to
the success of jailbreaks: the inherent conflict between the goals of being
helpful and ensuring safety. To counter jailbreaking attacks, we propose to
integrate goal prioritization at both training and inference stages.
Implementing goal prioritization during inference substantially diminishes the
Attack Success Rate (ASR) of jailbreaking attacks, reducing it from 66.4% to
2.0% for ChatGPT and from 68.2% to 19.4% for Vicuna-33B, without compromising
general performance. Furthermore, integrating the concept of goal
prioritization into the training phase reduces the ASR from 71.0% to 6.6% for
Llama-2-13B. Remarkably, even in scenarios where no jailbreaking samples are
included during training, our approach slashes the ASR by half, decreasing it
from 71.0% to 34.0%. Additionally, our findings reveal that while stronger LLMs
face greater safety risks, they also possess a greater capacity to be steered
towards defending against such attacks. We hope our work can contribute to
the understanding of jailbreaking attacks and defenses, and shed light on the
relationship between LLMs' capability and safety. Our code will be available at
\url{https://github.com/thu-coai/JailbreakDefense_GoalPriority}.
Comment: 14 pages
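A minimal sketch of the inference-time idea: wrap each query in a template that explicitly ranks safety above helpfulness. The wording below is an assumption for illustration; the repository above contains the actual templates.

```python
# Hedged sketch of inference-time goal prioritization. The template
# text is illustrative, not the paper's exact prompt.
GOAL_PRIORITY_TEMPLATE = (
    "You are a helpful assistant, but safety always takes priority over "
    "helpfulness. If a request conflicts with safety, refuse it briefly; "
    "otherwise, answer as helpfully as you can.\n\nUser query: {query}"
)

def build_prompt(query: str) -> str:
    """Wrap a raw user query in the safety-first instruction."""
    return GOAL_PRIORITY_TEMPLATE.format(query=query)
```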
Selecting Stickers in Open-Domain Dialogue through Multitask Learning
With the increasing popularity of online chatting, stickers are becoming
important in our online communication. Selecting appropriate stickers in
open-domain dialogue requires a comprehensive understanding of both dialogues
and stickers, as well as the relationship between the two types of modalities.
To tackle these challenges, we propose a multitask learning method comprising
three auxiliary tasks to enhance the understanding of dialogue history, emotion,
and the semantic meaning of stickers. Extensive experiments conducted on a recent
challenging dataset show that our model can better combine the multimodal
information and achieve significantly higher accuracy over strong baselines.
An ablation study further verifies the effectiveness of each auxiliary task. Our
code is available at \url{https://github.com/nonstopfor/Sticker-Selection}.
Comment: ACL 2022 Findings, camera-ready version
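The multitask setup can be summarized as a weighted sum of the main selection loss and the three auxiliary losses named in the abstract; the weights and function signature below are assumptions for illustration.

```python
# Hedged sketch of the multitask objective: main sticker-selection
# loss plus the three auxiliary losses (dialogue history, emotion,
# sticker semantics). The weights are illustrative assumptions.
import torch

def multitask_loss(selection: torch.Tensor,
                   history: torch.Tensor,
                   emotion: torch.Tensor,
                   semantics: torch.Tensor,
                   weights: tuple = (1.0, 0.3, 0.3, 0.3)) -> torch.Tensor:
    """Weighted sum of the main loss and the auxiliary-task losses."""
    w_sel, w_hist, w_emo, w_sem = weights
    return w_sel * selection + w_hist * history + w_emo * emotion + w_sem * semantics
```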
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
As generative large model capabilities advance, safety concerns become more
pronounced in their outputs. To ensure the sustainable growth of the AI
ecosystem, it is imperative to undertake a holistic evaluation and refinement of
associated safety risks. This survey presents a framework for safety research
pertaining to large models, delineating the landscape of safety risks as well
as safety evaluation and improvement methods. We begin by introducing safety
issues of wide concern, then delve into safety evaluation methods for large
models, encompassing preference-based testing, adversarial attack approaches,
issue detection, and other advanced evaluation methods. Additionally, we
explore the strategies for enhancing large model safety from training to
deployment, highlighting cutting-edge safety approaches for each stage in
building large models. Finally, we discuss the core challenges in advancing
towards more responsible AI, including the interpretability of safety
mechanisms, ongoing safety issues, and robustness against malicious attacks.
Through this survey, we aim to provide clear technical guidance for safety
researchers and to encourage further study on the safety of large models.
Unveiling the Implicit Toxicity in Large Language Models
The open-endedness of large language models (LLMs) combined with their
impressive capabilities may lead to new safety issues when they are exploited for
malicious use. While recent studies primarily focus on probing toxic outputs
that can be easily detected with existing toxicity classifiers, we show that
LLMs can generate diverse implicit toxic outputs that are exceptionally
difficult to detect via simple zero-shot prompting. Moreover, we propose a
reinforcement learning (RL) based attacking method to further induce the
implicit toxicity in LLMs. Specifically, we optimize the language model with a
reward that prefers implicit toxic outputs to explicit toxic and non-toxic
ones. Experiments on five widely-adopted toxicity classifiers demonstrate that
the attack success rate can be significantly improved through RL fine-tuning.
For instance, the RL-finetuned LLaMA-13B model achieves an attack success rate
of 90.04% on BAD and 62.85% on Davinci003. Our findings suggest that LLMs pose
a significant threat in generating undetectable implicit toxic outputs. We
further show that fine-tuning toxicity classifiers on the annotated examples
from our attacking method can effectively enhance their ability to detect
LLM-generated implicit toxic language. The code is publicly available at
https://github.com/thu-coai/Implicit-Toxicity.
Comment: EMNLP 2023 Main Conference
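The defensive finding, that detectors improve once fine-tuned on attack-generated annotations, follows a standard sequence-classification fine-tuning recipe; the base model and file names below are placeholders, not the paper's setup.

```python
# Hedged sketch of the defensive step: fine-tune a toxicity classifier
# on annotated implicit-toxic examples. Model choice and data file are
# placeholders; see the paper's repository for the actual setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Assumed CSV with columns "text" (model output) and "label" (1 = toxic).
data = load_dataset("csv", data_files={"train": "implicit_toxic_train.csv"})
data = data.map(lambda b: tok(b["text"], truncation=True, padding="max_length"),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toxicity-clf", num_train_epochs=3),
    train_dataset=data["train"],
)
trainer.train()
```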
Bile dynamics within the biliary tract and microfluidic-based bile component detection: A review
Bilestones are solid masses found in the gallbladder or biliary tract, which
block the normal bile flow and eventually result in severe life-threatening
complications. Studies have shown that bilestone formation may be related to
bile flow dynamics and the concentration level of bile components. The bile
flow dynamics in the biliary tract play a critical role in revealing the
mechanisms of bile stasis and transport. The concentration of bile
composition is closely associated with processes such as nucleation and
crystallization. Recently, microfluidic-based biosensors have been favored over
traditional bench-top detection assays for their lower sample consumption,
portability, low cost, and high sensitivity for real-time detection. Here, we
review developments in bile dynamics studies and
microfluidics-based bile component detection methods. These studies may provide
valuable insights into the bilestone formation mechanisms and better treatment,
alongside our opinions on the future development of in vitro lithotriptic drug
screening for bilestones and bile characterization tests.
Effects of bagging treatment on fruit quality and pesticide residues of ‘Donghong’ kiwifruit
In this study, the effects of bagging treatment on fruit appearance and internal
quality, including fruit shape index, single fruit weight, vitamin C, soluble
solids, titratable acid, chlorophyll, and carotenoid content, were studied using
‘Donghong’ kiwifruit as experimental material. In addition, the pesticide
residues of bagged and non-bagged (NB) kiwifruit were compared. Results showed
that the pericarp of bagged kiwifruit was a lighter green than that of the NB
group, with uniform color, a cleaner appearance, and no scabs or spots. Single
fruit weight and transverse diameter of bagged kiwifruit were significantly
increased. Interestingly, bagging alone had little effect on the internal
quality of ‘Donghong’ kiwifruit. Pesticide treatment produced rounder fruit,
significantly increased fruit weight and chlorophyll content, and decreased
vitamin C and carotenoid content; combined bagging + pesticide treatment had an
additive effect on these traits. After bagging, the residues of pyraclostrobin
and β-cypermethrin in ‘Donghong’ kiwifruit were significantly lower than in the
control, whereas difenoconazole residues showed no significant difference.
Bagging thus improves fruit appearance and reduces the residues of some
pesticides, while the application of some pesticides can improve single fruit
weight and fruit quality. Therefore, bagging is a scientific and effective
cultivation method for the green production of ‘Donghong’ kiwifruit.
SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions
With the rapid development of Large Language Models (LLMs), increasing
attention has been paid to their safety concerns. Consequently, evaluating the
safety of LLMs has become an essential task for facilitating the broad
applications of LLMs. Nevertheless, the absence of comprehensive safety
evaluation benchmarks poses a significant impediment to effectively assessing
and enhancing the safety of LLMs. In this work, we present SafetyBench, a
comprehensive benchmark for evaluating the safety of LLMs, which comprises
11,435 diverse multiple-choice questions spanning 7 distinct categories
of safety concerns. Notably, SafetyBench also incorporates both Chinese and
English data, facilitating the evaluation in both languages. Our extensive
tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot
settings reveal a substantial performance advantage for GPT-4 over its
counterparts, and show that there is still significant room for improving the safety of
current LLMs. We believe SafetyBench will enable fast and comprehensive
evaluation of LLMs' safety, and foster the development of safer LLMs. Data and
evaluation guidelines are available at https://github.com/thu-coai/SafetyBench.
Submission entrance and leaderboard are available at
https://llmbench.ai/safety.
Comment: 15 pages
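Evaluating on a multiple-choice benchmark of this kind reduces to a simple loop: format each question with lettered options, query the model, and parse out an option letter. The item schema and `query_model` stub below are assumptions; the official repository defines the real format and scoring rules.

```python
# Minimal zero-shot MCQ evaluation loop in the spirit of SafetyBench.
# The question schema and the query_model callable are assumptions.
import re

def format_question(q: dict) -> str:
    opts = "\n".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(q["options"]))
    return f"{q['question']}\n{opts}\nAnswer with a single option letter."

def evaluate(questions: list, query_model) -> float:
    """Accuracy of query_model(prompt) -> str over multiple-choice items."""
    correct = 0
    for q in questions:
        reply = query_model(format_question(q))
        match = re.search(r"[A-D]", reply)
        if match and match.group(0) == q["answer"]:
            correct += 1
    return correct / len(questions)
```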
Propionate restores disturbed gut microbiota induced by methotrexate in Rheumatoid Arthritis: From clinic to experiments
Objective: Gut toxicity of methotrexate restricts its long-term clinical
application in patients with rheumatoid arthritis (RA). However, the
disturbances to the gut microbiota, especially to beneficial bacteria, induced
by methotrexate remain unclear, and the means of alleviating these disturbances
are limited. The aim of the present study was to elucidate the effects of
methotrexate on the gut microbiota in RA patients and collagen-induced
arthritis (CIA) rats, and to explore a targeted way of restoring the gut
microbiota. Methods: The gut microbiota, including the lactobacilli and
bifidobacterial communities, was analyzed by high-throughput sequencing of
samples from RA patients and healthy controls (HCs); furthermore, propionate
and butyrate were administered to CIA rats to confirm their effects on the gut
microbiota. Results: The gut microbiota was significantly altered: the
bifidobacterial community was tolerant to methotrexate, while lactobacilli
species such as L. delbrueckii, L. manihotivorans, and L. intestinalis were
under-represented in RA samples. Propionate supplementation was significantly
associated with the rebalancing of the gut microbiota, especially of
lactobacilli species including L. acidophilus, L. intestinalis, and L.
amylovorus, in CIA rats. Conclusion: Methotrexate disturbed the gut microbiota
and the lactobacilli community in RA patients, and propionate supplementation
helped normalize the lactobacilli community in CIA rats. These findings suggest
that propionate may be a potential alleviator of gut microbiota disturbance in
RA patients treated with methotrexate.