56 research outputs found

    Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation

    Large pre-trained language models achieve impressive results across many tasks. However, recent works point out that pre-trained language models may memorize a considerable fraction of their training data, leading to the privacy risk of information leakage. In this paper, we propose a method named Ethicist for targeted training data extraction through loss-smoothed soft prompting and calibrated confidence estimation, investigating how to recover the suffix in the training data when given a prefix. To elicit memorization in the attacked model, we tune soft prompt embeddings while keeping the model fixed. We further propose a smoothing loss that smooths the loss distribution of the suffix tokens to make it easier to sample the correct suffix. To select the most probable suffix from a collection of sampled suffixes and estimate the prediction confidence, we propose a calibrated confidence estimation method, which normalizes the confidence of the generated suffixes with a local estimation. We show that Ethicist significantly improves the extraction performance on a recently proposed public benchmark. We also investigate several factors influencing the data extraction performance, including the decoding strategy, model scale, prefix length, and suffix length. Our code is available at https://github.com/thu-coai/Targeted-Data-Extraction. Comment: ACL 2023 Long Paper (Main Conference)
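
    The two core ideas, smoothing the suffix loss during soft-prompt tuning and normalizing suffix confidence against locally sampled alternatives, can be pictured with a minimal PyTorch sketch. This is a hedged illustration of one plausible reading of the abstract, not the authors' implementation; the moving-average form of the smoothing and the `window` parameter are assumptions.

```python
# Hedged sketch: loss-smoothed suffix objective and locally calibrated
# confidence, as one plausible reading of the Ethicist abstract.
import torch
import torch.nn.functional as F

def smoothed_suffix_loss(logits, suffix_ids, window=3):
    """Per-token cross-entropy over the suffix, smoothed with a moving
    average so no single hard token dominates the objective.

    logits:     (seq_len, vocab) model outputs aligned to the suffix
    suffix_ids: (seq_len,) ground-truth suffix token ids
    window:     smoothing window size (assumed hyperparameter)
    """
    per_token = F.cross_entropy(logits, suffix_ids, reduction="none")
    kernel = torch.ones(1, 1, window) / window
    smoothed = F.conv1d(per_token.view(1, 1, -1), kernel, padding=window // 2)
    return smoothed.mean()

def calibrated_confidence(suffix_logprobs):
    """Normalize each sampled suffix's probability against the pool of
    suffixes sampled for the same prefix (a "local" estimate)."""
    probs = torch.tensor(suffix_logprobs).exp()
    return probs / probs.sum()
```

    During extraction, only the soft prompt embeddings would receive gradients from the smoothed loss; the attacked model itself stays frozen, matching the abstract's description.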

    Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

    Binary code analysis allows analyzing binary code without access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text in various natural languages. We notice that binary code analysis and NLP share many analogous topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems: (I) given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics are similar; and (II) given a piece of code of interest, determining whether it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system, INNEREYE, and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency, and scalability, and case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis. Comment: Accepted by the Network and Distributed Systems Security (NDSS) Symposium 2019
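
    As a concrete picture of Problem I, a hedged sketch follows: instruction tokens from each ISA are embedded and encoded by per-architecture recurrent encoders in a siamese setup, with cosine similarity scoring semantic closeness. Layer sizes, the x86/ARM pairing, and class names are illustrative assumptions rather than the INNEREYE implementation.

```python
# Hedged sketch of NLP-style cross-ISA basic-block similarity (Problem I).
import torch
import torch.nn as nn

class BlockEncoder(nn.Module):
    """Encodes a basic block (a sequence of instruction-token ids) into a vector."""
    def __init__(self, vocab_size, emb_dim=100, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # instruction embeddings
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        _, (h, _) = self.lstm(self.embed(token_ids))
        return h[-1]                         # final hidden state: (batch, hid_dim)

class SiameseSimilarity(nn.Module):
    """One encoder per ISA; cosine similarity scores semantic closeness."""
    def __init__(self, vocab_x86, vocab_arm):
        super().__init__()
        self.enc_x86 = BlockEncoder(vocab_x86)
        self.enc_arm = BlockEncoder(vocab_arm)

    def forward(self, x86_ids, arm_ids):
        return torch.cosine_similarity(self.enc_x86(x86_ids),
                                       self.enc_arm(arm_ids), dim=-1)
```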

    Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

    Large Language Models (LLMs) continue to advance in their capabilities, yet this progress is accompanied by a growing array of safety risks. While significant attention has been devoted to exploiting weaknesses in LLMs through jailbreaking attacks, there remains a paucity of work on defending against these attacks. We point out a pivotal factor contributing to the success of jailbreaks: the inherent conflict between the goals of being helpful and ensuring safety. To counter jailbreaking attacks, we propose to integrate goal prioritization at both the training and inference stages. Implementing goal prioritization during inference substantially diminishes the Attack Success Rate (ASR) of jailbreaking attacks, reducing it from 66.4% to 2.0% for ChatGPT and from 68.2% to 19.4% for Vicuna-33B, without compromising general performance. Furthermore, integrating goal prioritization into the training phase reduces the ASR from 71.0% to 6.6% for Llama2-13B. Remarkably, even in scenarios where no jailbreaking samples are included during training, our approach slashes the ASR by half, decreasing it from 71.0% to 34.0%. Additionally, our findings reveal that while stronger LLMs face greater safety risks, they also possess a greater capacity to be steered towards defending against such attacks. We hope our work contributes to the understanding of jailbreaking attacks and defenses, and sheds light on the relationship between LLMs' capability and safety. Our code will be available at https://github.com/thu-coai/JailbreakDefense_GoalPriority. Comment: 14 pages
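
    The inference-stage defense can be approximated with a simple prompt wrapper that states the goal ordering explicitly. The template wording below is an illustrative assumption, not the paper's exact prompt.

```python
# Hedged sketch of inference-time goal prioritization: wrap the user query
# in an instruction that explicitly ranks safety above helpfulness.
GOAL_PRIORITY_TEMPLATE = """\
You are an assistant that always prioritizes safety over helpfulness.
First decide whether answering the query below could cause harm.
If it could, refuse briefly; otherwise, answer as helpfully as you can.

Query: {query}"""

def build_prioritized_prompt(query: str) -> str:
    """Return the wrapped prompt to send to the model instead of the raw query."""
    return GOAL_PRIORITY_TEMPLATE.format(query=query)

print(build_prioritized_prompt("<user query goes here>"))
```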

    Selecting Stickers in Open-Domain Dialogue through Multitask Learning

    With the increasing popularity of online chatting, stickers are becoming important in our online communication. Selecting appropriate stickers in open-domain dialogue requires a comprehensive understanding of both dialogues and stickers, as well as the relationship between the two modalities. To tackle these challenges, we propose a multitask learning method comprising three auxiliary tasks that enhance the understanding of dialogue history and of the emotion and semantic meaning of stickers. Extensive experiments conducted on a recent challenging dataset show that our model can better combine the multimodal information and achieve significantly higher accuracy than strong baselines. An ablation study further verifies the effectiveness of each auxiliary task. Our code is available at https://github.com/nonstopfor/Sticker-Selection. Comment: ACL 2022 Findings, camera-ready
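
    The multitask objective can be summarized as a weighted sum of the main sticker-selection loss and the three auxiliary losses; the task names and weights below are assumptions for illustration, not the paper's reported configuration.

```python
# Hedged sketch of the multitask objective: main sticker-selection loss plus
# three auxiliary losses (dialogue-history modeling, sticker emotion,
# sticker semantics).
def multitask_loss(main, aux_history, aux_emotion, aux_semantic,
                   weights=(1.0, 0.3, 0.3, 0.3)):
    """Weighted sum of the per-task losses (weights are assumed)."""
    losses = (main, aux_history, aux_emotion, aux_semantic)
    return sum(w * l for w, l in zip(weights, losses))
```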

    Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

    As the capabilities of generative large models advance, safety concerns become more pronounced in their outputs. To ensure the sustainable growth of the AI ecosystem, it is imperative to undertake a holistic evaluation and refinement of the associated safety risks. This survey presents a framework for safety research on large models, delineating the landscape of safety risks as well as safety evaluation and improvement methods. We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models, encompassing preference-based testing, adversarial attack approaches, issue detection, and other advanced evaluation methods. Additionally, we explore strategies for enhancing large model safety from training to deployment, highlighting cutting-edge safety approaches for each stage of building large models. Finally, we discuss the core challenges in advancing towards more responsible AI, including the interpretability of safety mechanisms, ongoing safety issues, and robustness against malicious attacks. Through this survey, we aim to provide clear technical guidance for safety researchers and to encourage further study on the safety of large models.

    Unveiling the Implicit Toxicity in Large Language Models

    The open-endedness of large language models (LLMs) combined with their impressive capabilities may lead to new safety issues when they are exploited for malicious use. While recent studies primarily focus on probing toxic outputs that can be easily detected with existing toxicity classifiers, we show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via simple zero-shot prompting. Moreover, we propose a reinforcement learning (RL) based attacking method to further induce implicit toxicity in LLMs. Specifically, we optimize the language model with a reward that prefers implicit toxic outputs to explicit toxic and non-toxic ones. Experiments on five widely adopted toxicity classifiers demonstrate that the attack success rate can be significantly improved through RL fine-tuning. For instance, the RL-finetuned LLaMA-13B model achieves an attack success rate of 90.04% on BAD and 62.85% on Davinci003. Our findings suggest that LLMs pose a significant threat in generating undetectable implicit toxic outputs. We further show that fine-tuning toxicity classifiers on the annotated examples from our attacking method can effectively enhance their ability to detect LLM-generated implicit toxic language. The code is publicly available at https://github.com/thu-coai/Implicit-Toxicity. Comment: EMNLP 2023 Main Conference
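
    The defensive step reported at the end, fine-tuning a toxicity classifier on annotated implicit-toxicity examples, might look like the following sketch with Hugging Face Transformers. The base model, file name, and column names are illustrative assumptions.

```python
# Hedged sketch: fine-tune a toxicity classifier on annotated examples so it
# learns to flag implicit toxicity that zero-shot classifiers miss.
import datasets
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)  # 0 = benign, 1 = toxic

# Assumed CSV with "text" (LLM output) and "label" columns.
ds = datasets.load_dataset("csv",
                           data_files="implicit_toxicity_annotations.csv")
ds = ds.map(lambda ex: tok(ex["text"], truncation=True,
                           padding="max_length", max_length=128),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toxicity-clf", num_train_epochs=3),
    train_dataset=ds["train"],
)
trainer.train()
```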

    Bile dynamics within the biliary tract and microfluidic-based bile component detection: A review

    Bilestones are solid masses found in the gallbladder or biliary tract that block normal bile flow and can eventually result in severe, life-threatening complications. Studies have shown that bilestone formation may be related to bile flow dynamics and the concentration levels of bile components. The bile flow dynamics in the biliary tract play a critical role in disclosing the mechanisms of bile stasis and transportation, while the concentration of bile components is closely associated with processes such as nucleation and crystallization. Recently, microfluidic-based biosensors have been favored over traditional bench-top detection assays for their lower sample consumption, portability, low cost, and high sensitivity for real-time detection. Here, we review developments in the study of bile dynamics and in microfluidics-based bile component detection methods. These studies may provide valuable insights into bilestone formation mechanisms and better treatments, alongside our opinions on the future development of in vitro lithotriptic drug screening for bilestones and bile characterization tests.

    Effects of bagging treatment on fruit quality and pesticide residues of ‘Donghong’ kiwifruit

    In this study, the effects of bagging treatment on fruit appearance and internal quality, including fruit shape index, single fruit weight, vitamin C, soluble solids, titratable acid, chlorophyll, and carotenoid content, were studied using ‘Donghong’ kiwifruit as the experimental material. Meanwhile, the pesticide residues of bagged and non-bagged (NB) kiwifruit were compared and analyzed. Results showed that the pericarp of bagged kiwifruit was a lighter green than that of the NB group, with uniform color, a cleaner appearance, and no scabs or spots. The single fruit weight and transverse diameters of bagged kiwifruits were significantly increased. Interestingly, we found that bagging alone had little effect on the internal quality of ‘Donghong’ kiwifruit. After pesticide treatment, the fruit shape was rounder, the single fruit weight and chlorophyll content increased significantly, and the vitamin C and carotenoid contents decreased. In addition, the combined bagging + pesticide treatment had an additive effect on these traits. After bagging, the residues of pyraclostrobin and β-cypermethrin in ‘Donghong’ kiwifruit were significantly lower than those in the control, whereas difenoconazole residues did not differ significantly. Bagging treatment can improve fruit appearance and reduce the residues of some pesticides, while the application of some pesticides can improve single fruit weight and fruit quality. Therefore, bagging is a scientific and effective cultivation method for the green production of ‘Donghong’ kiwifruit.

    SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions

    With the rapid development of Large Language Models (LLMs), increasing attention has been paid to their safety concerns. Consequently, evaluating the safety of LLMs has become an essential task for facilitating their broad application. Nevertheless, the absence of comprehensive safety evaluation benchmarks poses a significant impediment to effectively assessing and enhancing the safety of LLMs. In this work, we present SafetyBench, a comprehensive benchmark for evaluating the safety of LLMs, which comprises 11,435 diverse multiple-choice questions spanning 7 distinct categories of safety concerns. Notably, SafetyBench incorporates both Chinese and English data, facilitating evaluation in both languages. Our extensive tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts, and show that there is still significant room for improving the safety of current LLMs. We believe SafetyBench will enable fast and comprehensive evaluation of LLMs' safety and foster the development of safer LLMs. Data and evaluation guidelines are available at https://github.com/thu-coai/SafetyBench. Submission entrance and leaderboard are available at https://llmbench.ai/safety. Comment: 15 pages
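
    A zero-shot evaluation loop over multiple-choice items in this style is straightforward to sketch; `ask_model` stands in for any LLM API, and the item fields are assumptions rather than the benchmark's released harness.

```python
# Hedged sketch of zero-shot multiple-choice safety evaluation.
import re

def format_question(item):
    """item: {"question": str, "options": [str, ...], "answer": "A".."D"}"""
    opts = "\n".join(f"({chr(65 + i)}) {o}"
                     for i, o in enumerate(item["options"]))
    return f"Question: {item['question']}\n{opts}\nAnswer with a single letter."

def evaluate(items, ask_model):
    """Accuracy over the items; ask_model(prompt) returns the reply text."""
    correct = 0
    for item in items:
        reply = ask_model(format_question(item))
        match = re.search(r"[A-D]", reply)
        correct += bool(match and match.group() == item["answer"])
    return correct / len(items)
```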

    Propionate restores disturbed gut microbiota induced by methotrexate in Rheumatoid Arthritis: From clinic to experiments

    Objective: Gut toxicity of methotrexate restricts its long-term clinical application in patients with rheumatoid arthritis (RA). However, the disturbances that methotrexate induces in the gut microbiota, especially in beneficial bacteria, remain unclear, and ways to alleviate them are limited. The aim of the present study was to elucidate the effects of methotrexate on the gut microbiota in RA patients and collagen-induced arthritis (CIA) rats, and to explore a way to restore the gut microbiota. Methods: The gut microbiota, including the lactobacilli and bifidobacterial communities, was analyzed by high-throughput sequencing of samples from RA patients and healthy controls (HCs); furthermore, propionate and butyrate were administered to CIA rats to confirm their effects on the gut microbiota. Results: The gut microbiota was significantly altered; the bifidobacterial community was tolerant to methotrexate, while lactobacilli species such as L. delbrueckii, L. manihotivorans, and L. intestinalis were under-represented in RA samples. Propionate supplementation was significantly associated with the rebalancing of the gut microbiota, especially of lactobacilli species including L. acidophilus, L. intestinalis, and L. amylovorus, in CIA rats. Conclusion: Methotrexate disturbed the gut microbiota and the lactobacilli community in RA patients, and propionate supplementation helped normalize the lactobacilli community in CIA rats. These findings suggest that propionate may be a potential alleviator for the gut microbiota in RA patients treated with methotrexate.