29 research outputs found

Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs

Author: Cheng Siyuan
Shen Guangyu
Tao Guanhong
Zhang Xiangyu
Zhang Zhuo
Publication date: 07/12/2023

Large Language Models (LLMs) are now widely used in various applications, making it crucial to align their ethical standards with human values. However, recent jail-breaking methods demonstrate that this alignment can be undermined using carefully constructed prompts. In our study, we reveal a new threat to LLM alignment when a bad actor has access to the model's output logits, a common feature in both open-source LLMs and many commercial LLM APIs (e.g., certain GPT models). It does not rely on crafting specific prompts. Instead, it exploits the fact that even when an LLM rejects a toxic request, a harmful response often hides deep in the output logits. By forcefully selecting lower-ranked output tokens during the auto-regressive generation process at a few critical output positions, we can compel the model to reveal these hidden responses. We term this process model interrogation. This approach differs from and outperforms jail-breaking methods, achieving 92% effectiveness compared to 62%, and is 10 to 20 times faster. The harmful content uncovered through our method is more relevant, complete, and clear. Additionally, it can complement jail-breaking strategies, which further boosts attack performance. Our findings indicate that interrogation can extract toxic knowledge even from models specifically designed for coding tasks

arXiv.org e-Print Archive

Teaching Young Learners Computational Thinking

Author: An Shengwei
Du Hengrong
Li Tingxuan
Tao Guanhong
Wang Xuan
Publication venue: 'Purdue University (bepress)'
Publication date: 04/03/2019

Purdue E-Pubs

Opening A Pandora's Box: Things You Should Know in the Era of Custom GPTs

Author: Cheng Siyuan
Shen Guangyu
Tao Guanhong
Zhang Xiangyu
Zhang Zhuo
Zhu Junmin
Publication date: 31/12/2023

The emergence of large language models (LLMs) has significantly accelerated the development of a wide range of applications across various fields. There is a growing trend in the construction of specialized platforms based on LLMs, such as the newly introduced custom GPTs by OpenAI. While custom GPTs provide various functionalities like web browsing and code execution, they also introduce significant security threats. In this paper, we conduct a comprehensive analysis of the security and privacy issues arising from the custom GPT platform. Our systematic examination categorizes potential attack scenarios into three threat models based on the role of the malicious actor, and identifies critical data exchange channels in custom GPTs. Utilizing the STRIDE threat modeling framework, we identify 26 potential attack vectors, with 19 being partially or fully validated in real-world settings. Our findings emphasize the urgent need for robust security and privacy measures in the custom GPT ecosystem, especially in light of the forthcoming launch of the official GPT store by OpenAI

arXiv.org e-Print Archive

Backdooring Neural Code Search

Author: Chen Yuchen
Fang Chunrong
Luo Bin
Sun Weisong
Tao Guanhong
Zhang Quanjun
Zhang Xiangyu
Publication date: 27/05/2023

Reusing off-the-shelf code snippets from online repositories is a common practice, which significantly enhances the productivity of software developers. To find desired code snippets, developers resort to code search engines through natural language queries. Neural code search models are hence behind many such engines. These models are based on deep learning and gain substantial attention due to their impressive performance. However, the security aspect of these models is rarely studied. Particularly, an adversary can inject a backdoor in neural code search models, which return buggy or even vulnerable code with security/privacy issues. This may impact the downstream software (e.g., stock trading systems and autonomous driving) and cause financial loss and/or life-threatening incidents. In this paper, we demonstrate such attacks are feasible and can be quite stealthy. By simply modifying one variable/function name, the attacker can make buggy/vulnerable code rank in the top 11%. Our attack BADCODE features a special trigger generation and injection procedure, making the attack more effective and stealthy. The evaluation is conducted on two neural code search models and the results show our attack outperforms baselines by 60%. Our user study demonstrates that our attack is more stealthy than the baseline by two times based on the F1 score

arXiv.org e-Print Archive

Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D Object Detection

Author: Cheng Zhiyuan
Choi Hongjun
Feng Shiwei
Liang James
Liu Dongfang
Tao Guanhong
Zhang Xiangyu
Zuzak Michael
Publication date: 02/03/2024

Multi-sensor fusion (MSF) is widely used in autonomous vehicles (AVs) for perception, particularly for 3D object detection with camera and LiDAR sensors. The purpose of fusion is to capitalize on the advantages of each modality while minimizing its weaknesses. Advanced deep neural network (DNN)-based fusion techniques have demonstrated exceptional and industry-leading performance. Due to the redundant information in multiple modalities, MSF is also recognized as a general defence strategy against adversarial attacks. In this paper, we attack fusion models from the camera modality that is considered to be of lesser importance in fusion but is more affordable for attackers. We argue that the weakest link of fusion models depends on their most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion-based 3D object detection models through camera-only adversarial attacks. Our approach employs a two-stage optimization-based strategy that first thoroughly evaluates vulnerable image areas under adversarial attacks, and then applies dedicated attack strategies for different fusion models to generate deployable patches. The evaluations with six advanced camera-LiDAR fusion models and one camera-only model indicate that our attacks successfully compromise all of them. Our approach can either decrease the mean average precision (mAP) of detection performance from 0.824 to 0.353, or degrade the detection score of a target object from 0.728 to 0.156, demonstrating the efficacy of our proposed attack framework. Code is available. Comment: Accepted at ICLR'202

arXiv.org e-Print Archive

Detecting Backdoors in Pre-trained Encoders

Author: Cheng Siyuan
Feng Shiwei
Liu Yingqi
Ma Shiqing
Shen Guangyu
Tao Guanhong
Xu Xiangzhe
Zhang Kaiyuan
Zhang Xiangyu
Publication date: 23/03/2023

Self-supervised learning in computer vision trains on unlabeled data, such as images or (image, text) pairs, to obtain an image encoder that learns high-quality embeddings for input data. Emerging backdoor attacks towards encoders expose crucial vulnerabilities of self-supervised learning, since downstream classifiers (even further trained on clean data) may inherit backdoor behaviors from encoders. Existing backdoor detection methods mainly focus on supervised learning settings and cannot handle pre-trained encoders especially when input labels are not available. In this paper, we propose DECREE, the first backdoor detection approach for pre-trained encoders, requiring neither classifier headers nor input labels. We evaluate DECREE on over 400 encoders trojaned under 3 paradigms. We show the effectiveness of our method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million image-text pairs. Our method consistently has a high detection accuracy even if we have only limited or no access to the pre-training dataset. Comment: Accepted at CVPR 2023. Code is available at https://github.com/GiantSeaweed/DECRE

arXiv.org e-Print Archive

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

Author: An Shengwei
Chen Pin-Yu
Cheng Siyuan
Chou Sheng-Yen
Ho Tsung-Yi
Ma Shiqing
Shen Guangyu
Tao Guanhong
Xu Qiuling
Zhang Kaiyuan
Zhang Xiangyu
Publication date: 04/02/2024

Diffusion models (DMs) have become state-of-the-art generative models because of their capability to generate high-quality images from noise without adversarial training. However, they are vulnerable to backdoor attacks as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors from DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework Elijah on hundreds of DMs of 3 types including DDPM, NCSN and LDM, with 13 samplers against 3 existing backdoor attacks. Extensive experiments show that our approach can have close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility. Comment: AAAI 202

arXiv.org e-Print Archive

Snow and Ice Core Research Efforts at the Nagaoka Institute of Snow and Ice Studies (長岡雪氷防災実験研究所における雪氷コア研究への取組み)

Author: Guanhong Tao
小島隆志
山田隆二
東浦將夫
神田尚子
Publication venue: 防災科学技術研究所
Publication date: 01/08/1999

NRInstESDR Repository (National Research Institute for Earth Science and Disaster Resilience) / 防災科研機関リポジトリ

The Efficacy and Safety of Shen Guo Lao Nian Granule for Common Cold of Qi-Deficiency Syndrome: Study Protocol for a Randomized, Double-Blind, Placebo-Controlled, Multicenter, Phase II Clinical Trial

Author: Bin She
Bing Mao
Guanhong Li
Haimiao Yang
Hong Ding
Hongli Jiang
Juanjuan Fu
Ruiming Zhang
Siyuan Hu
Tao Fan
Wei Liu
Xuemei Liu
Ying Lan
Yuhong Huang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Background. Common cold is one of the most frequently occurring illnesses in primary healthcare services and represents considerable disease burden. Common cold of Qi-deficiency syndrome (CCQDS) is an important but less addressed traditional Chinese medicine (TCM) pattern. We designed a protocol to explore the efficacy, safety, and optimal dose of Shen Guo Lao Nian Granule (SGLNG) for treating CCQDS. Methods/Design. This is a multicenter, randomized, double-blind, placebo-controlled, phase II clinical trial. A total of 240 eligible patients will be recruited from five centers. Patients are randomly assigned to high-dose group, middle-dose group, low-dose group, or control group in a 1:1:1:1 ratio. All drugs are required to be taken 3 times daily for 5 days with a 5-day follow-up period. Primary outcomes are duration of all symptoms, total score reduction on Jackson’s scale, and TCM symptoms scale. Secondary outcomes include every single TCM symptom duration and score reduction, TCM main symptoms disappearance rate, curative effects, and comparison between Jackson’s scale and TCM symptom scale. Ethics and Trial Registration. This study protocol was approved by the Ethics Committee of Clinical Trials and Biomedicine of West China Hospital of Sichuan University (number IRB-2014-12) and registered with the Chinese Clinical Trial Registry (ChiCTR-IPR-15006349)

Crossref

Directory of Open Access Journals