Search CORE

105 research outputs found

Recommended from our members

Dialogue Systems Specialized in Social Influence: Systems, Methods, and Ethics

Author: Shi Weiyan
Publication venue
Publication date: 01/01/2023
Field of study

This thesis concerns the task of how to develop dialogue systems specialized in social influence and problems around deploying such systems. Dialogue systems have become widely adopted in our daily life. Most dialogue systems are primarily focused on information-seeking tasks or social companionship. However, they cannot apply strategies in complex and critical social influence tasks, such as healthy habit promotion, emotional support, etc. In this work, we formally define social influence dialogue systems to be systems that influence users’ behaviors, feelings, thoughts, or opinions through natural conversations. We also present methods to make such systems intelligible, privacy-preserving, and thus deployable in real life. Finally, we acknowledge potential ethical issues around social influence systems and propose solutions to mitigate them in Chapter 6. Social influence dialogues span various domains, such as persuasion, negotiation, and recommendation. We first propose a donation persuasion task, PERSUASIONFORGOOD, and ground our study on this persuasion task for social good. We then build a persuasive dialogue system, by refining the dialogue model for intelligibility and imitating human experts for persuasiveness, and a negotiation agent that can play the game of Diplomacy by decoupling the planning engine and the dialogue generation module to improve controllability of social influence systems. To deploy such a system in the wild, our work examines how humans perceive the AI agent’s identity, and how their perceptions impact the social influence outcome. Moreover, dialogue models are trained on conversations, where people could share personal information. This creates privacy concerns for deployment as the models may memorize private information. To protect user privacy in the training data, our work develops privacy-preserving learning algorithms to ensure deployed models are safe under privacy attacks. Finally, deployed dialogue agents have the potential to integrate human feedback to continuously improve themselves. So we propose JUICER, a framework to make use of both binary and free-form textual human feedback to augment the training data and keep improving dialogue model performance after deployment. Building social influence dialogue systems enables us to research future expert-level AI systems that are accessible via natural languages, accountable with domain knowledge, and privacy-preserving with privacy guarantees

Columbia University Academic Commons

Just Fine-tune Twice: Selective Differential Privacy for Large Language Models

Author: Chen Si
Jia Ruoxi
Shi Weiyan
Yu Zhou
Zhang Chiyuan
Publication venue
Publication date: 15/04/2022
Field of study

With the increasing adoption of NLP models in real-world products, it becomes more and more important to protect these models from privacy leakage. Because private information in language data is sparse, previous research formalized a Selective-Differential-Privacy (SDP) notion to provide protection for sensitive tokens detected by policy functions, and prove its effectiveness on RNN-based models. But the previous mechanism requires separating the private and public model parameters and thus cannot be applied on large attention-based models. In this paper, we propose a simple yet effective just-fine-tune-twice privacy mechanism to first fine-tune on in-domain redacted data and then on in-domain private data, to achieve SDP for large Transformer-based language models. We also design explicit and contextual policy functions to provide protections at different levels. Experiments show that our models achieve strong performance while staying robust to the canary insertion attack. We further show that even under low-resource settings with a small amount of in-domain data, SDP can still improve the model utility. We will release the code, data and models to facilitate future research

arXiv.org e-Print Archive

Selective Differential Privacy for Language Modeling

Author: Cui Aiqi
Jia Ruoxi
Li Evan
Shi Weiyan
Yu Zhou
Publication venue
Publication date: 16/07/2022
Field of study

With the increasing applications of language models, it has become crucial to protect these models from leaking private information. Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees. However, applying classical differential privacy to language models leads to poor model performance as the underlying privacy notion is over-pessimistic and provides undifferentiated protection for all tokens in the data. Given that the private information in natural language is sparse (for example, the bulk of an email might not carry personally identifiable information), we propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data to improve model utility. To realize such a new notion, we develop a corresponding privacy mechanism, Selective-DPSGD, for RNN-based language models. Besides language modeling, we also apply the method to a more concrete application--dialog systems. Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utilities while remaining safe under various privacy attacks compared to the baselines. The data and code are released at https://github.com/wyshi/lm_privacy to facilitate future research .Comment: NAACL 202

arXiv.org e-Print Archive

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

Author: Jia Ruoxi
Lin Hongpeng
Shi Weiyan
Yang Diyi
Zeng Yi
Zhang Jingwen
Publication venue
Publication date: 23/01/2024
Field of study

Most traditional AI safety research has approached AI models as machines and centered on algorithm-focused attacks developed by security experts. As large language models (LLMs) become increasingly common and competent, non-expert users can also impose risks during daily interactions. This paper introduces a new perspective to jailbreak LLMs as human-like communicators, to explore this overlooked intersection between everyday language interaction and AI safety. Specifically, we study how to persuade LLMs to jailbreak them. First, we propose a persuasion taxonomy derived from decades of social science research. Then, we apply the taxonomy to automatically generate interpretable persuasive adversarial prompts (PAP) to jailbreak LLMs. Results show that persuasion significantly increases the jailbreak performance across all risk categories: PAP consistently achieves an attack success rate of over

92\%

on Llama 2-7b Chat, GPT-3.5, and GPT-4 in

10

trials, surpassing recent algorithm-focused attacks. On the defense side, we explore various mechanisms against PAP and, found a significant gap in existing defenses, and advocate for more fundamental mitigation for highly interactive LLMsComment: 14 pages of the main text, qualitative examples of jailbreaks may be harmful in natur

arXiv.org e-Print Archive

Structured Attention for Unsupervised Dialogue Structure Induction

Author: Liang Yuan
Qiu Liang
Shi Feng
Shi Weiyan
Yu Zhou
Yuan Tao
Zhao Yizhou
Zhu Song-Chun
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020
Field of study

Inducing a meaningful structural representation from one or a set of dialogues is a crucial but challenging task in computational linguistics. Advancement made in this area is critical for dialogue system design and discourse analysis. It can also be extended to solve grammatical inference. In this work, we propose to incorporate structured attention layers into a Variational Recurrent Neural Network (VRNN) model with discrete latent states to learn dialogue structure in an unsupervised fashion. Compared to a vanilla VRNN, structured attention enables a model to focus on different parts of the source sentence embeddings while enforcing a structural inductive bias. Experiments show that on two-party dialogue datasets, VRNN with structured attention learns semantic structures that are similar to templates used to generate this dialogue corpus. While on multi-party dialogue datasets, our model learns an interactive structure demonstrating its capability of distinguishing speakers or addresses, automatically disentangling dialogues without explicit human annotation.Comment: Long paper accepted by EMNLP 202

arXiv.org e-Print Archive

Crossref