Dialogue Systems Specialized in Social Influence: Systems, Methods, and Ethics
This thesis concerns how to develop dialogue systems specialized in social influence and the problems that arise when deploying such systems. Dialogue systems have become widely adopted in our daily life, but most focus on information-seeking tasks or social companionship and cannot apply strategies in complex and critical social influence tasks such as healthy habit promotion and emotional support. In this work, we formally define social influence dialogue systems as systems that influence users’ behaviors, feelings, thoughts, or opinions through natural conversations. We also present methods to make such systems intelligible, privacy-preserving, and thus deployable in real life. Finally, we acknowledge potential ethical issues around social influence systems and propose solutions to mitigate them in Chapter 6.
Social influence dialogues span various domains, such as persuasion, negotiation, and recommendation. We first propose a donation persuasion task, PERSUASIONFORGOOD, and ground our study on this persuasion-for-social-good task. We then build a persuasive dialogue system by refining the dialogue model for intelligibility and imitating human experts for persuasiveness, and a negotiation agent that plays the game of Diplomacy by decoupling the planning engine from the dialogue generation module, which improves the controllability of social influence systems. To deploy such systems in the wild, our work examines how humans perceive the AI agent’s identity and how their perceptions affect the social influence outcome. Moreover, dialogue models are trained on conversations in which people may share personal information, which creates privacy concerns for deployment because the models may memorize private information.
To protect user privacy in the training data, our work develops privacy-preserving learning algorithms that keep deployed models safe under privacy attacks. Finally, deployed dialogue agents have the potential to integrate human feedback to continuously improve themselves, so we propose JUICER, a framework that uses both binary and free-form textual human feedback to augment the training data and keep improving dialogue model performance after deployment. Building social influence dialogue systems enables us to research future expert-level AI systems that are accessible via natural language, accountable with domain knowledge, and privacy-preserving with privacy guarantees.
Just Fine-tune Twice: Selective Differential Privacy for Large Language Models
With the increasing adoption of NLP models in real-world products, it becomes more and more important to protect these models from privacy leakage. Because private information in language data is sparse, previous research formalized a Selective-Differential-Privacy (SDP) notion to provide protection for sensitive tokens detected by policy functions, and proved its effectiveness on RNN-based models. But the previous mechanism requires separating the private and public model parameters and thus cannot be applied to large attention-based models. In this paper, we propose a simple yet effective just-fine-tune-twice privacy mechanism that first fine-tunes on in-domain redacted data and then on in-domain private data, to achieve SDP for large Transformer-based language models. We also design explicit and contextual policy functions to provide protection at different levels. Experiments show that our models achieve strong performance while staying robust to the canary insertion attack. We further show that even under low-resource settings with a small amount of in-domain data, SDP can still improve the model utility. We will release the code, data, and models to facilitate future research.
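As a rough illustration of the mechanism described above, the sketch below wires up the two fine-tuning phases in PyTorch: a standard pass over in-domain data whose sensitive tokens have been redacted, followed by a differentially private pass over the raw private data using per-example gradient clipping and Gaussian noise. The model interface (a Hugging-Face-style forward returning an object with a .loss field), the data loaders, and all hyperparameters are illustrative assumptions, not the released implementation.

import torch

def phase1_redacted(lm, redacted_loader, epochs=1, lr=5e-5):
    # Phase 1: ordinary fine-tuning on in-domain data whose sensitive tokens
    # have already been masked out by a policy function.
    opt = torch.optim.AdamW(lm.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in redacted_loader:          # batch: dict of input tensors
            loss = lm(**batch).loss            # assumes HF-style forward with .loss
            loss.backward()
            opt.step()
            opt.zero_grad()

def phase2_private(lm, private_examples, epochs=1, lr=1e-5,
                   batch_size=8, clip=1.0, noise_multiplier=0.5):
    # Phase 2: DP-SGD-style fine-tuning on the unredacted private data; the
    # privacy guarantee comes from per-example clipping plus Gaussian noise.
    opt = torch.optim.SGD(lm.parameters(), lr=lr)
    params = [p for p in lm.parameters() if p.requires_grad]
    for _ in range(epochs):
        for start in range(0, len(private_examples), batch_size):
            batch = private_examples[start:start + batch_size]
            accum = [torch.zeros_like(p) for p in params]
            for example in batch:              # one example at a time
                loss = lm(**example).loss
                grads = torch.autograd.grad(loss, params)
                norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
                scale = (clip / (norm + 1e-12)).clamp(max=1.0)
                for a, g in zip(accum, grads):
                    a += g * scale             # clip this example's gradient
            for p, a in zip(params, accum):
                noise = torch.normal(0.0, noise_multiplier * clip, size=a.shape)
                p.grad = (a + noise) / len(batch)
            opt.step()
            opt.zero_grad()

In the actual setting, the noisy second phase would also be driven by a privacy accountant to track the overall budget; that bookkeeping is omitted from this sketch.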
Selective Differential Privacy for Language Modeling
With the increasing applications of language models, it has become crucial to protect these models from leaking private information. Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees. However, applying classical differential privacy to language models leads to poor model performance, as the underlying privacy notion is over-pessimistic and provides undifferentiated protection for all tokens in the data. Given that the private information in natural language is sparse (for example, the bulk of an email might not carry personally identifiable information), we propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data and improve model utility. To realize this new notion, we develop a corresponding privacy mechanism, Selective-DPSGD, for RNN-based language models. Besides language modeling, we also apply the method to a more concrete application: dialog systems. Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utility than the baselines while remaining safe under various privacy attacks. The data and code are released at https://github.com/wyshi/lm_privacy to facilitate future research.
Comment: NAACL 2022
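To make the "sensitive portion of the data" concrete, below is a toy example of an explicit, regex-based policy function of the kind that selective differential privacy relies on: it returns a per-token mask marking which tokens need the differentially private treatment. The patterns and whitespace tokenization are illustrative assumptions, not the released code.

import re

SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email addresses
    re.compile(r"\+?\d[\d\s().-]{7,}\d"),     # phone-number-like digit runs
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # SSN-style identifiers
]

def policy_function(tokens):
    # Return a 0/1 mask over tokens: 1 = sensitive (gets DP protection), 0 = public.
    return [1 if any(p.search(t) for p in SENSITIVE_PATTERNS) else 0
            for t in tokens]

print(policy_function("please reach me at jane.doe@example.com after lunch".split()))
# -> [0, 0, 0, 0, 1, 0, 0]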
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Most traditional AI safety research has approached AI models as machines and centered on algorithm-focused attacks developed by security experts. As large language models (LLMs) become increasingly common and competent, non-expert users can also pose risks during daily interactions. This paper introduces a new perspective on jailbreaking LLMs as human-like communicators, to explore this overlooked intersection between everyday language interaction and AI safety. Specifically, we study how to persuade LLMs to jailbreak them. First, we propose a persuasion taxonomy derived from decades of social science research. Then, we apply the taxonomy to automatically generate interpretable persuasive adversarial prompts (PAP) to jailbreak LLMs. Results show that persuasion significantly increases jailbreak performance across all risk categories: PAP consistently achieves an attack success rate of over 92% on Llama 2-7b Chat, GPT-3.5, and GPT-4 in 10 trials, surpassing recent algorithm-focused attacks. On the defense side, we explore various mechanisms against PAP, find a significant gap in existing defenses, and advocate for more fundamental mitigation for highly interactive LLMs.
Comment: 14 pages of main text; qualitative examples of jailbreaks may be harmful in nature
Structured Attention for Unsupervised Dialogue Structure Induction
Inducing a meaningful structural representation from one or a set of dialogues is a crucial but challenging task in computational linguistics. Advances in this area are critical for dialogue system design and discourse analysis, and the approach can also be extended to solve grammatical inference. In this work, we propose to incorporate structured attention layers into a Variational Recurrent Neural Network (VRNN) model with discrete latent states to learn dialogue structure in an unsupervised fashion. Compared to a vanilla VRNN, structured attention enables the model to focus on different parts of the source sentence embeddings while enforcing a structural inductive bias. Experiments show that on two-party dialogue datasets, the VRNN with structured attention learns semantic structures similar to the templates used to generate the dialogue corpus, while on multi-party dialogue datasets, our model learns an interactive structure that demonstrates its capability to distinguish speakers or addressees and to automatically disentangle dialogues without explicit human annotation.
Comment: Long paper accepted by EMNLP 2020
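For intuition, the sketch below shows one recurrence step of a VRNN-style dialogue model with a discrete latent state and attention over the current utterance's token embeddings. A plain additive attention layer stands in for the structured attention layer, and a Gumbel-softmax relaxation stands in for inference over the discrete states; the dimensions and module layout are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDialogueStep(nn.Module):
    # One dialogue-level recurrence step: attend over the utterance's token
    # embeddings, sample a relaxed discrete state z_t, and update the hidden state.
    def __init__(self, emb_dim=128, hid_dim=128, n_states=10):
        super().__init__()
        self.attn = nn.Linear(hid_dim + emb_dim, 1)   # additive attention score
        self.prior = nn.Linear(hid_dim, n_states)     # logits for p(z_t | h_{t-1})
        self.rnn = nn.GRUCell(emb_dim + n_states, hid_dim)

    def forward(self, token_embs, h_prev, tau=1.0):
        # token_embs: (num_tokens, emb_dim); h_prev: (hid_dim,)
        scores = self.attn(torch.cat(
            [h_prev.expand(token_embs.size(0), -1), token_embs], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=0)             # soft attention over tokens
        context = (weights.unsqueeze(-1) * token_embs).sum(dim=0)
        z = F.gumbel_softmax(self.prior(h_prev), tau=tau)   # relaxed discrete state
        h_next = self.rnn(torch.cat([context, z]).unsqueeze(0),
                          h_prev.unsqueeze(0)).squeeze(0)
        return z, h_next

step = LatentDialogueStep()
z, h = step(torch.randn(7, 128), torch.zeros(128))
print(z.argmax().item(), h.shape)   # inferred state id and torch.Size([128])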