Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models
We explore the idea of compressing the prompts used to condition language
models, and show that compressed prompts can retain a substantial amount of
information about the original prompt. For severely compressed prompts,
fine-grained information is lost, but abstract information and general
sentiment can be retained with surprisingly few parameters, which is useful
for decode-time algorithms for controllability and toxicity reduction.
We explore contrastive conditioning to steer language model generation towards
desirable text and away from undesirable text, and find that some complex
prompts can be effectively compressed into a single token to guide generation.
We also show that compressed prompts are largely compositional: they can be
constructed to control independent aspects of the generated text.
Comment: Empirical Methods in Natural Language Processing, 2022 (Main, Long Paper)
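A minimal decode-time sketch of the contrastive-conditioning idea from this abstract, not the authors' released code: the next-token logits under a "desirable" prompt are pushed away from the logits under an "undesirable" prompt. The model name, prompt strings, and the steering weight `alpha` are illustrative assumptions.

```python
# Sketch of contrastive conditioning at decode time (assumed setup, not
# the paper's implementation): steer generation toward a desirable
# conditioning prompt and away from an undesirable one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

desirable = "The following is a polite, friendly reply:"    # assumed prompt
undesirable = "The following is a rude, toxic reply:"       # assumed prompt
prefix = " I think your argument"

pos_ids = tok(desirable + prefix, return_tensors="pt").input_ids
neg_ids = tok(undesirable + prefix, return_tensors="pt").input_ids

alpha = 2.0  # strength of the contrastive push (assumed value)
generated = []
for _ in range(20):
    with torch.no_grad():
        pos_logits = model(pos_ids).logits[:, -1, :]
        neg_logits = model(neg_ids).logits[:, -1, :]
    # Amplify what the desirable conditioning prefers over the undesirable one.
    steered = pos_logits + alpha * (pos_logits - neg_logits)
    next_id = steered.argmax(dim=-1, keepdim=True)  # greedy decoding for brevity
    pos_ids = torch.cat([pos_ids, next_id], dim=-1)
    neg_ids = torch.cat([neg_ids, next_id], dim=-1)
    generated.append(next_id.item())

print(tok.decode(generated))
```

In the paper's setting, the conditioning prompts could themselves be compressed (e.g., into a single learned token), which keeps the per-step overhead of the second forward pass small.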
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models
Pretrained large language models have become indispensable for solving
various natural language processing (NLP) tasks. However, safely deploying them
in real-world applications is challenging because they can generate toxic content.
To address this challenge, we propose two novel pretraining data augmentation
strategies that significantly reduce model toxicity without compromising its
utility. Our two strategies are: (1) MEDA, which adds the raw toxicity score
of each sample as meta-data to the pretraining samples, and (2) INST, which
adds instructions to those samples indicating their toxicity. Our results
indicate that our best-performing strategy (INST) reduces the toxicity
probability by up to 61% while preserving accuracy on five benchmark NLP
tasks and improving AUC scores on four bias detection tasks by 1.3%. We also demonstrate the
generalizability of our techniques by scaling the number of training samples
and the number of model parameters.
Comment: This paper will be presented at EACL 2023
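As a rough illustration of the two augmentation strategies named in this abstract, the sketch below tags raw pretraining text with either a toxicity score (MEDA) or a natural-language instruction (INST). The tag formats, the threshold, and the source of the score are assumptions; the paper's exact templates may differ.

```python
# Sketch of the MEDA and INST pretraining-data augmentations as described
# in the abstract. The tag/instruction wording and the 0.5 threshold are
# illustrative assumptions, not the paper's exact templates. The toxicity
# score is assumed to come from an external classifier.
def meda(text: str, toxicity: float) -> str:
    """MEDA: prepend the raw toxicity score as meta-data."""
    return f"<toxicity={toxicity:.2f}> {text}"

def inst(text: str, toxicity: float, threshold: float = 0.5) -> str:
    """INST: prepend an instruction indicating whether the sample is toxic."""
    label = "toxic" if toxicity >= threshold else "non-toxic"
    return f"The following text is {label}: {text}"

sample = "You are such an idiot."
print(meda(sample, 0.91))  # <toxicity=0.91> You are such an idiot.
print(inst(sample, 0.91))  # The following text is toxic: You are such an idiot.
```

At inference time, conditioning on the non-toxic tag or instruction then steers the pretrained model away from generating toxic continuations.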