
    Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models

    We explore the idea of compressing the prompts used to condition language models, and show that compressed prompts can retain a substantive amount of information about the original prompt. For severely compressed prompts, while fine-grained information is lost, abstract information and general sentiments can be retained with surprisingly few parameters, which can be useful in the context of decode-time algorithms for controllability and toxicity reduction. We explore contrastive conditioning to steer language model generation towards desirable text and away from undesirable text, and find that some complex prompts can be effectively compressed into a single token to guide generation. We also show that compressed prompts are largely compositional, and can be constructed such that they can be used to control independent aspects of generated text.
    Comment: Empirical Methods in Natural Language Processing, 2022 (Main-Long Paper)
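    The decode-time steering described in this abstract can be pictured with a small sketch: score the next token once under a prompt that encourages desirable text and once under a prompt that encourages undesirable text, then push generation toward the first and away from the second. This is an illustrative approximation rather than the paper's compressed-prompt method; the model name, the two steering prompts, and the strength parameter alpha are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Illustrative steering prompts; the paper compresses such prompts into
# learned single tokens, which this sketch does not reproduce.
DESIRABLE = "The following is a polite, non-toxic reply: "
UNDESIRABLE = "The following is a rude, toxic reply: "

@torch.no_grad()
def contrastive_next_token(text: str, alpha: float = 2.0) -> int:
    # Next-token logits conditioned on each steering prompt plus the context.
    pos = model(**tok(DESIRABLE + text, return_tensors="pt")).logits[0, -1]
    neg = model(**tok(UNDESIRABLE + text, return_tensors="pt")).logits[0, -1]
    # Favor tokens that are more likely under the desirable conditioning.
    steered = pos + alpha * (pos - neg)
    return int(torch.argmax(steered))

print(tok.decode([contrastive_next_token("You are completely wrong and")]))
```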

    Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

    Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks. However, safely deploying them in real-world applications is challenging because they can generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility. Our two strategies are: (1) MEDA, which adds the raw toxicity score as meta-data to the pretraining samples, and (2) INST, which adds instructions to those samples indicating their toxicity. Our results indicate that our best-performing strategy (INST) substantially reduces the toxicity probability by up to 61% while preserving accuracy on five benchmark NLP tasks and improving AUC scores on four bias detection tasks by 1.3%. We also demonstrate the generalizability of our techniques by scaling the number of training samples and the number of model parameters.
    Comment: This paper will be presented at EACL 202
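    The two augmentation strategies summarized above amount to prefixing each pretraining sample with toxicity information. The rough illustration below is not the paper's exact templates; the meta-data tag format, the instruction wording, and the 0.5 threshold are assumptions for the sketch.

```python
def augment_meda(text: str, toxicity: float) -> str:
    # MEDA-style: attach the raw toxicity score as meta-data to the sample.
    return f"<toxicity={toxicity:.2f}> {text}"

def augment_inst(text: str, toxicity: float, threshold: float = 0.5) -> str:
    # INST-style: attach an instruction describing whether the sample is toxic.
    label = "toxic" if toxicity >= threshold else "non-toxic"
    return f"Instruction: the following text is {label}. {text}"

sample = "You people never get anything right."
print(augment_meda(sample, 0.82))
print(augment_inst(sample, 0.82))
```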