Diagnosing and Debiasing Corpus-Based Political Bias and Insults in GPT2
The training of large language models (LLMs) on extensive, unfiltered corpora
sourced from the internet is a common and advantageous practice. Consequently,
LLMs have learned and inadvertently reproduced various types of biases,
including violent, offensive, and toxic language. However, recent research
shows that generative pretrained transformer (GPT) language models can
recognize their own biases and detect toxicity in generated content, a process
referred to as self-diagnosis. In response, researchers have developed a
decoding algorithm that allows LLMs to self-debias, or reduce their likelihood
of generating harmful text. This study investigates the efficacy of the
diagnosing-debiasing approach in mitigating two additional types of biases:
insults and political bias. The two are often conflated in
discourse, despite exhibiting potentially dissimilar semantic and syntactic
properties. We aim to contribute to the ongoing effort of investigating the
ethical and social implications of human-AI interaction.
Comment: 9 pages