Social Bias Probing: Fairness Benchmarking for Language Models

Authors: Isabelle Augenstein, Riccardo Guidotti, Marta Marchiori Manerba, Karolina Stańczak
Publication date: 15 November 2023

Abstract

Large language models have been shown to encode a variety of social biases, which carries the risk of downstream harms. While the impact of these biases has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, offering a constrained view of the nature of societal biases within language models. In this paper, we propose an original framework for probing language models for societal biases. We collect a probing dataset to analyze language models' general associations, as well as their associations along the axes of societal categories, identities, and stereotypes. To this end, we leverage a novel perplexity-based fairness score. We curate a large-scale benchmarking dataset that addresses drawbacks and limitations of existing fairness collections and extends coverage to a wider variety of identities and stereotypes. When comparing our methodology with prior work, we demonstrate that biases within language models are more nuanced than previously acknowledged. In agreement with recent findings, we find that larger model variants exhibit a higher degree of bias. Moreover, we expose how identities expressing different religions lead to the most pronounced disparate treatments across all models.
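
The abstract names a perplexity-based fairness score but does not define it here. As a rough illustration only, the sketch below shows how perplexity can be computed from per-token log-probabilities and compared across identity terms inserted into the same sentence template. The function names, the max/min ratio used as a disparity measure, and the example numbers are all assumptions made for illustration, not the paper's actual score or data.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of one sentence from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_by_identity(logprobs_by_identity):
    """Map each identity term to the perplexity of its filled-in sentence."""
    return {
        identity: perplexity(logprobs)
        for identity, logprobs in logprobs_by_identity.items()
    }


def disparity(ppl_by_identity):
    """Hypothetical disparity measure: ratio of highest to lowest perplexity.

    A value of 1.0 means the model finds every variant equally likely;
    larger values mean the variants are treated more differently.
    This is an illustrative stand-in, not the score proposed in the paper.
    """
    values = list(ppl_by_identity.values())
    return max(values) / min(values)


# Made-up log-probabilities for the same template with two identity terms.
example = {
    "identity_a": [-1.0, -1.0, -1.0],
    "identity_b": [-2.0, -2.0, -2.0],
}
ppl = perplexity_by_identity(example)
gap = disparity(ppl)
```

In practice the log-probabilities would come from the language model under evaluation; they are hard-coded here so the sketch stays self-contained.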

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2311.09090

Last time updated on 10/02/2024