Bias and Fairness in Chatbots: An Overview
Chatbots have been studied for more than half a century. With the rapid
development of natural language processing (NLP) technologies in recent years,
chatbots using large language models (LLMs) have received considerable
attention. Compared with traditional ones, modern chatbots are more powerful
and have been used in real-world applications. There are, however, bias and
fairness concerns in modern chatbot design. Due to the huge amounts of training
data, extremely large model sizes, and lack of interpretability, bias
mitigation and fairness preservation in modern chatbots are challenging. Thus,
this paper gives a comprehensive overview of bias and fairness in chatbot
systems. The history of chatbots and their categories are first reviewed. Then,
bias sources and potential harms in applications are analyzed. Considerations
in designing fair and unbiased chatbot systems are examined. Finally, future
research directions are discussed.
Astraea: Grammar-based Fairness Testing
Software often produces biased outputs. In particular, machine learning (ML)
based software is known to produce erroneous predictions when processing
discriminatory inputs. Such unfair program behavior can be caused by societal
bias. In the last few years, Amazon, Microsoft, and Google have provided
software services that produce unfair outputs, mostly due to societal bias
(e.g., gender or race). In such events, developers are saddled with the task of
conducting fairness testing. Fairness testing is challenging; developers are
tasked with generating discriminatory inputs that reveal and explain biases.
We propose a grammar-based fairness testing approach (called ASTRAEA) which
leverages context-free grammars to generate discriminatory inputs that reveal
fairness violations in software systems. Using probabilistic grammars, ASTRAEA
also provides fault diagnosis by isolating the cause of observed software bias.
ASTRAEA's diagnoses facilitate the improvement of ML fairness.
ASTRAEA was evaluated on 18 software systems that provide three major natural
language processing (NLP) services. In our evaluation, ASTRAEA generated
fairness violations at a rate of ~18%. ASTRAEA generated over 573K
discriminatory test cases and found over 102K fairness violations. Furthermore,
ASTRAEA improves software fairness by ~76% via model retraining.
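
To make the grammar-based approach concrete, here is a minimal sketch of the core idea, not ASTRAEA's actual implementation: a toy context-free grammar produces sentence skeletons containing a protected-attribute slot, each skeleton is instantiated once per attribute value, and a differing output from the system under test is flagged as a fairness violation. The grammar, the attribute pair, and the `system_under_test` callable are all hypothetical placeholders.

```python
import random

# Toy context-free grammar: <attr> marks the protected-attribute slot,
# which is left in place during derivation and filled in afterwards.
GRAMMAR = {
    "<s>": ["<attr> <verb> the <noun>"],
    "<verb>": ["reviewed", "criticized", "praised"],
    "<noun>": ["report", "proposal", "budget"],
}
ATTR_VALUES = ["Alice", "Bob"]  # hypothetical protected-attribute pair

def derive(symbol, rng):
    """Expand a symbol into a sentence skeleton, keeping <attr> literal."""
    if symbol not in GRAMMAR:
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    return " ".join(derive(token, rng) for token in production.split())

def find_violations(system_under_test, trials=100, seed=0):
    """Instantiate each skeleton once per attribute value and flag
    skeletons on which the system's output differs (the fairness oracle)."""
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        skeleton = derive("<s>", rng)
        outputs = {attr: system_under_test(skeleton.replace("<attr>", attr))
                   for attr in ATTR_VALUES}
        if len(set(outputs.values())) > 1:  # same input modulo the attribute
            violations.append((skeleton, outputs))
    return violations
```

Here `system_under_test` could be any NLP service returning a label, such as a sentiment classifier; a pair of inputs that differ only in "Alice" versus "Bob" yet receive different labels is a metamorphic fairness violation in this sense.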
Bridging Fairness and Environmental Sustainability in Natural Language Processing
Fairness and environmental impact are important research directions for the
sustainable development of artificial intelligence. However, while each topic
is an active research area in natural language processing (NLP), there is a
surprising lack of research on the interplay between the two fields. This
lacuna is highly problematic, since there is increasing evidence that an
exclusive focus on fairness can actually hinder environmental sustainability,
and vice versa. In this work, we shed light on this crucial intersection in NLP
by (1) investigating the efficiency of current fairness approaches through
surveying example methods for reducing unfair stereotypical bias from the
literature, and (2) evaluating a common technique to reduce energy consumption
(and thus environmental impact) of English NLP models, knowledge distillation
(KD), for its impact on fairness. In this case study, we evaluate the effect of
important KD factors, including layer and dimensionality reduction, with
respect to: (a) performance on the distillation task (natural language
inference and semantic similarity prediction), and (b) multiple measures and
dimensions of stereotypical bias (e.g., gender bias measured via the Word
Embedding Association Test). Our results lead us to clarify current assumptions
regarding the effect of KD on unfair bias: contrary to other findings, we show
that KD can actually decrease model fairness.
Comment: Accepted for publication at EMNLP 2022.
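
For readers unfamiliar with KD, the following is a minimal sketch of the standard soft-target distillation objective (in the style of Hinton et al.), which is the usual starting point for setups like the one studied here; the paper's exact configuration, including its layer- and dimensionality-reduction choices, may differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of the cross-entropy on gold labels and the KL
    divergence between temperature-softened teacher and student
    output distributions."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients match the hard loss
    return alpha * hard + (1.0 - alpha) * soft
```

Layer reduction shrinks the student by dropping transformer layers; dimensionality reduction shrinks its hidden size. The abstract's finding that KD can decrease fairness suggests bias should be re-measured after any such compression.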
Bias and Fairness in Large Language Models: A Survey
Rapid advancements of large language models (LLMs) have enabled the
processing, understanding, and generation of human-like text, with increasing
integration into systems that touch our social sphere. Despite this success,
these models can learn, perpetuate, and amplify harmful social biases. In this
paper, we present a comprehensive survey of bias evaluation and mitigation
techniques for LLMs. We first consolidate, formalize, and expand notions of
social bias and fairness in natural language processing, defining distinct
facets of harm and introducing several desiderata to operationalize fairness
for LLMs. We then unify the literature by proposing three intuitive taxonomies,
two for bias evaluation, namely metrics and datasets, and one for mitigation.
Our first taxonomy of metrics for bias evaluation disambiguates the
relationship between metrics and evaluation datasets, and organizes metrics by
the different levels at which they operate in a model: embeddings,
probabilities, and generated text. Our second taxonomy of datasets for bias
evaluation categorizes datasets by their structure as counterfactual inputs or
prompts, and identifies the targeted harms and social groups; we also release a
consolidation of publicly-available datasets for improved access. Our third
taxonomy of techniques for bias mitigation classifies methods by their
intervention during pre-processing, in-training, intra-processing, and
post-processing, with granular subcategories that elucidate research trends.
Finally, we identify open problems and challenges for future work. Synthesizing
a wide range of recent research, we aim to provide a clear guide of the
existing literature that empowers researchers and practitioners to better
understand and prevent the propagation of bias in LLMs.
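
As an illustration of one probability-level metric of the kind such a taxonomy covers, the sketch below scores a counterfactual sentence pair by pseudo-log-likelihood under a masked language model, in the style of stereotype-pair benchmarks; the model name and the example pair are placeholders, and this is not the survey's own tooling.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Placeholder model and sentence pair; any masked LM works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_log_likelihood(sentence):
    """Sum log P(token | rest of sentence), masking one token at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

pair = ("The doctor said he would call back.",
        "The doctor said she would call back.")
for sentence in pair:
    print(sentence, pseudo_log_likelihood(sentence))
# A systematic preference across many such pairs indicates bias.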
HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models
Fairness has become a trending topic in natural language processing (NLP),
which addresses biases targeting certain social groups such as genders and
religions. However, regional bias in language models (LMs), a long-standing
global discrimination problem, remains largely unexplored. This paper bridges the
gap by analysing the regional bias learned by the pre-trained language models
that are broadly used in NLP tasks. In addition to verifying the existence of
regional bias in LMs, we find that the biases on regional groups can be
strongly influenced by the geographical clustering of the groups. We
accordingly propose a HiErarchical Regional Bias evaluation method (HERB)
utilising the information from the sub-region clusters to quantify the bias in
pre-trained LMs. Experiments show that our hierarchical metric can effectively
evaluate the regional bias with respect to comprehensive topics and measure the
potential regional bias that can be propagated to downstream tasks. Our code
is available at https://github.com/Bernard-Yang/HERB.
Comment: Accepted at AACL 2022 as Long Findings.
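
The exact HERB metric is defined in the paper and repository; the sketch below only illustrates the general cluster-then-aggregate idea with made-up leaf scores: bias is summarized within each sub-region cluster first, so within-cluster dispersion is not washed out by a flat average over all regions.

```python
from statistics import mean, pstdev

REGION_TREE = {  # hypothetical two-level hierarchy with made-up leaf scores
    "Continent A": {"Region A1": 0.12, "Region A2": 0.18, "Region A3": 0.15},
    "Continent B": {"Region B1": 0.31, "Region B2": 0.05},
}

def hierarchical_bias(tree):
    """Summarize leaf-level bias scores per cluster (mean and spread),
    then average the cluster means, instead of pooling every region
    into one flat list."""
    per_cluster = {}
    for cluster, leaves in tree.items():
        scores = list(leaves.values())
        per_cluster[cluster] = {"mean": mean(scores),
                                "dispersion": pstdev(scores)}
    overall = mean(c["mean"] for c in per_cluster.values())
    return overall, per_cluster

overall, per_cluster = hierarchical_bias(REGION_TREE)
print(overall, per_cluster)
```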
Measuring and Comparing Social Bias in Static and Contextual Word Embeddings
Word embeddings have been considered one of the biggest breakthroughs of deep learning for natural language processing. They are learned numerical vector representations of words where similar words have similar representations. Contextual word embeddings are the promising second generation of word embeddings, assigning a representation to a word based on its context. This can result in different representations for the same word depending on the context (e.g. river bank and commercial bank). There is evidence of social bias (human-like implicit biases based on gender, race, and other social constructs) in word embeddings. While detecting bias in static (classical or non-contextual) word embeddings is a well-researched topic, there has been limited work in detecting bias in contextual word embeddings, mostly focussed on using the Word Embedding Association Test (WEAT). This paper explores measuring social bias (gender, ethnicity, and religion) in contextual word embeddings using a number of fairness metrics, including the Relative Norm Distance (RND), the Relative Negative Sentiment Bias (RNSB) and the already mentioned WEAT. It extends the Word Embeddings Fairness Evaluation (WEFE) framework to facilitate measuring social biases in contextual embeddings and compares these with biases in static word embeddings. When ranking performance over a number of fairness metrics, the results show that the contextual pre-trained models BERT and RoBERTa carry more social bias than the static pre-trained models GloVe and Word2Vec.
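
Since WEAT recurs throughout these abstracts, here is a compact sketch of its effect size on static vectors, following the standard formulation (Caliskan et al., 2017): the standardized difference in association of two target word sets with two attribute word sets, where association is a difference of mean cosine similarities. The word lists and the embedding lookup are assumed to be supplied by the caller.

```python
import numpy as np

def _cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def _assoc(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus to B."""
    return np.mean([_cos(w, a) for a in A]) - np.mean([_cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Standardized difference in association of target sets X and Y
    with attribute sets A and B; all arguments are lists of word vectors."""
    sx = [_assoc(x, A, B) for x in X]
    sy = [_assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Hypothetical usage with an `emb` dict mapping words to vectors:
# d = weat_effect_size([emb[w] for w in ("science", "technology")],
#                      [emb[w] for w in ("poetry", "art")],
#                      [emb[w] for w in ("he", "man")],
#                      [emb[w] for w in ("she", "woman")])
```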