Towards dialect-inclusive recognition in a low-resource language: are balanced corpora the answer?
ASR systems are generally built for the spoken 'standard', and their
performance declines for non-standard dialects/varieties. This is a problem for
a language like Irish, where there is no single spoken standard, but rather
three major dialects: Ulster (Ul), Connacht (Co) and Munster (Mu). As a
diagnostic to quantify the effect of the speaker's dialect on recognition
performance, 12 ASR systems were trained, firstly using baseline
dialect-balanced training corpora, and then using modified versions of the
baseline corpora, where dialect-specific materials were either subtracted or
added. Results indicate that dialect-balanced corpora do not yield a similar
performance across the dialects: the Ul dialect consistently underperforms,
whereas Mu yields the lowest WERs. There is a close relationship between the Co and Mu
dialects, but one that is not symmetrical. These results will guide future
corpus collection and system building strategies to optimise for cross-dialect
performance equity.
Comment: Accepted to Interspeech 2023, Dublin.
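To make the diagnostic concrete, here is a minimal sketch of the kind of per-dialect WER scoring the study relies on, using the open-source jiwer package; the transcript data and layout are invented for illustration, only the dialect labels Ul/Co/Mu come from the abstract.

```python
# Per-dialect word error rate (WER) scoring, a minimal sketch.
# Assumes (reference, hypothesis, dialect) triples; the utterances
# below are made up for illustration.
from collections import defaultdict
import jiwer

results = [
    # (reference transcript, ASR hypothesis, speaker dialect)
    ("ta an la go brea", "ta an la go brea", "Mu"),
    ("ta an la go brea", "ta la go brea", "Ul"),
    ("ta an la go brea", "ta an la brea", "Co"),
]

refs, hyps = defaultdict(list), defaultdict(list)
for ref, hyp, dialect in results:
    refs[dialect].append(ref)
    hyps[dialect].append(hyp)

# jiwer.wer aggregates errors over all utterance pairs per dialect.
for dialect in sorted(refs):
    print(dialect, jiwer.wer(refs[dialect], hyps[dialect]))
```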
Directional Pairwise Class Confusion Bias and Its Mitigation
Recent advances in Natural Language Processing have led to powerful and sophisticated models like BERT (Bidirectional Encoder Representations from Transformers) that nevertheless exhibit bias. These models are mostly trained on text corpora that deviate in important ways from the text encountered by a chatbot in a problem-specific context. While much past research has focused on measuring and mitigating bias with respect to protected attributes (stereotyping by gender, race, ethnicity, etc.), there is a lack of research on model bias with respect to classification labels. We investigate whether a classification model strongly favors one class over another. We introduce a bias evaluation method called directional pairwise class confusion bias that highlights a chatbot intent classification model's bias on pairs of classes. Finally, we present two strategies to mitigate this bias using example biased pairs.
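The abstract does not spell out the metric, but one plausible reading of a directional pairwise confusion bias is an asymmetry in the confusion matrix: how much more often class i is mispredicted as class j than vice versa. A hedged sketch using scikit-learn (the normalization and intent labels are assumptions, not the paper's definition):

```python
# One possible formalization of directional pairwise class confusion:
# the row-normalized rate of i being predicted as j, minus the rate
# of j being predicted as i. An illustrative guess, not the paper's
# exact definition.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = ["refund", "refund", "cancel", "cancel", "cancel", "refund"]
y_pred = ["refund", "cancel", "cancel", "cancel", "refund", "cancel"]
labels = ["refund", "cancel"]

cm = confusion_matrix(y_true, y_pred, labels=labels).astype(float)
rates = cm / cm.sum(axis=1, keepdims=True)  # P(pred = j | true = i)

for i, a in enumerate(labels):
    for j, b in enumerate(labels):
        if i < j:
            bias = rates[i, j] - rates[j, i]
            print(f"{a}->{b} directional bias: {bias:+.2f}")
```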
Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
Pretrained machine learning models are known to perpetuate and even amplify
existing biases in data, which can result in unfair outcomes that ultimately
impact user experience. Therefore, it is crucial to understand the mechanisms
behind those prejudicial biases to ensure that model performance does not
result in discriminatory behaviour toward certain groups or populations. In
this work, we define gender bias as our case study. We quantify bias
amplification in pretraining and after fine-tuning on three families of
vision-and-language models. We investigate the connection, if any, between the
two learning stages, and evaluate how bias amplification reflects on model
performance. Overall, we find that bias amplification in pretraining and after
fine-tuning are independent. We then examine the effect of continued
pretraining on gender-neutral data, finding that this reduces group
disparities, i.e., promotes fairness, on VQAv2 and retrieval tasks without
significantly compromising task performance.
Comment: To appear in EMNLP 2023.
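As a rough illustration of what quantifying bias amplification can look like, here is a sketch in the spirit of Zhao et al.'s (2017) metric, comparing how strongly a gender-attribute co-occurrence is skewed in model predictions versus in the training data; the formula and counts are simplified assumptions, not this paper's exact measure.

```python
# Bias amplification in the spirit of Zhao et al. (2017): compare the
# gender skew of an activity in model predictions against its skew in
# the training data. Counts below are invented for illustration.
def skew(counts):
    """Fraction of instances going to the majority gender."""
    return max(counts.values()) / sum(counts.values())

train = {"woman": 60, "man": 40}  # e.g. 'cooking' images in training set
pred = {"woman": 80, "man": 20}   # the same activity in model predictions

amplification = skew(pred) - skew(train)
print(f"bias amplification: {amplification:+.2f}")  # +0.20: skew grew
```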
Undesirable biases in NLP: Averting a crisis of measurement
As Natural Language Processing (NLP) technology rapidly develops and spreads
into daily life, it becomes crucial to anticipate how its use could harm
people. However, our ways of assessing the biases of NLP models have not kept
up. While the detection of English gender bias in such models, in particular, has
enjoyed increasing research attention, many of the measures face serious
problems, as it is often unclear what they actually measure and how much they
are subject to measurement error. In this paper, we provide an
interdisciplinary approach to discussing the issue of NLP model bias by
adopting the lens of psychometrics -- a field specialized in the measurement of
concepts like bias that are not directly observable. We pair an introduction of
relevant psychometric concepts with a discussion of how they could be used to
evaluate and improve bias measures. We also argue that adopting psychometric
vocabulary and methodology can make NLP bias research more efficient and
transparent
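For instance, a psychometric lens asks whether a bias measure is reliable: does it give consistent scores across repeated or split measurements? A minimal sketch of split-half reliability with the Spearman-Brown correction (the per-model bias scores are invented; this is one standard psychometric check, not a procedure taken from the paper):

```python
# Split-half reliability of a hypothetical bias measure: correlate the
# scores obtained from two halves of its test items, then apply the
# Spearman-Brown correction. Scores are invented for illustration.
from scipy.stats import pearsonr

# Per-model bias scores computed from odd vs. even test items.
half_a = [0.12, 0.30, 0.25, 0.40, 0.05, 0.33]
half_b = [0.15, 0.28, 0.22, 0.38, 0.09, 0.30]

r, _ = pearsonr(half_a, half_b)
reliability = 2 * r / (1 + r)  # Spearman-Brown prophecy formula
print(f"split-half reliability: {reliability:.2f}")
```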
A Survey on Fairness in Large Language Models
Large language models (LLMs) have shown strong performance and development
prospects and are widely deployed in the real world. However, LLMs can capture
social biases from unprocessed training data and propagate the biases to
downstream tasks. Unfair LLM systems have undesirable social impacts and
potential harms. In this paper, we provide a comprehensive review of related
research on fairness in LLMs. First, for medium-scale LLMs, we introduce
evaluation metrics and debiasing methods from the perspectives of intrinsic
bias and extrinsic bias, respectively. Then, for large-scale LLMs, we introduce
recent fairness research, including fairness evaluation, reasons for bias, and
debiasing methods. Finally, we discuss and provide insight on the challenges
and future directions for the development of fairness in LLMs.
Comment: 12 pages, 2 figures, 101 references.
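To give one concrete example of the kind of intrinsic bias metric such surveys cover, here is a WEAT-style association score over word embeddings; the tiny vectors and word lists are toy assumptions, whereas the real test uses curated target and attribute sets.

```python
# A WEAT-style intrinsic bias score: how much more strongly a target
# word associates with one attribute set than the other, by cosine
# similarity. Embeddings here are toy 2-d vectors, for illustration.
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = {
    "career": np.array([0.9, 0.1]), "family": np.array([0.1, 0.9]),
    "he": np.array([0.8, 0.2]),     "she": np.array([0.2, 0.8]),
}

def association(word, attrs_a, attrs_b):
    """Mean cosine to attribute set A minus mean cosine to set B."""
    sa = np.mean([cos(emb[word], emb[a]) for a in attrs_a])
    sb = np.mean([cos(emb[word], emb[b]) for b in attrs_b])
    return sa - sb

print("career:", association("career", ["he"], ["she"]))  # positive skew
print("family:", association("family", ["he"], ["she"]))  # negative skew
```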
MISMIS: Misinformation and Miscommunication in Social Media: Aggregating Information and Analysing Language
The general objectives of the project are to address and monitor misinformation
(biased and fake news) and miscommunication (aggressive language and hate
speech) in social media, as well as to establish a high-quality methodological
standard for the whole research community: (i) by developing rich annotated
datasets, a data repository and online evaluation services; (ii) by proposing
suitable evaluation metrics; and (iii) by organizing evaluation campaigns to
foster research on the above issues. The MISMIS project (PGC2018-096212-B) is
funded by the Spanish Ministry of Science, Innovation and Universities.
Comment: Rosso, P. et al. (2020). Procesamiento del Lenguaje Natural
(65):101-104. https://doi.org/10.26342/2020-65-13
Survey on Sociodemographic Bias in Natural Language Processing
Deep neural networks often learn unintended biases during training, which
might have harmful effects when deployed in real-world settings. This paper
surveys 209 papers on bias in NLP models, most of which address
sociodemographic bias. To better understand the distinction between bias and
real-world harm, we turn to ideas from psychology and behavioral economics to
propose a definition for sociodemographic bias. We identify three main
categories of NLP bias research: types of bias, quantifying bias, and
debiasing. We conclude that current approaches to quantifying bias face
reliability issues, that many of the bias metrics do not relate to real-world
biases, and that current debiasing techniques are superficial and hide bias
rather than removing it. Finally, we provide recommendations for future work.
Comment: 23 pages, 1 figure.
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
As the capabilities of generative language models continue to advance, the
implications of biases ingrained within these models have garnered increasing
attention from researchers, practitioners, and the broader public. This article
investigates the challenges and risks associated with biases in large-scale
language models like ChatGPT. We discuss the origins of biases, stemming from,
among others, the nature of training data, model specifications, algorithmic
constraints, product design, and policy decisions. We explore the ethical
concerns arising from the unintended consequences of biased model outputs. We
further analyze the potential opportunities to mitigate biases, the
inevitability of some biases, and the implications of deploying these models in
various applications, such as virtual assistants, content generation, and
chatbots. Finally, we review the current approaches to identify, quantify, and
mitigate biases in language models, emphasizing the need for a
multi-disciplinary, collaborative effort to develop more equitable,
transparent, and responsible AI systems. This article aims to stimulate a
thoughtful dialogue within the artificial intelligence community, encouraging
researchers and developers to reflect on the role of biases in generative
language models and the ongoing pursuit of ethical AI.
Comment: Submitted to Machine Learning with Applications.
State-of-the-art generalisation research in NLP: a taxonomy and review
The ability to generalise well is one of the primary desiderata of natural
language processing (NLP). Yet, what `good generalisation' entails and how it
should be evaluated is not well understood, nor are there any common standards
to evaluate it. In this paper, we aim to lay the groundwork to improve both of
these issues. We present a taxonomy for characterising and understanding
generalisation research in NLP, we use that taxonomy to present a comprehensive
map of published generalisation studies, and we make recommendations for which
areas might deserve attention in the future. Our taxonomy is based on an
extensive literature review of generalisation research, and contains five axes
along which studies can differ: their main motivation, the type of
generalisation they aim to solve, the type of data shift they consider, the
source by which this data shift is obtained, and the locus of the shift within
the modelling pipeline. We use our taxonomy to classify over 400 previous
papers that test generalisation, for a total of more than 600 individual
experiments. Considering the results of this review, we present an in-depth
analysis of the current state of generalisation research in NLP, and make
recommendations for the future. Along with this paper, we release a webpage
where the results of our review can be dynamically explored, and which we
intend to update as new NLP generalisation studies are published. With this
work, we aim to make steps towards making state-of-the-art generalisation
testing the new status quo in NLP.
Comment: 35 pages of content + 53 pages of references.
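The five axes translate naturally into a small data schema. Below is a hypothetical sketch of how one study's classification might be encoded; the axis names follow the abstract, but the example values are illustrative guesses rather than the paper's controlled vocabulary.

```python
# Hypothetical encoding of the taxonomy's five axes as a dataclass.
# Axis names come from the abstract; example values are guesses.
from dataclasses import dataclass

@dataclass
class GeneralisationStudy:
    motivation: str           # e.g. "practical", "cognitive"
    generalisation_type: str  # e.g. "compositional", "cross-lingual"
    shift_type: str           # e.g. "covariate", "label", "full"
    shift_source: str         # e.g. "natural", "generated"
    shift_locus: str          # e.g. "train-test", "pretrain-finetune"

study = GeneralisationStudy(
    motivation="practical",
    generalisation_type="cross-lingual",
    shift_type="covariate",
    shift_source="natural",
    shift_locus="pretrain-finetune",
)
print(study)
```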