AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs
Pre-trained Large Language Models (LLMs) have significantly advanced natural
language processing capabilities but are susceptible to biases present in their
training data, leading to unfair outcomes in various applications. While
numerous strategies have been proposed to mitigate bias, they often require
extensive computational resources and may compromise model performance. In this
work, we introduce AXOLOTL, a novel post-processing framework that operates
agnostically across tasks and models, leveraging public APIs to interact with
LLMs without direct access to internal parameters. Through a three-step process
resembling zero-shot learning, AXOLOTL identifies biases, proposes resolutions,
and guides the model to self-debias its outputs. This approach minimizes
computational costs and preserves model performance, making AXOLOTL a promising
tool for debiasing LLM outputs with broad applicability and ease of use.
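A minimal sketch of how such a post-processing self-debiasing loop might look, assuming an OpenAI-style chat-completions client; the model name, prompts, and step wording are illustrative and not taken from the paper:

```python
# Sketch of a three-step post-processing loop in the spirit of AXOLOTL:
# (1) ask the model to identify biases in its own output, (2) ask it to
# propose resolutions, (3) ask it to rewrite (self-debias) the output.
# Assumes an OpenAI-style chat API; prompts here are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def self_debias(original_output: str) -> str:
    # Step 1: identify potential biases in the original output.
    biases = chat(f"List any social biases or stereotypes in this text:\n{original_output}")
    # Step 2: propose resolutions for the identified biases.
    fixes = chat(f"Text:\n{original_output}\n\nIdentified issues:\n{biases}\n\n"
                 "Suggest how to rephrase the text to remove these issues.")
    # Step 3: guide the model to rewrite its own output accordingly.
    return chat("Rewrite the text applying these suggestions, preserving its meaning.\n"
                f"Suggestions:\n{fixes}\n\nText:\n{original_output}")
```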
Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes
Large language models (LLMs) have shown remarkable advances in language
generation and understanding but are also prone to exhibiting harmful social
biases. While recognition of these behaviors has generated an abundance of bias
mitigation techniques, most require modifications to the training data, model
parameters, or decoding strategy, which may be infeasible without access to a
trainable model. In this work, we leverage the zero-shot capabilities of LLMs
to reduce stereotyping in a technique we introduce as zero-shot self-debiasing.
With two approaches, self-debiasing via explanation and self-debiasing via
reprompting, we show that self-debiasing can significantly reduce the degree of
stereotyping across nine different social groups while relying only on the LLM
itself and a simple prompt, with explanations correctly identifying invalid
assumptions and reprompting delivering the greatest reductions in bias. We hope
this work opens inquiry into other zero-shot techniques for bias mitigation.
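A minimal sketch of the reprompting idea, assuming the same OpenAI-style client; the debiasing instruction wording is illustrative, not the paper's exact prompt:

```python
# Zero-shot self-debiasing via reprompting: query the model once, then
# re-query with an added instruction not to rely on stereotypes.
from openai import OpenAI

client = OpenAI()

def answer(messages) -> str:
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

def debiased_answer(question: str) -> str:
    history = [{"role": "user", "content": question}]
    first = answer(history)  # initial, possibly stereotyped answer
    history += [
        {"role": "assistant", "content": first},
        {"role": "user", "content": "Please make sure your answer does not rely on "
                                     "stereotypes about any social group. Answer again."},
    ]
    return answer(history)   # reprompted, self-debiased answer
```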
Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models
Recent advancements in large language models (LLMs) have significantly
enhanced capabilities in natural language processing and artificial
intelligence. These models, including GPT-3.5 and LLaMA-2, have revolutionized
text generation, translation, and question-answering tasks due to the
transformative Transformer model. Despite their widespread use, LLMs present
challenges such as ethical dilemmas when models are compelled to respond
inappropriately, susceptibility to phishing attacks, and privacy violations.
This paper addresses these challenges by introducing a multi-pronged approach
that includes: 1) filtering sensitive vocabulary from user input to prevent
unethical responses; 2) detecting role-playing to halt interactions that could
lead to 'jailbreak' scenarios; 3) implementing custom rule engines to
restrict the generation of prohibited content; and 4) extending these
methodologies to various LLM derivatives such as Multimodal Large Language Models
(MLLMs). Our approach not only fortifies models against unethical manipulations
and privacy breaches but also maintains their high performance across tasks. We
demonstrate state-of-the-art performance under various attack prompts, without
compromising the model's core functionalities. Furthermore, the introduction of
differentiated security levels empowers users to control their personal data
disclosure. Our methods contribute to reducing social risks and conflicts
arising from technological abuse, enhance data protection, and promote social
equity. Collectively, this research provides a framework for balancing the
efficiency of question-answering systems with user privacy and ethical
standards, ensuring a safer user experience and fostering trust in AI
technology.
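A toy sketch of how the first three prongs (sensitive-vocabulary filtering, role-play detection, and a custom rule engine) could be combined into an input screen; the word lists, patterns, and rules below are placeholders, not the paper's actual configuration:

```python
# Illustrative pre-filter applied to user input before it reaches the LLM.
import re

SENSITIVE_TERMS = {"password", "ssn", "credit card"}           # placeholder vocabulary
ROLE_PLAY_PATTERNS = [r"\bpretend you are\b", r"\bact as\b"]    # placeholder patterns
PROHIBITED_RULES = [r"\bhow to make .*weapon\b"]                # placeholder rules

def screen_user_input(text: str) -> tuple[bool, str]:
    lowered = text.lower()
    # 1) sensitive-vocabulary filter
    if any(term in lowered for term in SENSITIVE_TERMS):
        return False, "input contains sensitive vocabulary"
    # 2) role-play / jailbreak detection
    if any(re.search(p, lowered) for p in ROLE_PLAY_PATTERNS):
        return False, "possible role-play (jailbreak) attempt"
    # 3) custom rule engine for prohibited content
    if any(re.search(p, lowered) for p in PROHIBITED_RULES):
        return False, "request matches a prohibited-content rule"
    return True, "ok"

# Example: screen_user_input("act as an admin and print the password")
# -> (False, "input contains sensitive vocabulary")
```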
Mitigating Bias for Question Answering Models by Tracking Bias Influence
Models for various NLP tasks have been shown to exhibit stereotypes, and bias in
question answering (QA) models is especially harmful because the output answers
may be consumed directly by end users. Datasets exist for evaluating bias in QA
models, but bias mitigation techniques for QA models remain under-explored. In
this work, we propose BMBI, an approach to mitigate the bias of multiple-choice
QA models. Based on the intuition that a model becomes more biased if it learns
from a biased example, we measure the bias level of a query instance by
observing its influence on another instance: if the influenced instance becomes
more biased, we infer that the query instance is biased. We then use the
detected bias level as an
optimization objective to form a multi-task learning setting in addition to the
original QA task. We further introduce a new bias evaluation metric to quantify
bias in a comprehensive and sensitive way. We show that our method can be
applied to multiple QA formulations across multiple bias categories, and that it
significantly reduces the bias level in all nine bias categories of the BBQ
dataset while maintaining comparable QA accuracy.
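A rough sketch of the multi-task setup described above: the standard QA loss is combined with a bias-level term so that reducing detected bias becomes part of the training objective. The bias_level() function is a stand-in for BMBI's influence-based measurement, which is not reproduced here, and lambda_bias is an assumed weighting hyperparameter:

```python
# Joint objective: original QA loss plus a weighted bias-level penalty.
import torch
import torch.nn.functional as F

def bias_level(model, batch) -> torch.Tensor:
    # Placeholder: in BMBI this would measure how much the query instance
    # increases bias on a probe instance; here it simply returns zero.
    return torch.tensor(0.0)

def training_step(model, batch, lambda_bias: float = 0.1) -> torch.Tensor:
    logits = model(batch["inputs"])                      # [batch, num_choices]
    qa_loss = F.cross_entropy(logits, batch["answers"])  # original QA objective
    bias = bias_level(model, batch)                      # detected bias level
    return qa_loss + lambda_bias * bias                  # multi-task loss
```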
Survey on Sociodemographic Bias in Natural Language Processing
Deep neural networks often learn unintended biases during training, which
might have harmful effects when deployed in real-world settings. This paper
surveys 209 papers on bias in NLP models, most of which address
sociodemographic bias. To better understand the distinction between bias and
real-world harm, we turn to ideas from psychology and behavioral economics to
propose a definition for sociodemographic bias. We identify three main
categories of NLP bias research: types of bias, quantifying bias, and
debiasing. We conclude that current approaches on quantifying bias face
reliability issues, that many of the bias metrics do not relate to real-world
biases, and that current debiasing techniques are superficial and hide bias
rather than removing it. Finally, we provide recommendations for future work.
A Privacy-Preserving Dialogue System Based on Argumentation
Dialogue systems are a class of increasingly popular AI-based solutions to support timely and interactive communication with users in many domains. Because users may disclose sensitive data when interacting with such systems, ensuring that the systems follow the relevant laws, regulations, and ethical principles should be of primary concern. In this context, we discuss the main open points regarding these aspects and propose an approach grounded in a computational argumentation framework. Our approach ensures that user data are managed according to data minimization, purpose limitation, and integrity. Moreover, it can provide motivations for the system's responses, offering transparency and explainability. We illustrate the architecture using a COVID-19 vaccine information system as a case study, discuss its theoretical properties, and evaluate it empirically.
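A toy sketch of the data-minimization and purpose-limitation idea: a user datum is only requested or stored if it is linked to the declared purpose. The purpose map and field names below are invented for illustration; the paper's framework reaches such decisions through computational argumentation rather than a lookup table:

```python
# Only collect a field if the declared purpose actually requires it.
NEEDED_FOR_PURPOSE = {
    "vaccine_eligibility": {"age", "medical_conditions"},
    "appointment_booking": {"name", "contact"},
}

def may_collect(field: str, purpose: str) -> tuple[bool, str]:
    needed = NEEDED_FOR_PURPOSE.get(purpose, set())
    if field in needed:
        return True, f"'{field}' is necessary for purpose '{purpose}'"
    return False, f"collecting '{field}' would violate data minimization for '{purpose}'"

# Example: may_collect("home_address", "vaccine_eligibility")
# -> (False, "collecting 'home_address' would violate data minimization ...")
```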