AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs
Pre-trained Large Language Models (LLMs) have significantly advanced natural
language processing capabilities but are susceptible to biases present in their
training data, leading to unfair outcomes in various applications. While
numerous strategies have been proposed to mitigate bias, they often require
extensive computational resources and may compromise model performance. In this
work, we introduce AXOLOTL, a novel post-processing framework that operates
agnostically across tasks and models, leveraging public APIs to interact with
LLMs without direct access to internal parameters. Through a three-step process
resembling zero-shot learning, AXOLOTL identifies biases, proposes resolutions,
and guides the model to self-debias its outputs. This approach minimizes
computational costs and preserves model performance, making AXOLOTL a promising
tool for debiasing LLM outputs with broad applicability and ease of use.
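A minimal sketch of how such a post-processing self-debiasing loop might look, assuming an OpenAI-style chat-completions client; the model name, prompts, and step wording are illustrative and not taken from the paper:

```python
# Sketch of a three-step post-processing loop in the spirit of AXOLOTL:
# (1) ask the model to identify biases in its own output, (2) ask it to
# propose resolutions, (3) ask it to rewrite (self-debias) the output.
# Assumes an OpenAI-style chat API; prompts here are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def self_debias(original_output: str) -> str:
    # Step 1: identify potential biases in the original output.
    biases = chat(f"List any social biases or stereotypes in this text:\n{original_output}")
    # Step 2: propose resolutions for the identified biases.
    fixes = chat(f"Text:\n{original_output}\n\nIdentified issues:\n{biases}\n\n"
                 "Suggest how to rephrase the text to remove these issues.")
    # Step 3: guide the model to rewrite its own output accordingly.
    return chat("Rewrite the text applying these suggestions, preserving its meaning.\n"
                f"Suggestions:\n{fixes}\n\nText:\n{original_output}")
```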
Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes
Large language models (LLMs) have shown remarkable advances in language
generation and understanding but are also prone to exhibiting harmful social
biases. While recognition of these behaviors has generated an abundance of bias
mitigation techniques, most require modifications to the training data, model
parameters, or decoding strategy, which may be infeasible without access to a
trainable model. In this work, we leverage the zero-shot capabilities of LLMs
to reduce stereotyping in a technique we introduce as zero-shot self-debiasing.
With two approaches, self-debiasing via explanation and self-debiasing via
reprompting, we show that self-debiasing can significantly reduce the degree of
stereotyping across nine different social groups while relying only on the LLM
itself and a simple prompt, with explanations correctly identifying invalid
assumptions and reprompting delivering the greatest reductions in bias. We hope
this work opens inquiry into other zero-shot techniques for bias mitigation.
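A minimal sketch of the reprompting idea, assuming the same OpenAI-style client; the debiasing instruction wording is illustrative, not the paper's exact prompt:

```python
# Zero-shot self-debiasing via reprompting: query the model once, then
# re-query with an added instruction not to rely on stereotypes.
from openai import OpenAI

client = OpenAI()

def answer(messages) -> str:
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

def debiased_answer(question: str) -> str:
    history = [{"role": "user", "content": question}]
    first = answer(history)  # initial, possibly stereotyped answer
    history += [
        {"role": "assistant", "content": first},
        {"role": "user", "content": "Please make sure your answer does not rely on "
                                     "stereotypes about any social group. Answer again."},
    ]
    return answer(history)   # reprompted, self-debiased answer
```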
Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models
Recent advancements in large language models (LLMs) have significantly
enhanced capabilities in natural language processing and artificial
intelligence. These models, including GPT-3.5 and LLaMA-2, have revolutionized
text generation, translation, and question-answering tasks due to the
transformative Transformer model. Despite their widespread use, LLMs present
challenges such as ethical dilemmas when models are compelled to respond
inappropriately, susceptibility to phishing attacks, and privacy violations.
This paper addresses these challenges by introducing a multi-pronged approach
that includes: 1) filtering sensitive vocabulary from user input to prevent
unethical responses; 2) detecting role-playing to halt interactions that could
lead to 'jailbreak' scenarios; 3) implementing custom rule engines to
restrict the generation of prohibited content; and 4) extending these
methodologies to various LLM derivatives such as Multimodal Large Language Models
(MLLMs). Our approach not only fortifies models against unethical manipulations
and privacy breaches but also maintains their high performance across tasks. We
demonstrate state-of-the-art performance under various attack prompts, without
compromising the model's core functionalities. Furthermore, the introduction of
differentiated security levels empowers users to control their personal data
disclosure. Our methods contribute to reducing social risks and conflicts
arising from technological abuse, enhance data protection, and promote social
equity. Collectively, this research provides a framework for balancing the
efficiency of question-answering systems with user privacy and ethical
standards, ensuring a safer user experience and fostering trust in AI
technology.
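A toy sketch of how the first three prongs (sensitive-vocabulary filtering, role-play detection, and a custom rule engine) could be combined into an input screen; the word lists, patterns, and rules below are placeholders, not the paper's actual configuration:

```python
# Illustrative pre-filter applied to user input before it reaches the LLM.
import re

SENSITIVE_TERMS = {"password", "ssn", "credit card"}           # placeholder vocabulary
ROLE_PLAY_PATTERNS = [r"\bpretend you are\b", r"\bact as\b"]    # placeholder patterns
PROHIBITED_RULES = [r"\bhow to make .*weapon\b"]                # placeholder rules

def screen_user_input(text: str) -> tuple[bool, str]:
    lowered = text.lower()
    # 1) sensitive-vocabulary filter
    if any(term in lowered for term in SENSITIVE_TERMS):
        return False, "input contains sensitive vocabulary"
    # 2) role-play / jailbreak detection
    if any(re.search(p, lowered) for p in ROLE_PLAY_PATTERNS):
        return False, "possible role-play (jailbreak) attempt"
    # 3) custom rule engine for prohibited content
    if any(re.search(p, lowered) for p in PROHIBITED_RULES):
        return False, "request matches a prohibited-content rule"
    return True, "ok"

# Example: screen_user_input("act as an admin and print the password")
# -> (False, "input contains sensitive vocabulary")
```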
Mitigating Bias for Question Answering Models by Tracking Bias Influence
Models for various NLP tasks have been shown to exhibit stereotypes, and bias in
question answering (QA) models is especially harmful because the output answers
may be consumed directly by end users. Datasets exist for evaluating bias in QA
models, but bias mitigation techniques for QA models remain under-explored. In
this work, we propose BMBI, an approach to mitigate the bias of multiple-choice
QA models. Based on the intuition that a model becomes more biased if it learns
from a biased example, we measure the bias level of a query instance by
observing its influence on another instance: if the influenced instance becomes
more biased, we infer that the query instance is biased. We then use the
detected bias level as an
optimization objective to form a multi-task learning setting in addition to the
original QA task. We further introduce a new bias evaluation metric to quantify
bias in a comprehensive and sensitive way. We show that our method can be
applied to multiple QA formulations across multiple bias categories, and that it
significantly reduces the bias level in all nine bias categories of the BBQ
dataset while maintaining comparable QA accuracy.
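A rough sketch of the multi-task setup described above: the standard QA loss is combined with a bias-level term so that reducing detected bias becomes part of the training objective. The bias_level() function is a stand-in for BMBI's influence-based measurement, which is not reproduced here, and lambda_bias is an assumed weighting hyperparameter:

```python
# Joint objective: original QA loss plus a weighted bias-level penalty.
import torch
import torch.nn.functional as F

def bias_level(model, batch) -> torch.Tensor:
    # Placeholder: in BMBI this would measure how much the query instance
    # increases bias on a probe instance; here it simply returns zero.
    return torch.tensor(0.0)

def training_step(model, batch, lambda_bias: float = 0.1) -> torch.Tensor:
    logits = model(batch["inputs"])                      # [batch, num_choices]
    qa_loss = F.cross_entropy(logits, batch["answers"])  # original QA objective
    bias = bias_level(model, batch)                      # detected bias level
    return qa_loss + lambda_bias * bias                  # multi-task loss
```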
Survey on Sociodemographic Bias in Natural Language Processing
Deep neural networks often learn unintended biases during training, which
might have harmful effects when deployed in real-world settings. This paper
surveys 209 papers on bias in NLP models, most of which address
sociodemographic bias. To better understand the distinction between bias and
real-world harm, we turn to ideas from psychology and behavioral economics to
propose a definition for sociodemographic bias. We identify three main
categories of NLP bias research: types of bias, quantifying bias, and
debiasing. We conclude that current approaches on quantifying bias face
reliability issues, that many of the bias metrics do not relate to real-world
biases, and that current debiasing techniques are superficial and hide bias
rather than removing it. Finally, we provide recommendations for future work.
A Privacy-Preserving Dialogue System Based on Argumentation
Dialogue systems are a class of increasingly popular AI-based solutions to support timely and interactive communication with users in many domains. Because users may disclose sensitive data when interacting with such systems, ensuring that the systems follow the relevant laws, regulations, and ethical principles should be of primary concern. In this context, we discuss the main open points regarding these aspects and propose an approach grounded in a computational argumentation framework. Our approach ensures that user data are managed according to data minimization, purpose limitation, and integrity. Moreover, it can provide motivations for the system's responses, offering transparency and explainability. We illustrate the architecture using a COVID-19 vaccine information system as a case study, discuss its theoretical properties, and evaluate it empirically.
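A toy sketch of the data-minimization and purpose-limitation idea: a user datum is only requested or stored if it is linked to the declared purpose. The purpose map and field names below are invented for illustration; the paper's framework reaches such decisions through computational argumentation rather than a lookup table:

```python
# Only collect a field if the declared purpose actually requires it.
NEEDED_FOR_PURPOSE = {
    "vaccine_eligibility": {"age", "medical_conditions"},
    "appointment_booking": {"name", "contact"},
}

def may_collect(field: str, purpose: str) -> tuple[bool, str]:
    needed = NEEDED_FOR_PURPOSE.get(purpose, set())
    if field in needed:
        return True, f"'{field}' is necessary for purpose '{purpose}'"
    return False, f"collecting '{field}' would violate data minimization for '{purpose}'"

# Example: may_collect("home_address", "vaccine_eligibility")
# -> (False, "collecting 'home_address' would violate data minimization ...")
```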