Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks
Societal biases are reflected in large pre-trained language models and their
fine-tuned versions on downstream tasks. Common in-processing bias mitigation
approaches, such as adversarial training and mutual information removal,
introduce additional optimization criteria, and update the model to reach a new
debiased state. However, in practice, end-users and practitioners might prefer
to switch back to the original model, or apply debiasing only on a specific
subset of protected attributes. To enable this, we propose a novel modular bias
mitigation approach, consisting of stand-alone highly sparse debiasing
subnetworks, where each debiasing module can be integrated into the core model
on-demand at inference time. Our approach draws from the concept of \emph{diff}
pruning, and proposes a novel training regime adaptable to various
representation disentanglement optimizations. We conduct experiments on three
classification tasks with gender, race, and age as protected attributes. The
results show that our modular approach, while maintaining task performance,
improves (or at least remains on par with) the effectiveness of bias mitigation
compared with baseline fine-tuning. In particular, on a two-attribute
dataset, our approach with separately learned debiasing subnetworks shows
effective utilization of either or both subnetworks for selective bias
mitigation.

Comment: Accepted in Findings of ACL 202
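The on-demand idea above can be sketched in a few lines: a frozen core model plus one highly sparse "diff" per protected attribute, added to the base weights only when that attribute should be debiased. This is an illustrative simplification with hypothetical names, not the authors' implementation:

```python
import numpy as np

def apply_debias_modules(base_weights, diff_modules, active):
    """Add the selected sparse debiasing diffs to the frozen base weights.

    base_weights : dict of layer name -> weight array (frozen core model)
    diff_modules : dict of attribute name -> {layer name -> sparse diff array}
    active       : iterable of attribute names to debias on demand
    """
    patched = {name: w.copy() for name, w in base_weights.items()}
    for attr in active:
        for name, diff in diff_modules[attr].items():
            patched[name] += diff  # diff is mostly zeros (highly sparse)
    return patched

# Toy example: one layer and two single-attribute debiasing modules
base = {"dense": np.array([[1.0, 2.0], [3.0, 4.0]])}
modules = {
    "gender": {"dense": np.array([[-0.1, 0.0], [0.0, 0.0]])},
    "age": {"dense": np.array([[0.0, 0.0], [0.0, 0.2]])},
}

both = apply_debias_modules(base, modules, active=["gender", "age"])
original = apply_debias_modules(base, modules, active=[])  # switch back
```

Because each module is stored separately from the core model, switching back to the original behavior is just a matter of applying no diffs at inference time.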
Sustaining Fairness via Incremental Learning
Machine learning systems are often deployed for making critical decisions
like credit lending, hiring, etc. While making decisions, such systems often
encode the user's demographic information (like gender, age) in their
intermediate representations. This can lead to decisions that are biased
towards specific demographics. Prior work has focused on debiasing intermediate
representations to ensure fair decisions. However, these approaches fail to
remain fair with changes in the task or demographic distribution. To ensure
fairness in the wild, it is important for a system to adapt to such changes as
it accesses new data in an incremental fashion. In this work, we propose to
address this issue by introducing the problem of learning fair representations
in an incremental learning setting. To this end, we present Fairness-aware
Incremental Representation Learning (FaIRL), a representation learning system
that can sustain fairness while incrementally learning new tasks. FaIRL is able
to achieve fairness and learn new tasks by controlling the rate-distortion
function of the learned representations. Our empirical evaluations show that
FaIRL is able to make fair decisions while achieving high performance on the
target task, outperforming several baselines.

Comment: Accepted at AAAI 202
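FaIRL's control knob is the rate-distortion function of the learned representations. One concrete instantiation from the rate-reduction literature is the coding rate of a representation matrix; the sketch below (the function name and the choice of this particular formula are assumptions, not FaIRL's exact objective) shows how such a rate can be computed:

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate (a rate-distortion measure) of a d x n representation
    matrix Z: R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z Z^T).
    A lower rate means the representations are more compressed, i.e.
    they retain less information."""
    d, n = Z.shape
    cov = (d / (n * eps ** 2)) * (Z @ Z.T)
    # slogdet is numerically stabler than log(det(...))
    return 0.5 * np.linalg.slogdet(np.eye(d) + cov)[1]
```

Intuitively, a system in this spirit can keep the rate of task-relevant structure high while driving down the rate of structure that groups examples by the protected attribute.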
Robust Concept Erasure via Kernelized Rate-Distortion Maximization
Distributed representations provide a vector space that captures meaningful
relationships between data instances. The distributed nature of these
representations, however, entangles together multiple attributes or concepts of
data instances (e.g., the topic or sentiment of a text, or characteristics of
its author such as age or gender). Recent work has proposed the task of concept
erasure, in which rather than making a concept predictable, the goal is to
remove an attribute from distributed representations while retaining other
information from the original representation space as much as possible. In this
paper, we propose a new distance metric learning-based objective, the
Kernelized Rate-Distortion Maximizer (KRaM), for performing concept erasure.
KRaM fits a transformation of representations to match a specified distance
measure (defined by a labeled concept to erase) using a modified
rate-distortion function. Specifically, KRaM's objective function aims to make
instances with similar concept labels dissimilar in the learned representation
space while retaining other information. We find that optimizing KRaM
effectively erases various types of concepts: categorical, continuous, and
vector-valued variables from data representations across diverse domains. We
also provide a theoretical analysis of several properties of KRaM's objective.
To assess the quality of the learned representations, we propose an alignment
score to evaluate their similarity with the original representation space.
Additionally, we conduct experiments to showcase KRaM's efficacy in various
settings, from erasing binary gender variables in word embeddings to
vector-valued variables in GPT-3 representations.

Comment: NeurIPS 202
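The paper's alignment score is defined in the paper itself; as a hypothetical stand-in, the mean cosine similarity between corresponding original and erased representations captures the same intuition of measuring similarity with the original representation space:

```python
import numpy as np

def alignment_score(X_orig, X_erased):
    """Mean cosine similarity between row-aligned original and erased
    representations (rows are instances). 1.0 means the erased space is
    perfectly aligned with the original; values near 0 mean little of the
    original geometry survives."""
    num = np.sum(X_orig * X_erased, axis=1)
    denom = np.linalg.norm(X_orig, axis=1) * np.linalg.norm(X_erased, axis=1)
    return float(np.mean(num / denom))

X = np.array([[1.0, 0.0], [0.0, 2.0]])
```

This is only a didactic proxy; any score of this kind trades off against how thoroughly the concept has actually been erased.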
A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification
One of the pursued objectives of deep learning is to provide tools that learn
abstract representations of reality from the observation of multiple contextual
situations. More precisely, one wishes to extract disentangled representations
which are (i) low dimensional and (ii) whose components are independent and
correspond to concepts capturing the essence of the objects under consideration
(Locatello et al., 2019b). One step towards this ambitious project consists in
learning disentangled representations with respect to a predefined (sensitive)
attribute, e.g., the gender or age of the writer. Perhaps one of the main
applications of such disentangled representations is fair classification.
Existing methods extract the last layer of a neural network trained with a loss
that is composed of a cross-entropy objective and a disentanglement
regularizer. In this work, we adopt an information-theoretic view of this
problem which motivates a novel family of regularizers that minimizes the
mutual information between the latent representation and the sensitive
attribute conditioned on the target. The resulting set of losses, called
CLINIC, is parameter-free and thus easier and faster to train. CLINIC
losses are studied through extensive numerical experiments by training over 2k
neural networks. We demonstrate that our methods offer a better
disentanglement/accuracy trade-off than previous techniques, and generalize
better than training with the cross-entropy loss alone, provided that the
disentanglement task is not too constraining.

Comment: Findings AACL 202
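The quantity CLINIC regularizes, the mutual information between the latent representation and the sensitive attribute conditioned on the target, can be illustrated with a simple plug-in estimator over discrete samples. Real representations are continuous and require other estimators, so this is only a didactic sketch with a hypothetical function name:

```python
import math
from collections import Counter

def conditional_mutual_information(z, s, y):
    """Plug-in estimate of I(Z; S | Y) from three parallel lists of
    discrete labels, using
    I(Z;S|Y) = sum_{z,s,y} p(z,s,y) * log[ p(y) p(z,s,y) / (p(z,y) p(s,y)) ].
    Raw counts are used; the sample-size factors cancel inside the log.
    """
    n = len(z)
    c_zsy = Counter(zip(z, s, y))
    c_zy = Counter(zip(z, y))
    c_sy = Counter(zip(s, y))
    c_y = Counter(y)
    cmi = 0.0
    for (zi, si, yi), c in c_zsy.items():
        cmi += (c / n) * math.log(
            (c_y[yi] * c) / (c_zy[(zi, yi)] * c_sy[(si, yi)])
        )
    return cmi
```

Driving this quantity to zero means that, once the target label is known, the representation carries no extra information about the sensitive attribute.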
A Survey on Out-of-Distribution Evaluation of Neural NLP Models
Adversarial robustness, domain generalization and dataset biases are three
active lines of research contributing to out-of-distribution (OOD) evaluation
of neural NLP models. However, a comprehensive, integrated discussion of the
three research lines is still lacking in the literature. In this survey, we 1)
compare the three lines of research under a unifying definition; 2) summarize
the data-generating processes and evaluation protocols for each line of
research; and 3) emphasize the challenges and opportunities for future work.
Mitigating the Position Bias of Transformer Models in Passage Re-Ranking
Supervised machine learning models and their evaluation depend strongly on the quality of the underlying dataset. When we search for a relevant piece of information, it may appear anywhere in a given passage. However, we observe a bias in the position of the correct answer in the text of two popular Question Answering datasets used for passage re-ranking. This excessive favoring of earlier positions inside passages is an unwanted artefact, and it leads three common Transformer-based re-ranking models to ignore relevant parts in unseen passages. More concerningly, because the evaluation set is drawn from the same biased distribution, models that overfit to this bias overestimate their true effectiveness. In this work we analyze position bias in the datasets and in the contextualized representations, and its effect on retrieval results. We propose a debiasing method for retrieval datasets. Our results show that a model trained on a position-biased dataset exhibits a significant decrease in re-ranking effectiveness when evaluated on a debiased dataset. We demonstrate that by mitigating the position bias, Transformer-based re-ranking models become equally effective on biased and debiased datasets, as well as more effective in a transfer-learning setting between two differently biased datasets.
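As a toy illustration of dataset-level position debiasing (not the paper's actual method; all names here are hypothetical), one can re-order each passage so that the relevant sentence lands at a uniformly random position instead of clustering near the start:

```python
import random

def debias_positions(passages, rel_idx, seed=0):
    """Randomize where the relevant sentence falls in each passage.

    passages : list of passages, each a list of sentences
    rel_idx  : per-passage index of the relevant sentence
    Returns the re-ordered passages and the new relevant indices, so the
    relevant sentence's position is uniform rather than skewed early.
    """
    rng = random.Random(seed)
    new_passages, new_idx = [], []
    for sents, r in zip(passages, rel_idx):
        rel = sents[r]
        rest = sents[:r] + sents[r + 1:]
        pos = rng.randrange(len(sents))  # uniform target position
        new_passages.append(rest[:pos] + [rel] + rest[pos:])
        new_idx.append(pos)
    return new_passages, new_idx
```

Note that naive re-ordering of sentences breaks discourse coherence; a realistic debiasing procedure would need to preserve it, which is why this is only a sketch of the principle.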