Toward Gender-Inclusive Coreference Resolution
Correctly resolving textual mentions of people fundamentally entails making
inferences about those people. Such inferences raise the risk of systemic
biases in coreference resolution systems, including biases that can harm binary
and non-binary trans and cis stakeholders. To better understand such biases, we
foreground nuanced conceptualizations of gender from sociology and
sociolinguistics, and develop two new datasets for interrogating bias in crowd
annotations and in existing coreference resolution systems. Through these
studies, conducted on English text, we confirm that systems built without
acknowledging the complexity of gender can lead to many potential harms.
Comment: 28 pages; ACL version
A Causal Inference Method for Reducing Gender Bias in Word Embedding Relations
Word embedding has become essential for natural language processing, as it
boosts the empirical performance of various tasks. However, recent research has
shown that gender bias is encoded in neural word embeddings, and
downstream tasks that rely on these biased word vectors also produce
gender-biased results. While some word-embedding gender-debiasing methods have
been developed, these methods mainly focus on reducing gender bias associated
with the gender direction and fail to reduce the gender bias present in word
embedding relations. In this paper, we design a simple causal approach for
mitigating gender bias in word-vector relations by exploiting the statistical
dependency between gender-definition word embeddings and gender-biased word
embeddings. Our method attains state-of-the-art results on gender-debiasing
tasks, lexical- and sentence-level evaluation tasks, and downstream coreference
resolution tasks.
Comment: Accepted by AAAI 2020
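As a rough illustration of the statistical-dependency idea in this abstract, the sketch below removes from each word vector the component that is linearly predictable, via ridge regression, from a small set of gender-definition vectors (e.g. "he", "she"). The function name, the ridge formulation, and the alpha value are assumptions for illustration, not necessarily the authors' exact method.

    import numpy as np

    def debias_relations(V_def, V_rest, alpha=60.0):
        """Subtract from each remaining word vector the part that is
        linearly predictable from the gender-definition vectors.

        V_def  : (d, m) gender-definition embeddings (he, she, ...) as columns
        V_rest : (d, n) all other embeddings as columns
        alpha  : ridge strength; larger values remove less
        """
        m = V_def.shape[1]
        # Closed-form ridge solution: W = (V_def^T V_def + alpha*I)^-1 V_def^T V_rest
        gram = V_def.T @ V_def + alpha * np.eye(m)
        W = np.linalg.solve(gram, V_def.T @ V_rest)
        return V_rest - V_def @ W  # drop the gender-predictable component

Because the subtraction is regularized, only the strongly gender-correlated part of each vector is removed, which is what would let such an approach target bias in the relations between vectors rather than along a single gender direction.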
The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets
Diagnostic datasets that can detect biased models are an important
prerequisite for bias reduction within natural language processing. However,
undesired patterns in the collected data can make such tests incorrect. For
example, if the feminine subset of a gender-bias-measuring coreference
resolution dataset contains sentences with a longer average distance between
the pronoun and the correct candidate, an RNN-based model may perform worse on
this subset simply because RNNs struggle with long-term dependencies. In this
work, we introduce a
theoretically grounded method for weighting test samples to cope with such
patterns in the test data. We demonstrate the method on the GAP dataset for
coreference resolution. We annotate GAP with spans of all personal names and
show that examples in the female subset contain more personal names and a
longer distance between pronouns and their referents, potentially affecting the
bias score in an undesired way. Using our weighting method, we compute the
weights on the test instances that compensate for these correlations, and we
re-evaluate 16 recently released coreference models.
Comment: Accepted to AAAI 2021 and to the AFCI workshop at NeurIPS 2020
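As a rough illustration of the reweighting idea: the paper derives theoretically grounded weights, but the generic recipe can be sketched with a density-ratio estimate from a logistic classifier over assumed confounders (pronoun-referent distance, number of personal names). Everything in the sketch below is illustrative, not the authors' exact procedure.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def matching_weights(conf_f, conf_m):
        """Importance weights for the feminine-subset examples so that their
        confounder distribution resembles the masculine subset's.

        conf_f : (n_f, k) confounders per feminine example
                 (e.g. pronoun-referent distance, number of names)
        conf_m : (n_m, k) confounders per masculine example
        """
        X = np.vstack([conf_f, conf_m])
        y = np.concatenate([np.zeros(len(conf_f)), np.ones(len(conf_m))])
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        p = clf.predict_proba(conf_f)[:, 1]   # P(masculine subset | confounders)
        w = p / (1.0 - p)                     # density ratio up to a constant
        return w * len(w) / w.sum()           # normalize to mean 1

    # Weighted accuracy on the feminine subset is then comparable to the
    # unweighted masculine score once the confounders are matched:
    # acc_f = np.average(correct_f, weights=matching_weights(conf_f, conf_m))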