Quantifying and Reducing Stereotypes in Word Embeddings
Machine learning algorithms are optimized to model statistical properties of
the training data. If the input data reflects stereotypes and biases of the
broader society, then the output of the learning algorithm also captures these
stereotypes. In this paper, we initiate the study of gender stereotypes in
word embedding, a popular framework to represent text data. As their use
becomes increasingly common, applications can inadvertently amplify unwanted
stereotypes. We show across multiple datasets that the embeddings contain
significant gender stereotypes, especially with regard to professions. We
created a novel gender analogy task and combined it with crowdsourcing to
systematically quantify the gender bias in a given embedding. We developed an
efficient algorithm that reduces gender stereotype using just a handful of
training examples while preserving the useful geometric properties of the
embedding. We evaluated our algorithm on several metrics. While we focus on
male/female stereotypes, our framework may be applicable to other types of
embedding biases.

Comment: Presented at the 2016 ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, New York, NY.
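As an illustration of the kind of measurement this abstract describes, the following is a minimal sketch, assuming gensim and a pretrained word2vec model, of projecting profession words onto a he/she direction and running an analogy-style probe. The word list and the construction of the gender direction are illustrative assumptions, not the paper's exact method.

    import numpy as np
    import gensim.downloader as api

    # Pretrained embedding as a stand-in for the embeddings studied in the
    # paper; ~1.6 GB download on first use.
    model = api.load("word2vec-google-news-300")

    # A crude gender direction: the difference of the "she" and "he" vectors.
    # (The paper may construct this direction differently.)
    direction = model["she"] - model["he"]
    direction = direction / np.linalg.norm(direction)

    # Hypothetical probe list; positive scores lean "she", negative lean "he".
    professions = ["nurse", "engineer", "librarian", "carpenter", "receptionist"]
    for word in professions:
        vec = model[word] / np.linalg.norm(model[word])
        print(f"{word:>14s}  {float(np.dot(vec, direction)):+.3f}")

    # Analogy-style probe in the spirit of the paper's gender analogy task:
    # "he is to doctor as she is to ?"
    print(model.most_similar(positive=["she", "doctor"], negative=["he"], topn=3))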
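The abstract's debiasing claim, reducing stereotype while preserving the embedding's geometry, can be illustrated by one widely used step, often called neutralizing: removing a word vector's component along the bias direction. This sketch shows that step only; the paper's actual algorithm, learned from a handful of training examples, may differ.

    import numpy as np

    def neutralize(vec, direction):
        """Remove the component of `vec` along `direction`, keeping the rest."""
        direction = direction / np.linalg.norm(direction)
        return vec - np.dot(vec, direction) * direction

    # Toy stand-ins for real embedding rows:
    gender_direction = np.array([1.0, 0.0, 0.0])
    nurse = np.array([0.8, 0.3, 0.5])   # large component on the gender axis

    debiased = neutralize(nurse, gender_direction)
    print(debiased)                              # [0.  0.3 0.5]
    print(np.dot(debiased, gender_direction))    # 0.0 -- gender component removed

Because only the projection onto the bias direction is subtracted, distances among words in the remaining subspace are untouched, which is the sense in which such methods preserve the embedding's useful geometric properties.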
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
We survey 146 papers analyzing "bias" in NLP systems, finding that their
motivations are often vague, inconsistent, and lacking in normative reasoning,
despite the fact that analyzing "bias" is an inherently normative process. We
further find that these papers' proposed quantitative techniques for measuring
or mitigating "bias" are poorly matched to their motivations and do not engage
with the relevant literature outside of NLP. Based on these findings, we
describe the beginnings of a path forward by proposing three recommendations
that should guide work analyzing "bias" in NLP systems. These recommendations
rest on a greater recognition of the relationships between language and social
hierarchies, encouraging researchers and practitioners to articulate their
conceptualizations of "bias"---i.e., what kinds of system behaviors are
harmful, in what ways, to whom, and why, as well as the normative reasoning
underlying these statements---and to center work around the lived experiences
of members of communities affected by NLP systems, while interrogating and
reimagining the power relations between technologists and such communities.
The Cinderella Complex: Word Embeddings Reveal Gender Stereotypes in Movies and Books
Our analysis of thousands of movies and books reveals how these cultural
products weave stereotypical gender roles into morality tales and perpetuate
gender inequality through storytelling. Using word embedding techniques, we
reveal the constructed emotional dependency of female characters on male
characters in stories.
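A minimal sketch of the flavor of analysis this abstract describes: comparing how close emotion words sit to gendered pronouns in an embedding space. The pretrained model and the word list below are assumptions for illustration; the study trains embeddings on the movie and book corpora themselves.

    import numpy as np
    import gensim.downloader as api

    # Stand-in vectors; not the story-trained embeddings used in the paper.
    model = api.load("glove-wiki-gigaword-100")

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Hypothetical emotion/dependency lexicon, not the paper's materials.
    for word in ["love", "fear", "rescue", "despair"]:
        gap = cosine(model[word], model["she"]) - cosine(model[word], model["he"])
        # Positive gap: the word sits closer to "she" in the embedding space.
        print(f"{word:>8s}  she-he gap: {gap:+.3f}")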