Quantifying and Reducing Stereotypes in Word Embeddings
Machine learning algorithms are optimized to model statistical properties of
the training data. If the input data reflects stereotypes and biases of the
broader society, then the output of the learning algorithm also captures these
stereotypes. In this paper, we initiate the study of gender stereotypes in
word embedding, a popular framework to represent text data. As their use
becomes increasingly common, applications can inadvertently amplify unwanted
stereotypes. We show across multiple datasets that the embeddings contain
significant gender stereotypes, especially with regard to professions. We
created a novel gender analogy task and combined it with crowdsourcing to
systematically quantify the gender bias in a given embedding. We developed an
efficient algorithm that reduces gender stereotype using just a handful of
training examples while preserving the useful geometric properties of the
embedding. We evaluated our algorithm on several metrics. While we focus on
male/female stereotypes, our framework may be applicable to other types of
embedding biases.
Comment: presented at 2016 ICML Workshop on #Data4Good: Machine Learning in
Social Good Applications, New York, N
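The abstract above does not spell out its algorithm, so the following is only a minimal sketch of one common way to measure and reduce a gender component in an embedding: project each word onto a "he minus she" direction, then subtract that projection. The toy 3-dimensional vectors, the names `emb`, `gender_score`, and `neutralize`, and the choice of direction are all illustrative assumptions, not the authors' method or data.

```python
import math

def sub(u, v):
    """Element-wise difference of two vectors."""
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def unit(u):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

# Hypothetical toy embedding; real embeddings have hundreds of dimensions.
emb = {
    "he": [1.0, 0.0, 0.2],
    "she": [-1.0, 0.0, 0.2],
    "nurse": [-0.6, 0.8, 0.1],
    "engineer": [0.5, 0.7, 0.1],
}

# Assumed gender direction: the normalized difference he - she.
g = unit(sub(emb["he"], emb["she"]))

def gender_score(word):
    """Cosine between a word vector and the gender direction.
    Positive leans toward 'he', negative toward 'she'."""
    return dot(unit(emb[word]), g)

def neutralize(word):
    """Remove the component of a word vector along the gender direction."""
    v = emb[word]
    p = dot(v, g)
    return [a - p * b for a, b in zip(v, g)]

if __name__ == "__main__":
    for w in ("nurse", "engineer"):
        print(w, round(gender_score(w), 3), neutralize(w))
```

In this toy setup "nurse" scores negative and "engineer" positive, and after `neutralize` both have zero component along `g` while their other coordinates are unchanged.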
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Language is increasingly being used to define rich visual recognition
problems with supporting image collections sourced from the web. Structured
prediction models are used in these tasks to take advantage of correlations
between co-occurring labels and visual input but risk inadvertently encoding
social biases found in web corpora. In this work, we study data and models
associated with multilabel object classification and visual semantic role
labeling. We find that (a) datasets for these tasks contain significant gender
bias and (b) models trained on these datasets further amplify existing bias.
For example, the activity cooking is over 33% more likely to involve females
than males in a training set, and a trained model further amplifies the
disparity to 68% at test time. We propose to inject corpus-level constraints
for calibrating existing structured prediction models and design an algorithm
based on Lagrangian relaxation for collective inference. Our method results in
almost no performance loss for the underlying recognition task but decreases
the magnitude of bias amplification by 47.5% and 40.5% for multilabel
classification and visual semantic role labeling, respectively.
Comment: 11 pages, published in EMNLP 201
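The cooking example in the abstract above can be made concrete with a small arithmetic sketch. It assumes that "more likely" means the difference between the female and male shares of instances, which the abstract does not state explicitly. The counts are hypothetical values chosen only to land near the quoted 33% and 68% figures, and `disparity` is an illustrative name.

```python
def gender_share(counts):
    """Fraction of instances of an activity attributed to each gender."""
    total = sum(counts.values())
    return {gender: n / total for gender, n in counts.items()}

def disparity(counts):
    """Female share minus male share (assumed reading of 'more likely')."""
    shares = gender_share(counts)
    return shares["female"] - shares["male"]

# Hypothetical counts for the activity "cooking".
train_counts = {"female": 67, "male": 33}   # training-set annotations
pred_counts = {"female": 84, "male": 16}    # model predictions at test time

train_gap = disparity(train_counts)
test_gap = disparity(pred_counts)

# Amplification: how much the model widens the gap already in the data.
amplification = test_gap - train_gap

if __name__ == "__main__":
    print(f"train gap: {train_gap:.2f}")
    print(f"test gap: {test_gap:.2f}")
    print(f"amplification: {amplification:.2f}")
```

Under these assumed counts the gap roughly doubles between training data and predictions, which is the kind of amplification the corpus-level constraints are meant to reduce.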
Algorithmic Discrimination in the U.S. Justice System: A Quantitative Assessment of Racial and Gender Bias Encoded in the Data Analytics Model of the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS)
Fourth-generation risk-need assessment instruments such as Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) have opened opportunities for the use of big data analytics to assist judicial decision-making across the criminal justice system in the U.S. While the COMPAS system has become increasingly popular in supporting correctional professionals’ judgement on an offender’s risk of committing future crime, little research has been published investigating the potential systematic bias encoded in the algorithms behind these assessment tools, which could work against certain ethnic or gender groups. This paper uses a two-sample t-test and an ordinary least-squares regression model to demonstrate that the COMPAS algorithms systematically generate higher risk scores for African-American and male offenders in terms of the risk of failure to appear, risk of recidivism, and risk of violence. Although race was explicitly excluded when the COMPAS algorithms were developed, the results show that such an analytic model still systematically discriminates against African-American offenders. This paper introduces the importance of examining algorithmic fairness in big data analytic applications and offers a methodology as well as tools to investigate systematic bias encoded in machine learning algorithms. Additionally, the implications of this paper suggest that simply removing the protected variable from a big data algorithm may not be sufficient to eliminate the systematic bias that can still affect the protected groups, and that further research is needed on solutions that thoroughly address algorithmic bias in big data analytics.
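As a rough illustration of the two-sample comparison mentioned in the abstract above, the sketch below computes a Welch t statistic for the difference in mean risk score between two groups. The abstract does not say which t-test variant was used, so the unequal-variance (Welch) form is an assumption here, and the scores are made-up numbers, not COMPAS data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples.
    Uses sample variances and does not assume equal variance."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical risk scores on a 1-10 scale for two groups of defendants.
group_a = [7, 8, 6, 9, 7, 8]
group_b = [4, 5, 3, 6, 5, 4]

t = welch_t(group_a, group_b)

if __name__ == "__main__":
    print(f"mean difference: {mean(group_a) - mean(group_b):.2f}")
    print(f"Welch t statistic: {t:.2f}")
```

A large absolute t value suggests the gap in mean scores is unlikely to be noise; turning it into a p-value requires the t distribution, which a statistics library would normally supply.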
- …