It has been shown that word embeddings derived from large corpora tend to
incorporate biases present in their training data. Various methods for
mitigating these biases have been proposed, but recent work has demonstrated
that these methods hide the biases rather than truly remove them: the biases
can still be observed in word nearest-neighbor statistics. In this work we propose a
probabilistic view of word embedding bias. We leverage this framework to
present a novel bias mitigation method that builds on these probabilistic
observations to achieve more robust mitigation. We demonstrate
that this method effectively reduces bias according to three separate measures
of bias while maintaining embedding quality across several popular semantic
benchmark tasks.

Comment: 4 pages, 4 figures, Workshop on Human-Centric Machine Learning at
NeurIPS 201