We present the Assignment-Maximization Spectral Attribute removaL (AMSAL)
algorithm, which erases information from neural representations when the
information to be erased is implicit rather than directly aligned with each
input example. Our algorithm works by alternating between two steps. In one, it
finds an assignment of the input representations to the information to be
erased, and in the other, it creates projections of both the input
representations and the information to be erased into a joint latent space. We
test our algorithm on an extensive array of datasets, including a Twitter
dataset with multiple guarded attributes, the BiasBios dataset and the
BiasBench benchmark. The last benchmark includes four datasets with various
types of protected attributes. Our results demonstrate that bias can often be
removed in our setup. We also discuss the limitations of our approach when
there is a strong entanglement between the main task and the information to be
erased.

Comment: Accepted to Transactions of the Association for Computational
Linguistics, 22 pages (pre-MIT Press publication version)
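
The alternation between an assignment step and a projection step can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how either step is solved, so the code below makes simplifying assumptions. The attribute is assumed to be a single scalar per example, drawn from an unaligned pool, so the joint latent space collapses to one direction; the projection step takes the normalized cross-covariance vector, and the assignment step matches sorted projections to sorted pool values, which is optimal in one dimension by the rearrangement inequality. All function names (`toy_alternation`, `erase`) are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the column-wise mean from every row."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, mean)] for row in rows]


def toy_alternation(X, z_pool, iters=10, seed=0):
    """Rank-one toy alternation (assumption: scalar attribute, one direction).

    X      : list of n representation vectors
    z_pool : n attribute values that are NOT aligned with the rows of X
    Returns the direction u, the final assignment, and the objective history.
    """
    rng = random.Random(seed)
    Xc = center(X)
    n, d = len(Xc), len(Xc[0])
    mean_z = sum(z_pool) / len(z_pool)
    pool = sorted(z - mean_z for z in z_pool)
    assigned = pool[:]
    rng.shuffle(assigned)  # arbitrary starting assignment
    history = []
    u = [0.0] * d
    for _ in range(iters):
        # Projection step: cross-covariance between X and the assigned z.
        # With a scalar attribute its leading singular vector is c / ||c||.
        c = [sum(z * row[j] for z, row in zip(assigned, Xc)) for j in range(d)]
        norm = dot(c, c) ** 0.5 or 1.0
        u = [v / norm for v in c]
        # Assignment step: pair the smallest projection with the smallest
        # pool value, and so on, which maximizes sum_i (x_i . u) * z_i.
        proj = [dot(row, u) for row in Xc]
        order = sorted(range(n), key=proj.__getitem__)
        for rank, i in enumerate(order):
            assigned[i] = pool[rank]
        history.append(sum(p * z for p, z in zip(proj, assigned)))
    return u, assigned, history


def erase(X, u):
    """Remove the component of each vector along the unit direction u."""
    return [[x - dot(row, u) * uj for x, uj in zip(row, u)] for row in X]


if __name__ == "__main__":
    rng = random.Random(1)
    z_true = [1.0 if i % 2 else -1.0 for i in range(60)]
    # Synthetic data: the attribute shifts the first coordinate only.
    X = [[3.0 * z + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
         for z in z_true]
    u, assigned, history = toy_alternation(X, z_true[:], iters=8)
    print("direction:", [round(v, 3) for v in u])
    print("objective per iteration:", [round(h, 2) for h in history])
```

Each step maximizes the same objective with the other step's output held fixed, so the recorded objective never decreases. The sketch can still settle on a poor local optimum, and the sign of the recovered direction is arbitrary.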