Despite the success of deep-learning models in many tasks, there have been
concerns about such models learning shortcuts, and their lack of robustness to
irrelevant confounders. When it comes to models directly trained on human
faces, a sensitive confounder is human identity. Models for many face-related
tasks should ideally be identity-independent and perform uniformly across
different individuals (i.e., be fair). Such robustness and performance
uniformity can be measured and enforced during training, assuming
identity-related information is available at scale. However, due to
privacy concerns and also the cost of collecting such information, this is
often not the case, and most face datasets simply contain input images and
their corresponding task-related labels. Thus, improving identity-related
robustness without the need for such annotations is of great importance. Here,
we explore using face-recognition embedding vectors as proxies for identities
to enforce such robustness. We propose to use the structure of the
face-recognition embedding space to implicitly emphasize rare samples within
each class. We do so by weighting samples according to their conditional
inverse density (CID) in the proxy embedding space. Our experiments suggest
that such a simple sample-weighting scheme not only improves training
robustness but often also improves overall performance as a result of that
robustness. We also show that employing such constraints during training
results in models that are significantly less sensitive to different levels of
bias in the dataset.
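
As a rough illustration of conditional inverse-density weighting, the sketch below assigns each sample a weight that grows as its neighborhood within its own class becomes sparser in the embedding space. The abstract does not specify a density estimator, so everything here is an assumption made for illustration: the function name `cid_weights`, the use of the distance to the k-th nearest same-class neighbor as a monotone proxy for inverse density, the default `k=5`, and the per-class normalization to mean 1. It is not the authors' implementation.

```python
import numpy as np


def cid_weights(embeddings, labels, k=5, eps=1e-8):
    """Hypothetical sketch: weight each sample by a proxy for its inverse
    density, estimated only among samples that share its class label."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    weights = np.ones(len(labels), dtype=np.float64)

    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 2:
            continue  # a lone sample has no neighbors; keep weight 1

        x = embeddings[idx]
        # Pairwise Euclidean distances within the class (O(n^2) memory).
        diff = x[:, None, :] - x[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))

        # After sorting each row, column 0 is the self-distance (0) and
        # column kk is the distance to the kk-th nearest neighbor.
        kk = min(k, len(idx) - 1)
        kth = np.sort(dist, axis=1)[:, kk]

        # Assumption: a larger k-th neighbor distance means lower density,
        # so the distance itself serves as the inverse-density proxy.
        w = kth + eps
        weights[idx] = w / w.mean()  # normalize to mean 1 within the class

    return weights
```

Under these assumptions, the returned weights could multiply a per-sample training loss, so that samples in sparse regions of the proxy embedding space within each class contribute more to the gradient. For large datasets, the dense pairwise distance matrix would need to be replaced by an approximate nearest-neighbor search.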