18,390 research outputs found
Right for the Right Reason: Training Agnostic Networks
We consider the problem of a neural network being requested to classify
images (or other inputs) without making implicit use of a "protected concept",
that is a concept that should not play any role in the decision of the network.
Typically these concepts include information such as gender or race, or other
contextual information such as image backgrounds that might be implicitly
reflected in unknown correlations with other variables, making it insufficient
to simply remove them from the input features. In other words, making accurate
predictions is not good enough if those predictions rely on information that
should not be used: predictive performance is not the only important metric for
learning systems. We apply a method developed in the context of domain
adaptation to address this problem of "being right for the right reason", where
we request a classifier to make a decision in a way that is entirely 'agnostic'
to a given protected concept (e.g. gender, race, background etc.), even if this
could be implicitly reflected in other attributes via unknown correlations.
After defining the concept of an 'agnostic model', we demonstrate how the
Domain-Adversarial Neural Network can remove unwanted information from a model
using a gradient reversal layer.Comment: Author's original versio
- …