Discovering fair representations in the data domain
Interpretability and fairness are critical in computer vision and machine learning applications, particularly when decisions affect people, e.g. whether to invite a candidate to a job interview based on application materials that may include photographs. One promising route to fairness is to learn data representations that remove the semantics of protected characteristics and can therefore mitigate unfair outcomes. All available models, however, learn latent embeddings, which come at the cost of being uninterpretable. We propose to cast this problem as data-to-data translation, i.e. learning a mapping from an input domain to a fair target domain in which a fairness definition is enforced. Here the data domain can be images or any tabular data representation. This task would be straightforward if fair target data were available, but this is not the case. To overcome this, we learn a highly unconstrained mapping by exploiting statistics of residuals -- the difference between input data and its translated version -- and the protected characteristics. When applied to the CelebA dataset of face images with the gender attribute as the protected characteristic, our model enforces equality of opportunity by adjusting the eye and lip regions. Intriguingly, on the same dataset we arrive at similar conclusions when using semantic attribute representations of images for translation. On face images of the recent DiF dataset, with the same gender attribute, our method adjusts nose regions. In the Adult income dataset, also with gender as the protected attribute, our model achieves equality of opportunity by, among other changes, obfuscating the wife and husband relationship. Analyzing those systematic changes will allow us to scrutinize the interplay of fairness criterion, chosen protected characteristics, and prediction performance.
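The abstract above names equality of opportunity as the fairness criterion it enforces. As a rough illustration, this criterion is commonly measured as the difference in true positive rates between the two protected groups. The sketch below is not the paper's method; the function names and the toy data are assumptions made up for this example.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        raise ValueError("no positive examples in this group")
    return sum(positives) / len(positives)


def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute TPR difference between the groups coded 0 and 1."""
    rates = []
    for g in (0, 1):
        yt = [t for t, s in zip(y_true, group) if s == g]
        yp = [p for p, s in zip(y_pred, group) if s == g]
        rates.append(true_positive_rate(yt, yp))
    return abs(rates[0] - rates[1])


# Toy labels, predictions and group membership (invented for illustration).
y_true = [1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

gap = equal_opportunity_gap(y_true, y_pred, group)
print(f"equal opportunity gap: {gap:.3f}")
```

A gap of zero would mean that qualified individuals in both groups are accepted at the same rate, which is the condition the criterion asks for.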
Right for the Right Reason: Training Agnostic Networks
We consider the problem of a neural network being requested to classify
images (or other inputs) without making implicit use of a "protected concept",
that is a concept that should not play any role in the decision of the network.
Typically these concepts include information such as gender or race, or other
contextual information such as image backgrounds that might be implicitly
reflected in unknown correlations with other variables, making it insufficient
to simply remove them from the input features. In other words, making accurate
predictions is not good enough if those predictions rely on information that
should not be used: predictive performance is not the only important metric for
learning systems. We apply a method developed in the context of domain
adaptation to address this problem of "being right for the right reason", where
we request a classifier to make a decision in a way that is entirely 'agnostic'
to a given protected concept (e.g. gender, race, background etc.), even if this
could be implicitly reflected in other attributes via unknown correlations.
After defining the concept of an 'agnostic model', we demonstrate how the
Domain-Adversarial Neural Network can remove unwanted information from a model
using a gradient reversal layer. Comment: Author's original versio
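A gradient reversal layer, as used in domain-adversarial training, acts as the identity in the forward pass and flips the sign of the gradient (optionally scaled) in the backward pass. The framework-free sketch below only illustrates that behaviour. The class name `GradientReversal` and the scale parameter `lam` are hypothetical, and a real model would implement this as a custom autograd operation in its deep learning framework.

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the reversal (assumed hyperparameter)

    def forward(self, features):
        # Features pass through unchanged to the protected-concept classifier.
        return list(features)

    def backward(self, grad_output):
        # The feature extractor receives the negated gradient, so it is pushed
        # to make the protected concept harder to predict.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]
out = grl.forward(features)
grad_in = grl.backward([1.0, 2.0, -4.0])
print(out, grad_in)
```

Because the reversal sits between the shared feature extractor and the protected-concept head, the head still learns to predict the concept while the features are updated in the opposite direction.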
Ethical Adversaries: Towards Mitigating Unfairness with Adversarial Machine Learning
Machine learning is being integrated into a growing number of critical
systems with far-reaching impacts on society. Unexpected behaviour and unfair
decision processes are coming under increasing scrutiny due to this widespread
use and its theoretical considerations. Individuals, as well as organisations,
notice, test, and criticize unfair results to hold model designers and
deployers accountable. We offer a framework that assists these groups in
mitigating unfair representations stemming from the training datasets. Our
framework relies on two inter-operating adversaries to improve fairness. First,
a model is trained with the goal of preventing the guessing of protected
attributes' values while limiting utility losses. This first step optimizes the
model's parameters for fairness. Second, the framework leverages evasion
attacks from adversarial machine learning to generate new examples that will be
misclassified. These new examples are then used to retrain and improve the
model in the first step. These two steps are iteratively applied until a
significant improvement in fairness is obtained. We evaluated our framework on
well-studied datasets in the fairness literature -- including COMPAS -- where
it can surpass other approaches concerning demographic parity, equality of
opportunity and also the model's utility. We also illustrate our findings on
the subtle difficulties when mitigating unfairness and highlight how our
framework can assist model designers. Comment: 15 pages, 3 figures, 1 tabl
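Demographic parity, one of the criteria named in the abstract above, is often quantified as the difference in positive prediction rates between protected groups. The following is a minimal sketch under that assumption; the function names and toy data are invented here and are not taken from the framework itself.

```python
def positive_rate(y_pred):
    """Share of examples that receive a positive prediction."""
    if not y_pred:
        raise ValueError("empty group")
    return sum(y_pred) / len(y_pred)


def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive rates between the groups coded 0 and 1."""
    rate0 = positive_rate([p for p, s in zip(y_pred, group) if s == 0])
    rate1 = positive_rate([p for p, s in zip(y_pred, group) if s == 1])
    return abs(rate0 - rate1)


# Toy predictions and group membership (invented for illustration).
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

dp_gap = demographic_parity_gap(y_pred, group)
print(f"demographic parity gap: {dp_gap:.2f}")
```

Unlike equality of opportunity, this measure ignores the true labels, so a model can satisfy one criterion while violating the other.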
Improving Generalization for Abstract Reasoning Tasks Using Disentangled Feature Representations
In this work we explore the generalization characteristics of unsupervised
representation learning by leveraging disentangled VAEs to learn a useful
latent space on a set of relational reasoning problems derived from Raven's
Progressive Matrices. We show that the latent representations, learned by
unsupervised training using the right objective function, significantly
outperform the same architectures trained with purely supervised learning,
especially when it comes to generalization.