When Fair Classification Meets Noisy Protected Attributes
The operationalization of algorithmic fairness comes with several practical
challenges, not the least of which is the availability or reliability of
protected attributes in datasets. In real-world contexts, practical and legal
impediments may prevent the collection and use of demographic data, making it
difficult to ensure algorithmic fairness. While initial fairness algorithms did
not consider these limitations, recent proposals aim to achieve algorithmic
fairness in classification by incorporating noisiness in protected attributes
or not using protected attributes at all.
To the best of our knowledge, this is the first head-to-head study of fair
classification algorithms to compare attribute-reliant, noise-tolerant and
attribute-blind algorithms along the dual axes of predictivity and fairness. We
evaluated these algorithms via case studies on four real-world datasets and
synthetic perturbations. Our study reveals that attribute-blind and
noise-tolerant fair classifiers can potentially achieve a similar level of
performance as attribute-reliant algorithms, even when protected attributes are
noisy. However, implementing them in practice requires care. Our
study provides insights into the practical implications of using fair
classification algorithms in scenarios where protected attributes are noisy or
partially available.
Comment: Accepted at the 6th AAAI/ACM Conference on Artificial Intelligence,
Ethics and Society (AIES) 202
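As a rough illustration of what a synthetic perturbation of a protected attribute and a group-fairness measurement might look like, the sketch below flips a binary attribute with a fixed probability and computes a demographic parity gap. The abstract does not specify the noise model or the fairness metric used in the study, so the function names `flip_attribute` and `demographic_parity_gap`, the binary encoding, and the toy data are all assumptions made for this example.

```python
import random


def flip_attribute(attrs, noise_rate, seed=0):
    """Flip each binary (0/1) protected attribute independently
    with probability `noise_rate` (hypothetical noise model)."""
    rng = random.Random(seed)
    return [1 - a if rng.random() < noise_rate else a for a in attrs]


def demographic_parity_gap(preds, attrs):
    """Absolute difference in positive-prediction rate between
    the groups with attribute value 0 and 1."""
    rates = []
    for group in (0, 1):
        members = [p for p, a in zip(preds, attrs) if a == group]
        if not members:
            raise ValueError(f"group {group} has no members")
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])


# Toy data (assumed): binary predictions and clean protected attributes.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
attrs = [0, 0, 0, 0, 1, 1, 1, 1]

# Gap measured against the clean attribute: 0.75 - 0.25.
clean_gap = demographic_parity_gap(preds, attrs)
```

Calling `demographic_parity_gap(preds, flip_attribute(attrs, 0.25))` would then give the gap as seen through a noisy attribute, which can differ from the gap measured on the clean one.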
FLEA: Provably Fair Multisource Learning from Unreliable Training Data
Fairness-aware learning aims at constructing classifiers that not only make
accurate predictions, but do not discriminate against specific groups. It is a
fast-growing area of machine learning with far-reaching societal impact.
However, existing fair learning methods are vulnerable to accidental or
malicious artifacts in the training data, which can cause them to unknowingly
produce unfair classifiers. In this work we address the problem of fair
learning from unreliable training data in the robust multisource setting, where
the available training data comes from multiple sources, a fraction of which
might not be representative of the true data distribution. We introduce FLEA, a
filtering-based algorithm that allows the learning system to identify and
suppress those data sources that would have a negative impact on fairness or
accuracy if they were used for training. We show the effectiveness of our
approach through a diverse range of experiments on multiple datasets. Additionally,
we prove formally that, given enough data, FLEA protects the learner against
unreliable data as long as the fraction of affected data sources is less than
half.
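The "less than half" condition can be made intuitive with a toy filter. The sketch below is not FLEA, whose filtering criteria are not given in this abstract; it is a simple median-based stand-in. It assumes each source is summarised by a single scalar statistic (for example, a per-source fairness gap) and keeps the sources that lie close to the median. The name `filter_sources`, the statistic, and the tolerance are hypothetical. The idea it illustrates is that when a strict majority of sources are clean, the median always falls within the range of the clean values, so a minority of corrupted sources cannot drag it arbitrarily far.

```python
from statistics import median


def filter_sources(source_stats, tolerance):
    """Return the names of sources whose summary statistic lies
    within `tolerance` of the median over all sources.

    Toy stand-in for source filtering, not the FLEA procedure.
    """
    center = median(source_stats.values())
    return {
        name
        for name, value in source_stats.items()
        if abs(value - center) <= tolerance
    }


# Assumed per-source statistics: three consistent sources, two outliers.
stats = {"s1": 0.10, "s2": 0.12, "s3": 0.11, "s4": 0.45, "s5": 0.50}

# The median is 0.12, so only s1, s2 and s3 fall within the tolerance.
kept = filter_sources(stats, tolerance=0.05)
```

If three of the five sources were corrupted instead of two, the median itself could be a corrupted value, which is why a guarantee of this kind needs the affected fraction to stay below one half.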