Look and You Will Find It: Fairness-Aware Data Collection through Active Learning

Abstract

Machine learning models are often trained on data sets subject to selection bias. In particular, selection bias can be hard to avoid in scenarios where the proportion of positives is low and labeling is expensive, such as fraud detection. However, when selection bias is related to sensitive characteristics such as gender and race, it can result in an unequal distribution of burdens across sensitive groups, where marginalized groups are misrepresented and disproportionately scrutinized. Moreover, when the predictions of existing systems affect the selection of new labels, a feedback loop can occur in which selection bias is amplified over time. In this work, we explore the effectiveness of active learning approaches in mitigating fairness-related harm caused by selection bias. Active learning approaches aim to select the most informative instances from unlabeled data. We hypothesize that this characteristic steers data collection towards underexplored areas of the feature space and away from overexplored areas, including areas affected by selection bias. Our preliminary simulation results confirm the intuition that active learning can mitigate the negative consequences of selection bias, compared to both the baseline scenario and random sampling.
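The abstract does not specify which query strategy is used; a common instantiation of "selecting the most informative instances" is uncertainty sampling, where the learner queries labels for the pool instances closest to its decision boundary. The sketch below is illustrative only, assuming a scikit-learn logistic-regression learner on synthetic data; the model choice, batch size, and the helper uncertainty_sampling are hypothetical and not the paper's implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def uncertainty_sampling(model, X_pool, batch_size=10):
        # Query the pool instances whose predicted positive-class probability
        # is closest to 0.5, i.e. where the current model is least certain.
        proba = model.predict_proba(X_pool)[:, 1]
        uncertainty = -np.abs(proba - 0.5)          # higher = more uncertain
        return np.argsort(uncertainty)[-batch_size:]

    # Synthetic pool: 2,000 instances, 5 features, noisy linear labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

    labeled = list(range(20))                       # small (possibly biased) seed set
    pool = [i for i in range(len(X)) if i not in labeled]

    model = LogisticRegression()
    for _ in range(10):                             # ten query rounds
        model.fit(X[labeled], y[labeled])
        picks = uncertainty_sampling(model, X[pool], batch_size=10)
        queried = [pool[i] for i in picks]          # map pool positions back to indices
        labeled.extend(queried)
        pool = [i for i in pool if i not in queried]

Compared with drawing the same number of labels uniformly at random from the pool, the uncertainty criterion concentrates queries near the model's decision boundary. On the paper's hypothesis, this is what pulls data collection into underexplored regions of the feature space rather than reinforcing the regions an existing, biased selection policy already covers.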
