Automated decision-making systems are now pervasive and are increasingly used to make important decisions in sensitive areas such as granting a bank overdraft, assessing an individual's susceptibility to a virus infection, or even estimating the likelihood of reoffending. The widespread use of these systems raises a growing ethical concern about the risk of a potential discriminatory impact. In particular, machine-learning systems trained on unbalanced data could give rise to systematic discrimination in the real world. One of the most important challenges is to define metrics capable of detecting when an unbalanced training dataset may lead to discriminatory behaviour in the model built on it. In this paper,
we propose an approach based on the notion of data completeness, using two different metrics: the first, based on the combinations of attribute values occurring in the dataset, serves as our benchmark; the second relies on frame theory, which is widely used, among other fields, for quality measures in control systems. It is important to remark that the use of metrics cannot substitute for a broader design process that takes into account which columns could introduce bias into the data. This line of research does not end with these activities but aims to continue towards a standardised register of measures.
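To make the first, combination-based metric concrete, the following is a minimal Python sketch of one plausible reading of such a completeness measure: the fraction of possible attribute-value combinations (over each column's observed domain) that actually appear in the data. The function name `combination_completeness` and the exact formula are illustrative assumptions, not the paper's definition.

```python
# Minimal sketch (an assumed reading, not the paper's exact definition):
# completeness as the fraction of possible attribute-value combinations
# that are actually observed in the dataset.
import math


def combination_completeness(rows, columns):
    """Ratio of distinct value combinations present in `rows` to the
    number of combinations possible over each column's observed domain.
    A value of 1.0 means every combination occurs at least once."""
    # Observed domain (set of distinct values) per column.
    domains = {c: {r[c] for r in rows} for c in columns}
    # Total number of possible combinations: product of domain sizes.
    possible = math.prod(len(domains[c]) for c in columns)
    # Distinct combinations actually present in the data.
    observed = {tuple(r[c] for c in columns) for r in rows}
    return len(observed) / possible


# Example: 'gender' x 'repaid' admits 4 combinations, only 3 are observed.
data = [
    {"gender": "F", "repaid": "yes"},
    {"gender": "F", "repaid": "no"},
    {"gender": "M", "repaid": "yes"},
]
print(combination_completeness(data, ["gender", "repaid"]))  # 0.75
```

Under this reading, a score well below 1 on columns that include sensitive attributes would signal combinations (e.g. a protected group paired with a positive outcome) that the model never observes during training, which is precisely the kind of imbalance the metrics are meant to flag.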