9 research outputs found

    Detecting Discrimination Risk in Automated Decision-Making Systems with Balance Measures on Input Data

    Bias in the data used to train decision-making systems is a relevant socio-technical issue that has emerged in recent years, and it still lacks a commonly accepted solution. Indeed, the "bias in-bias out" problem represents one of the most significant risks of discrimination, encompassing technical fields as well as ethical and social perspectives. We contribute to current studies of the issue by proposing a data quality measurement approach combined with risk management, both defined in ISO/IEC standards. For this purpose, we investigate imbalance in a given dataset as a potential risk factor for detecting discrimination in the classification outcome: specifically, we aim to evaluate whether it is possible to identify the risk of bias in a classification output by measuring the level of (im)balance in the input data. We select four balance measures (the Gini, Shannon, Simpson, and Imbalance Ratio indexes) and test their capability to identify discriminatory classification outputs by applying them to protected attributes in the training set. The results of this analysis show that the proposed approach is suitable for this goal: the balance measures properly detect unfairness in the software output. However, the choice of index has a relevant impact on the detection of discriminatory outcomes, so further work is required to test the reliability of the balance measures as risk indicators in more depth. We believe that our approach for assessing the risk of discrimination should encourage more conscious and appropriate actions, as well as help prevent adverse effects caused by the "bias in-bias out" problem.
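    A minimal sketch of the kind of measurement involved: the Python snippet below computes the four balance indexes for one categorical attribute. The normalizations (every index scaled so that 1 means perfectly balanced classes) and the 90/10 example split are illustrative assumptions, not necessarily the paper's exact definitions.

        import math
        from collections import Counter

        def balance_measures(values):
            """Balance indexes for a categorical attribute; 1 = balanced,
            values near 0 = strong imbalance (normalizations are assumptions)."""
            counts = Counter(values)
            n, m = sum(counts.values()), len(counts)
            p = [c / n for c in counts.values()]
            sum_p2 = sum(pi ** 2 for pi in p)
            gini = (1 - sum_p2) * m / (m - 1) if m > 1 else 0.0  # normalized Gini heterogeneity
            shannon = -sum(pi * math.log(pi) for pi in p) / math.log(m) if m > 1 else 0.0
            simpson = (1 / sum_p2) / m  # inverse-Simpson evenness
            imbalance_ratio = min(counts.values()) / max(counts.values())
            return {"gini": gini, "shannon": shannon,
                    "simpson": simpson, "imbalance_ratio": imbalance_ratio}

        # Example: a protected attribute with a 90/10 class split
        print(balance_measures(["f"] * 90 + ["m"] * 10))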

    Appendix for "Identifying Imbalance Thresholds in Input Data to Achieve Desired Levels of Algorithmic Fairness"

    In this document we provide three appendices for the journal article “Identifying Imbalance Thresholds in Input Data to Achieve Desired Levels of Algorithmic Fairness”. Appendix A lists the predictors and targets that we took into account for each dataset employed in our study. Appendix B describes the configurations of the thresholds that we defined during the Identification of Risk Thresholds procedure. Appendix C reports, for each balance-unfairness-algorithm combination, the best thresholds selected by accuracy, the configuration they correspond to (among the five options described in Appendix B), and all the evaluation metrics related to those thresholds.
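    One plausible reading of "best thresholds selected by accuracy" is sketched below: among candidate thresholds on a balance index, pick the one whose risk flag best matches the observed unfairness of the trained classifiers. The scores, labels, and candidate values are hypothetical, and this is not necessarily the paper's exact procedure.

        def best_threshold_by_accuracy(balance_scores, unfair_labels, candidates):
            # Flag "at risk" when balance < t; score each candidate t by how
            # often the flag agrees with the observed unfairness label.
            def accuracy(t):
                flags = [int(b < t) for b in balance_scores]
                return sum(f == y for f, y in zip(flags, unfair_labels)) / len(unfair_labels)
            return max(candidates, key=accuracy)

        scores = [0.15, 0.30, 0.55, 0.80, 0.90]  # hypothetical balance index values
        unfair = [1, 1, 1, 0, 0]                 # hypothetical unfairness outcomes
        print(best_threshold_by_accuracy(scores, unfair, [0.2, 0.4, 0.6, 0.8]))  # -> 0.6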

    Measuring Imbalance on Intersectional Protected Attributes and on Target Variable to Forecast Unfair Classifications

    Bias in software systems is a serious threat to human rights: when software makes decisions that allocate resources or opportunities, it may disparately impact people based on personal traits (e.g., gender or ethnic group), systematically (dis)advantaging certain social groups. The cause is very often imbalance in the training data, that is, an unequal distribution of data between the classes of an attribute. Previous studies showed that lower levels of balance in protected attributes are related to higher levels of unfairness in the output. In this paper we contribute to the current state of knowledge on balance measures as risk indicators of systematic discrimination by studying imbalance in two further respects: the intersectionality among the classes of protected attributes, and the combination of the target variable with protected attributes. We conduct an empirical study to verify whether: i) it is possible to infer the balance of intersectional attributes from the balance of the primary attributes; ii) measures of balance on intersectional attributes are helpful to detect unfairness in the classification outcome; iii) computing balance on the combination of a target variable with protected attributes improves the detection of unfairness. Overall, the results reveal positive answers, but not for every combination of balance measure and fairness criterion. For this reason, we recommend selecting the fairness and balance measures most suitable to the application context when applying our risk approach to real cases.
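    A minimal sketch of the two constructions studied, reusing the balance_measures helper from the first sketch above; the records and attribute names are hypothetical.

        def intersectional_attribute(rows, attrs):
            # Each record's intersectional class is the tuple of its labels
            # on the chosen attributes (e.g., gender x ethnicity).
            return [tuple(row[a] for a in attrs) for row in rows]

        rows = [
            {"gender": "f", "ethnicity": "A", "hired": 1},
            {"gender": "f", "ethnicity": "B", "hired": 0},
            {"gender": "m", "ethnicity": "A", "hired": 1},
            {"gender": "m", "ethnicity": "A", "hired": 1},
        ]

        # Balance of the intersection of two protected attributes ...
        print(balance_measures(intersectional_attribute(rows, ["gender", "ethnicity"])))
        # ... and of the combination of the target with a protected attribute
        print(balance_measures(intersectional_attribute(rows, ["gender", "hired"])))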

    Reproducibility Package for 'A Data Quality Approach to the Identification of Discrimination Risk in Automated Decision Making Systems'

    Automated decision-making (ADM) systems may affect multiple aspects of our lives. In particular, they can result in systematic discrimination against specific population groups, in violation of the EU Charter of Fundamental Rights. One of the potential causes of discriminatory behavior, i.e., unfairness, lies in the quality of the data used to train such ADM systems. Using a data quality measurement approach combined with risk management, both defined in ISO standards, we focus on balance characteristics and aim to understand how balance indexes (Gini, Simpson, Shannon, Imbalance Ratio) identify discrimination risk in six large datasets containing the classification output of ADM systems. The best result is achieved using the Imbalance Ratio index. The Gini and Shannon indexes tend to assume high values and for this reason yield modest results in both respects: further experimentation with different thresholds is needed. In terms of policy, the risk-based approach is a core element of the EU approach to regulating algorithmic systems: in this context, balance measures can readily be adopted as risk indicators of propagation – or even amplification – of bias in the input data of ADM systems.

    A data quality approach to the identification of discrimination risk in automated decision making systems

    Automated decision-making (ADM) systems may affect multiple aspects of our lives. In particular, they can result in systematic discrimination against specific population groups, in violation of the EU Charter of Fundamental Rights. One of the potential causes of discriminatory behavior, i.e., unfairness, lies in the quality of the data used to train such ADM systems. Using a data quality measurement approach combined with risk management, both defined in ISO standards, we focus on balance characteristics and aim to understand how balance indexes (Gini, Simpson, Shannon, Imbalance Ratio) identify discrimination risk in six large datasets containing the classification output of ADM systems. The best result is achieved using the Imbalance Ratio index. The Gini and Shannon indexes tend to assume high values and for this reason yield modest results in both respects: further experimentation with different thresholds is needed. In terms of policy, the risk-based approach is a core element of the EU approach to regulating algorithmic systems: in this context, balance measures can readily be adopted as risk indicators of propagation – or even amplification – of bias in the input data of ADM systems.
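    A sketch of how the Imbalance Ratio could serve as such a risk indicator; the 0.4 cut-off is an illustrative assumption, not a calibrated threshold from the paper (the abstract notes that Gini and Shannon tend toward high values and would need different thresholds).

        from collections import Counter

        def flag_discrimination_risk(values, threshold=0.4):
            # Imbalance Ratio: minority class size over majority class size.
            counts = Counter(values)
            ir = min(counts.values()) / max(counts.values())
            return {"imbalance_ratio": ir, "at_risk": ir < threshold}

        print(flag_discrimination_risk(["f"] * 90 + ["m"] * 10))  # flagged
        print(flag_discrimination_risk(["f"] * 55 + ["m"] * 45))  # not flagged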

    Reproducibility Package for 'Identifying risks in datasets for automated decision-making'

    Our daily life is profoundly affected by the adoption of automated decision-making (ADM) systems, due to the ongoing tendency of humans to delegate decisions to machines. The widespread use of ADM systems has been facilitated by the availability of large-scale data, along with the deployment of devices and equipment. This trend has resulted in an increasing influence of ADM systems' output over several aspects of our lives, with possible discriminatory consequences for certain individuals or groups. In this context, we focus on input data by investigating measurable characteristics that can lead to discriminatory automated decisions. In particular, we identified two indexes of heterogeneity and diversity and tested them on two datasets. A limitation we found is the indexes' sensitivity to a large number of categories, but on the whole the results show that the indexes reflect imbalances in the input data well. Future work is required to further assess the reliability of these indexes as indicators of discrimination risk in the context of ADM, in order to foster a more conscious and responsible use of ADM systems through an immediate investigation of input data.
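    The sensitivity to the number of categories can be illustrated as follows, using normalized Shannon evenness as a stand-in for the paper's indexes (the 90/10 construction is our assumption): keeping 90% of the records in one class and splitting the remaining 10% evenly over more and more rare classes changes the index value substantially, even though one class always dominates.

        import math

        def shannon_evenness(counts):
            # Shannon entropy normalized by log(m); 1 = perfectly balanced.
            n, m = sum(counts.values()), len(counts)
            p = [c / n for c in counts.values()]
            return -sum(pi * math.log(pi) for pi in p) / math.log(m)

        for m in (2, 5, 11, 101):
            counts = {"major": 900, **{f"rare{i}": 100 // (m - 1) for i in range(m - 1)}}
            print(m, round(shannon_evenness(counts), 3))  # drifts from ~0.47 down to ~0.17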

    Quantitative methods in ocular fundus imaging: Analysis of retinal microvasculature

    Several diseases, including diabetes, hypertension, and glaucoma, are known to cause alterations in the human retina that can be visualized non-invasively and in vivo using well-established techniques of fundus photography. Since the treatment of these diseases can be significantly improved by early detection, methods for the quantitative analysis of fundus imaging have been the subject of extensive study. Following major advances in image processing and machine learning during the last decade, remarkable progress is being made towards developing automated quantitative methods to identify image-based biomarkers of different pathologies. In this paper, we focus especially on the automated analysis of alterations of the retinal microvasculature, a class of structural alterations that is particularly important for the early detection of cardiovascular and neurological diseases.