
    The Flawed Foundations of Fair Machine Learning

    The definition and implementation of fairness in automated decisions have been extensively studied by the research community. Yet fallacious reasoning, misleading assertions, and questionable practices hide at the foundations of the current fair machine learning paradigm. These flaws result from a failure to understand that the trade-off between statistically accurate outcomes and group-similar outcomes exists as an independent, external constraint rather than as a subjective manifestation, as has been commonly argued. First, we explain that there is only one conception of fairness present in the fair machine learning literature: group similarity of outcomes based on a sensitive attribute, where the similarity benefits an underprivileged group. Second, we show that there is, in fact, a trade-off between statistically accurate outcomes and group-similar outcomes in any data setting where group disparities exist, and that the trade-off presents an existential threat to the equitable, fair machine learning approach. Third, we introduce a proof-of-concept evaluation to aid researchers and designers in understanding the relationship between statistically accurate outcomes and group-similar outcomes. Finally, we provide suggestions for future work, aimed at data scientists, legal scholars, and data ethicists, that utilize the conceptual and experimental framework described throughout this article.
    Comment: This article is a preprint submitted to the Minds and Machines Special Issue on the (Un)fairness of AI on May 31st, 202
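
    To make the claimed tension concrete, here is a minimal sketch (not the paper's proof-of-concept evaluation): a classifier is trained on synthetic data with a group disparity, and group-specific decision thresholds are varied to show that narrowing the gap in positive rates costs accuracy. The data, thresholds, and model choice are all illustrative assumptions.

```python
# Illustrative sketch: tension between statistically accurate outcomes and
# group-similar outcomes on synthetic data where a group disparity exists.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)                                # sensitive attribute
x = rng.normal(loc=group * 1.0, scale=1.0, size=n)           # feature correlated with group
y = (x + rng.normal(scale=0.5, size=n) > 0.8).astype(int)    # outcome with a group disparity

clf = LogisticRegression().fit(x.reshape(-1, 1), y)
scores = clf.predict_proba(x.reshape(-1, 1))[:, 1]

def evaluate(threshold_a, threshold_b):
    """Accuracy and positive-rate gap under group-specific thresholds."""
    pred = np.where(group == 0, scores > threshold_a, scores > threshold_b).astype(int)
    acc = (pred == y).mean()
    gap = abs(pred[group == 0].mean() - pred[group == 1].mean())
    return acc, gap

print("single threshold :", evaluate(0.5, 0.5))   # higher accuracy, larger outcome gap
print("equalized rates  :", evaluate(0.3, 0.7))   # smaller gap, accuracy declines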

    Fairness for Cooperative Multi-Agent Learning with Equivariant Policies

    We study fairness through the lens of cooperative multi-agent learning. Our work is motivated by empirical evidence that naive maximization of team reward yields unfair outcomes for individual team members. To address fairness in multi-agent contexts, we introduce team fairness, a group-based fairness measure for multi-agent learning. We then incorporate team fairness into policy optimization, introducing Fairness through Equivariance (Fair-E), a novel learning strategy that achieves provably fair reward distributions. Next, we introduce Fairness through Equivariance Regularization (Fair-ER), a soft-constraint version of Fair-E, and show that Fair-ER reaches higher levels of utility than Fair-E and fairer outcomes than policies with no equivariance. Finally, we investigate the fairness-utility trade-off in multi-agent settings.
    Comment: 15 pages, 4 figures
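
    As an illustration of the general idea of a soft fairness constraint on a team objective (not the Fair-ER implementation), the sketch below penalizes the spread of per-agent returns; the variance-based disparity term and the coefficient `fair_coef` are assumptions.

```python
# Minimal sketch of a soft fairness regularizer on a multi-agent objective.
import torch

def team_objective(agent_returns: torch.Tensor, fair_coef: float = 0.1) -> torch.Tensor:
    """agent_returns: tensor of shape (n_agents,) holding each agent's expected return."""
    team_reward = agent_returns.sum()                 # utility: naive team reward
    disparity = agent_returns.var(unbiased=False)     # fairness proxy: spread of returns
    return team_reward - fair_coef * disparity        # soft constraint trades utility for fairness

returns = torch.tensor([4.0, 1.0, 1.0], requires_grad=True)
loss = -team_objective(returns)   # maximize the objective = minimize its negative
loss.backward()
print(returns.grad)               # gradients push toward a more even reward distribution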

    A Theoretical Approach to Characterize the Accuracy-Fairness Trade-off Pareto Frontier

    While the accuracy-fairness trade-off has been frequently observed in the fair machine learning literature, rigorous theoretical analyses have been scarce. To demystify this long-standing challenge, this work develops a theoretical framework by characterizing the shape of the accuracy-fairness trade-off Pareto frontier (FairFrontier), determined by the set of all Pareto-optimal classifiers that no other classifier can dominate. Specifically, we first demonstrate the existence of the trade-off in real-world scenarios and then propose four potential categories to characterize the important properties of the accuracy-fairness Pareto frontier. For each category, we identify the necessary conditions that lead to the corresponding trade-off. Experimental results on synthetic data yield several findings: (1) when sensitive attributes can be fully interpreted by non-sensitive attributes, FairFrontier is mostly continuous; (2) accuracy can suffer a sharp decline when fairness is over-pursued; (3) the trade-off can be eliminated via a two-step streamlined approach. The proposed research enables an in-depth understanding of the accuracy-fairness trade-off, pushing current fair machine learning research to a new frontier.
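
    A hedged sketch of how such a frontier can be traced empirically (not the FairFrontier analysis itself): candidate classifiers are scored on accuracy and a fairness measure, and only the non-dominated points are kept. The candidate values below are illustrative placeholders.

```python
# Keep the Pareto-optimal points of an empirical accuracy-fairness sweep.
def pareto_front(points):
    """Keep points not dominated in (accuracy, fairness); higher is better for both."""
    front = []
    for p in points:
        if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points):
            front.append(p)
    return sorted(front)

# Each point: (accuracy, 1 - demographic_parity_gap) for one candidate classifier.
candidates = [(0.91, 0.62), (0.89, 0.74), (0.85, 0.88), (0.80, 0.90), (0.84, 0.70)]
print(pareto_front(candidates))   # the last candidate is dominated and drops out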

    Counterpart Fairness -- Addressing Systematic between-group Differences in Fairness Evaluation

    When using machine learning (ML) to aid decision-making, it is critical to ensure that an algorithmic decision is fair, i.e., that it does not discriminate against specific individuals or groups, particularly those from underprivileged populations. Existing group fairness methods require equal group-wise measures, which, however, fail to consider systematic between-group differences. Confounding factors, i.e., non-sensitive variables that manifest systematic differences, can significantly affect fairness evaluation. To mitigate this problem, we believe that a fairness measurement should be based on the comparison between counterparts (i.e., individuals who are similar to each other with respect to the task of interest) from different groups, whose group identities cannot be distinguished algorithmically from the confounding factors. We have developed a propensity-score-based method for identifying counterparts, which prevents fairness evaluation from comparing "oranges" with "apples". In addition, we propose a counterpart-based statistical fairness index, termed Counterpart-Fairness (CFair), to assess the fairness of ML models. Empirical studies on the Medical Information Mart for Intensive Care (MIMIC)-IV database were conducted to validate the effectiveness of CFair. We publish our code at https://github.com/zhengyjo/CFair.
    Comment: 18 pages, 5 figures, 5 tables
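
    The sketch below shows standard propensity-score matching of counterparts across groups, in the spirit of the paper's approach; it is not the code from the linked repository, and the synthetic data and nearest-neighbour rule are assumptions.

```python
# Match each group-1 individual to the group-0 individual closest in propensity score.
import numpy as np
from sklearn.linear_model import LogisticRegression

def match_counterparts(X, group):
    """Return (group-1 index, group-0 index) pairs matched on the propensity score."""
    ps = LogisticRegression(max_iter=1000).fit(X, group).predict_proba(X)[:, 1]
    idx0, idx1 = np.where(group == 0)[0], np.where(group == 1)[0]
    pairs = []
    for i in idx1:
        j = idx0[np.argmin(np.abs(ps[idx0] - ps[i]))]   # nearest neighbour on the score
        pairs.append((i, j))
    return pairs

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                              # non-sensitive covariates
group = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # group correlated with covariates
print(match_counterparts(X, group)[:3])
```

    Fairness is then assessed by comparing model outputs only within matched pairs, so that "oranges" are not compared with "apples".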

    On The Fairness Impacts of Hardware Selection in Machine Learning

    In the machine learning ecosystem, hardware selection is often regarded as a mere utility, overshadowed by the spotlight on algorithms and data. This oversight is particularly problematic in contexts like ML-as-a-service platforms, where users often lack control over the hardware used for model deployment. How does the choice of hardware impact generalization properties? This paper investigates the influence of hardware on the delicate balance between model performance and fairness. We demonstrate that hardware choices can exacerbate existing disparities, and we attribute these discrepancies to variations in gradient flows and loss surfaces across different demographic groups. Through both theoretical and empirical analysis, the paper not only identifies the underlying factors but also proposes an effective strategy for mitigating hardware-induced performance imbalances.
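
    A rough sketch of how such hardware-induced imbalances might be surfaced in practice (an assumption, not the paper's methodology): compare the group-wise accuracy gap of predictions produced under two deployment configurations. The synthetic predictions below stand in for two hardware back-ends.

```python
# Compare per-group accuracy gaps across two deployment configurations.
import numpy as np

def group_accuracy_gap(y_true, y_pred, group):
    accs = [(y_pred[group == g] == y_true[group == g]).mean() for g in np.unique(group)]
    return max(accs) - min(accs)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
preds_a = np.where(rng.random(1000) < 0.9, y_true, 1 - y_true)                    # config A
preds_b = np.where(rng.random(1000) < 0.95 - 0.1 * group, y_true, 1 - y_true)     # config B: group-dependent errors
print(group_accuracy_gap(y_true, preds_a, group), group_accuracy_gap(y_true, preds_b, group))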

    Ensuring generalized fairness in batch classification

    In this paper, we consider the problem of batch classification and propose a novel framework for achieving fairness in such settings. Batch classification involves the selection of a set of individuals, as often encountered in real-world scenarios such as job recruitment and college admissions. This is in contrast to a typical classification problem, where each candidate in the test set is considered separately and independently. In such scenarios, achieving the same acceptance rate (i.e., the probability of the classifier assigning the positive class) for each group (with membership determined by the value of sensitive attributes such as gender or race) is often not desirable, and the regulatory body instead specifies a different acceptance rate for each group. Existing fairness-enhancing methods do not allow for such specifications and are hence unsuited to such scenarios. In this paper, we define a configuration model whereby the acceptance rate of each group can be regulated, and we further introduce a novel batch-wise fairness post-processing framework based on the classifier's confidence scores. We deploy our framework across four real-world datasets and two popular notions of fairness, namely demographic parity and equalized odds. In addition to consistent performance improvements over the competing baselines, the proposed framework offers flexibility and a significant speed-up, and it can seamlessly incorporate multiple overlapping sensitive attributes. To further demonstrate its generalizability, we deploy the framework on the problem of fair gerrymandering, where it achieves a better fairness-accuracy trade-off than the existing baseline method.
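
    A minimal sketch of batch-wise post-processing with regulator-specified, group-specific acceptance rates (illustrative, not the paper's framework): within each group, the top-scoring candidates are accepted until that group's target rate is met.

```python
# Accept the highest-confidence candidates in each group up to its target acceptance rate.
import numpy as np

def batch_accept(scores, group, rate_per_group):
    """scores: classifier confidence scores; rate_per_group: {group_value: acceptance_rate}."""
    accept = np.zeros_like(scores, dtype=bool)
    for g, rate in rate_per_group.items():
        idx = np.where(group == g)[0]
        k = int(round(rate * len(idx)))                      # how many to accept from this group
        accept[idx[np.argsort(scores[idx])[::-1][:k]]] = True
    return accept

rng = np.random.default_rng(0)
scores = rng.random(100)
group = rng.integers(0, 2, 100)
decisions = batch_accept(scores, group, {0: 0.4, 1: 0.3})
print(decisions[group == 0].mean(), decisions[group == 1].mean())   # ~0.4 and ~0.3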