42 research outputs found

    The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning

    Full text link
    The nascent field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Over the last several years, three formal definitions of fairness have gained prominence: (1) anti-classification, meaning that protected attributes---like race, gender, and their proxies---are not explicitly used to make decisions; (2) classification parity, meaning that common measures of predictive performance (e.g., false positive and false negative rates) are equal across groups defined by the protected attributes; and (3) calibration, meaning that conditional on risk estimates, outcomes are independent of protected attributes. Here we show that all three of these fairness definitions suffer from significant statistical limitations. Requiring anti-classification or classification parity can, perversely, harm the very groups they were designed to protect; and calibration, though generally desirable, provides little guarantee that decisions are equitable. In contrast to these formal fairness criteria, we argue that it is often preferable to treat similarly risky people similarly, based on the most statistically accurate estimates of risk that one can produce. Such a strategy, while not universally applicable, often aligns well with policy objectives; notably, this strategy will typically violate both anti-classification and classification parity. In practice, it requires significant effort to construct suitable risk estimates. One must carefully define and measure the targets of prediction to avoid retrenching biases in the data. But, importantly, one cannot generally address these difficulties by requiring that algorithms satisfy popular mathematical formalizations of fairness. By highlighting these challenges in the foundation of fair machine learning, we hope to help researchers and practitioners productively advance the area

    Matched Pair Calibration for Ranking Fairness

    Full text link
    We propose a test of fairness in score-based ranking systems called matched pair calibration. Our approach constructs a set of matched item pairs with minimal confounding differences between subgroups before computing an appropriate measure of ranking error over the set. The matching step ensures that we compare subgroup outcomes between identically scored items so that measured performance differences directly imply unfairness in subgroup-level exposures. We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking and connect our approach to other proposals for ranking fairness measures. Moreover, our strategy shows how the logic of marginal outcome tests extends to cases where the analyst has access to model scores. Lastly, we provide an example of applying matched pair calibration to a real-word ranking data set to demonstrate its efficacy in detecting ranking bias.Comment: 19 pages, 8 figure

    POTs: Protective Optimization Technologies

    Full text link
    Algorithmic fairness aims to address the economic, moral, social, and political impact that digital systems have on populations through solutions that can be applied by service providers. Fairness frameworks do so, in part, by mapping these problems to a narrow definition and assuming the service providers can be trusted to deploy countermeasures. Not surprisingly, these decisions limit fairness frameworks' ability to capture a variety of harms caused by systems. We characterize fairness limitations using concepts from requirements engineering and from social sciences. We show that the focus on algorithms' inputs and outputs misses harms that arise from systems interacting with the world; that the focus on bias and discrimination omits broader harms on populations and their environments; and that relying on service providers excludes scenarios where they are not cooperative or intentionally adversarial. We propose Protective Optimization Technologies (POTs). POTs provide means for affected parties to address the negative impacts of systems in the environment, expanding avenues for political contestation. POTs intervene from outside the system, do not require service providers to cooperate, and can serve to correct, shift, or expose harms that systems impose on populations and their environments. We illustrate the potential and limitations of POTs in two case studies: countering road congestion caused by traffic-beating applications, and recalibrating credit scoring for loan applicants.Comment: Appears in Conference on Fairness, Accountability, and Transparency (FAT* 2020). Bogdan Kulynych and Rebekah Overdorf contributed equally to this work. Version v1/v2 by Seda G\"urses, Rebekah Overdorf, and Ero Balsa was presented at HotPETS 2018 and at PiMLAI 201

    The Changing Landscape for Stroke\ua0Prevention in AF: Findings From the GLORIA-AF Registry Phase 2

    Get PDF
    Background GLORIA-AF (Global Registry on Long-Term Oral Antithrombotic Treatment in Patients with Atrial Fibrillation) is a prospective, global registry program describing antithrombotic treatment patterns in patients with newly diagnosed nonvalvular atrial fibrillation at risk of stroke. Phase 2 began when dabigatran, the first non\u2013vitamin K antagonist oral anticoagulant (NOAC), became available. Objectives This study sought to describe phase 2 baseline data and compare these with the pre-NOAC era collected during phase 1. Methods During phase 2, 15,641 consenting patients were enrolled (November 2011 to December 2014); 15,092 were eligible. This pre-specified cross-sectional analysis describes eligible patients\u2019 baseline characteristics. Atrial fibrillation disease characteristics, medical outcomes, and concomitant diseases and medications were collected. Data were analyzed using descriptive statistics. Results Of the total patients, 45.5% were female; median age was 71 (interquartile range: 64, 78) years. Patients were from Europe (47.1%), North America (22.5%), Asia (20.3%), Latin America (6.0%), and the Middle East/Africa (4.0%). Most had high stroke risk (CHA2DS2-VASc [Congestive heart failure, Hypertension, Age  6575 years, Diabetes mellitus, previous Stroke, Vascular disease, Age 65 to 74 years, Sex category] score  652; 86.1%); 13.9% had moderate risk (CHA2DS2-VASc = 1). Overall, 79.9% received oral anticoagulants, of whom 47.6% received NOAC and 32.3% vitamin K antagonists (VKA); 12.1% received antiplatelet agents; 7.8% received no antithrombotic treatment. For comparison, the proportion of phase 1 patients (of N = 1,063 all eligible) prescribed VKA was 32.8%, acetylsalicylic acid 41.7%, and no therapy 20.2%. In Europe in phase 2, treatment with NOAC was more common than VKA (52.3% and 37.8%, respectively); 6.0% of patients received antiplatelet treatment; and 3.8% received no antithrombotic treatment. In North America, 52.1%, 26.2%, and 14.0% of patients received NOAC, VKA, and antiplatelet drugs, respectively; 7.5% received no antithrombotic treatment. NOAC use was less common in Asia (27.7%), where 27.5% of patients received VKA, 25.0% antiplatelet drugs, and 19.8% no antithrombotic treatment. Conclusions The baseline data from GLORIA-AF phase 2 demonstrate that in newly diagnosed nonvalvular atrial fibrillation patients, NOAC have been highly adopted into practice, becoming more frequently prescribed than VKA in Europe and North America. Worldwide, however, a large proportion of patients remain undertreated, particularly in Asia and North America. (Global Registry on Long-Term Oral Antithrombotic Treatment in Patients With Atrial Fibrillation [GLORIA-AF]; NCT01468701

    Fludarabine, cytarabine, granulocyte colony-stimulating factor, and idarubicin with gemtuzumab ozogamicin improves event-free survival in younger patients with newly diagnosed aml and overall survival in patients with npm1 and flt3 mutations

    Get PDF
    Purpose To determine the optimal induction chemotherapy regimen for younger adults with newly diagnosed AML without known adverse risk cytogenetics. Patients and Methods One thousand thirty-three patients were randomly assigned to intensified (fludarabine, cytarabine, granulocyte colony-stimulating factor, and idarubicin [FLAG-Ida]) or standard (daunorubicin and Ara-C [DA]) induction chemotherapy, with one or two doses of gemtuzumab ozogamicin (GO). The primary end point was overall survival (OS). Results There was no difference in remission rate after two courses between FLAG-Ida + GO and DA + GO (complete remission [CR] + CR with incomplete hematologic recovery 93% v 91%) or in day 60 mortality (4.3% v 4.6%). There was no difference in OS (66% v 63%; P = .41); however, the risk of relapse was lower with FLAG-Ida + GO (24% v 41%; P < .001) and 3-year event-free survival was higher (57% v 45%; P < .001). In patients with an NPM1 mutation (30%), 3-year OS was significantly higher with FLAG-Ida + GO (82% v 64%; P = .005). NPM1 measurable residual disease (MRD) clearance was also greater, with 88% versus 77% becoming MRD-negative in peripheral blood after cycle 2 (P = .02). Three-year OS was also higher in patients with a FLT3 mutation (64% v 54%; P = .047). Fewer transplants were performed in patients receiving FLAG-Ida + GO (238 v 278; P = .02). There was no difference in outcome according to the number of GO doses, although NPM1 MRD clearance was higher with two doses in the DA arm. Patients with core binding factor AML treated with DA and one dose of GO had a 3-year OS of 96% with no survival benefit from FLAG-Ida + GO. Conclusion Overall, FLAG-Ida + GO significantly reduced relapse without improving OS. However, exploratory analyses show that patients with NPM1 and FLT3 mutations had substantial improvements in OS. By contrast, in patients with core binding factor AML, outcomes were excellent with DA + GO with no FLAG-Ida benefit
    corecore