42 research outputs found
The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning
The nascent field of fair machine learning aims to ensure that decisions
guided by algorithms are equitable. Over the last several years, three formal
definitions of fairness have gained prominence: (1) anti-classification,
meaning that protected attributes---like race, gender, and their proxies---are
not explicitly used to make decisions; (2) classification parity, meaning that
common measures of predictive performance (e.g., false positive and false
negative rates) are equal across groups defined by the protected attributes;
and (3) calibration, meaning that conditional on risk estimates, outcomes are
independent of protected attributes. Here we show that all three of these
fairness definitions suffer from significant statistical limitations. Requiring
anti-classification or classification parity can, perversely, harm the very
groups they were designed to protect; and calibration, though generally
desirable, provides little guarantee that decisions are equitable. In contrast
to these formal fairness criteria, we argue that it is often preferable to
treat similarly risky people similarly, based on the most statistically
accurate estimates of risk that one can produce. Such a strategy, while not
universally applicable, often aligns well with policy objectives; notably, this
strategy will typically violate both anti-classification and classification
parity. In practice, it requires significant effort to construct suitable risk
estimates. One must carefully define and measure the targets of prediction to
avoid retrenching biases in the data. But, importantly, one cannot generally
address these difficulties by requiring that algorithms satisfy popular
mathematical formalizations of fairness. By highlighting these challenges in
the foundation of fair machine learning, we hope to help researchers and
practitioners productively advance the area
Matched Pair Calibration for Ranking Fairness
We propose a test of fairness in score-based ranking systems called matched
pair calibration. Our approach constructs a set of matched item pairs with
minimal confounding differences between subgroups before computing an
appropriate measure of ranking error over the set. The matching step ensures
that we compare subgroup outcomes between identically scored items so that
measured performance differences directly imply unfairness in subgroup-level
exposures. We show how our approach generalizes the fairness intuitions of
calibration from a binary classification setting to ranking and connect our
approach to other proposals for ranking fairness measures. Moreover, our
strategy shows how the logic of marginal outcome tests extends to cases where
the analyst has access to model scores. Lastly, we provide an example of
applying matched pair calibration to a real-word ranking data set to
demonstrate its efficacy in detecting ranking bias.Comment: 19 pages, 8 figure
POTs: Protective Optimization Technologies
Algorithmic fairness aims to address the economic, moral, social, and
political impact that digital systems have on populations through solutions
that can be applied by service providers. Fairness frameworks do so, in part,
by mapping these problems to a narrow definition and assuming the service
providers can be trusted to deploy countermeasures. Not surprisingly, these
decisions limit fairness frameworks' ability to capture a variety of harms
caused by systems.
We characterize fairness limitations using concepts from requirements
engineering and from social sciences. We show that the focus on algorithms'
inputs and outputs misses harms that arise from systems interacting with the
world; that the focus on bias and discrimination omits broader harms on
populations and their environments; and that relying on service providers
excludes scenarios where they are not cooperative or intentionally adversarial.
We propose Protective Optimization Technologies (POTs). POTs provide means
for affected parties to address the negative impacts of systems in the
environment, expanding avenues for political contestation. POTs intervene from
outside the system, do not require service providers to cooperate, and can
serve to correct, shift, or expose harms that systems impose on populations and
their environments. We illustrate the potential and limitations of POTs in two
case studies: countering road congestion caused by traffic-beating
applications, and recalibrating credit scoring for loan applicants.Comment: Appears in Conference on Fairness, Accountability, and Transparency
(FAT* 2020). Bogdan Kulynych and Rebekah Overdorf contributed equally to this
work. Version v1/v2 by Seda G\"urses, Rebekah Overdorf, and Ero Balsa was
presented at HotPETS 2018 and at PiMLAI 201
The Changing Landscape for Stroke\ua0Prevention in AF: Findings From the GLORIA-AF Registry Phase 2
Background GLORIA-AF (Global Registry on Long-Term Oral Antithrombotic Treatment in Patients with Atrial Fibrillation) is a prospective, global registry program describing antithrombotic treatment patterns in patients with newly diagnosed nonvalvular atrial fibrillation at risk of stroke. Phase 2 began when dabigatran, the first non\u2013vitamin K antagonist oral anticoagulant (NOAC), became available. Objectives This study sought to describe phase 2 baseline data and compare these with the pre-NOAC era collected during phase 1. Methods During phase 2, 15,641 consenting patients were enrolled (November 2011 to December 2014); 15,092 were eligible. This pre-specified cross-sectional analysis describes eligible patients\u2019 baseline characteristics. Atrial fibrillation disease characteristics, medical outcomes, and concomitant diseases and medications were collected. Data were analyzed using descriptive statistics. Results Of the total patients, 45.5% were female; median age was 71 (interquartile range: 64, 78) years. Patients were from Europe (47.1%), North America (22.5%), Asia (20.3%), Latin America (6.0%), and the Middle East/Africa (4.0%). Most had high stroke risk (CHA2DS2-VASc [Congestive heart failure, Hypertension, Age 6575 years, Diabetes mellitus, previous Stroke, Vascular disease, Age 65 to 74 years, Sex category] score 652; 86.1%); 13.9% had moderate risk (CHA2DS2-VASc = 1). Overall, 79.9% received oral anticoagulants, of whom 47.6% received NOAC and 32.3% vitamin K antagonists (VKA); 12.1% received antiplatelet agents; 7.8% received no antithrombotic treatment. For comparison, the proportion of phase 1 patients (of N = 1,063 all eligible) prescribed VKA was 32.8%, acetylsalicylic acid 41.7%, and no therapy 20.2%. In Europe in phase 2, treatment with NOAC was more common than VKA (52.3% and 37.8%, respectively); 6.0% of patients received antiplatelet treatment; and 3.8% received no antithrombotic treatment. In North America, 52.1%, 26.2%, and 14.0% of patients received NOAC, VKA, and antiplatelet drugs, respectively; 7.5% received no antithrombotic treatment. NOAC use was less common in Asia (27.7%), where 27.5% of patients received VKA, 25.0% antiplatelet drugs, and 19.8% no antithrombotic treatment. Conclusions The baseline data from GLORIA-AF phase 2 demonstrate that in newly diagnosed nonvalvular atrial fibrillation patients, NOAC have been highly adopted into practice, becoming more frequently prescribed than VKA in Europe and North America. Worldwide, however, a large proportion of patients remain undertreated, particularly in Asia and North America. (Global Registry on Long-Term Oral Antithrombotic Treatment in Patients With Atrial Fibrillation [GLORIA-AF]; NCT01468701
Fludarabine, cytarabine, granulocyte colony-stimulating factor, and idarubicin with gemtuzumab ozogamicin improves event-free survival in younger patients with newly diagnosed aml and overall survival in patients with npm1 and flt3 mutations
Purpose
To determine the optimal induction chemotherapy regimen for younger adults with newly diagnosed AML without known adverse risk cytogenetics.
Patients and Methods
One thousand thirty-three patients were randomly assigned to intensified (fludarabine, cytarabine, granulocyte colony-stimulating factor, and idarubicin [FLAG-Ida]) or standard (daunorubicin and Ara-C [DA]) induction chemotherapy, with one or two doses of gemtuzumab ozogamicin (GO). The primary end point was overall survival (OS).
Results
There was no difference in remission rate after two courses between FLAG-Ida + GO and DA + GO (complete remission [CR] + CR with incomplete hematologic recovery 93% v 91%) or in day 60 mortality (4.3% v 4.6%). There was no difference in OS (66% v 63%; P = .41); however, the risk of relapse was lower with FLAG-Ida + GO (24% v 41%; P < .001) and 3-year event-free survival was higher (57% v 45%; P < .001). In patients with an NPM1 mutation (30%), 3-year OS was significantly higher with FLAG-Ida + GO (82% v 64%; P = .005). NPM1 measurable residual disease (MRD) clearance was also greater, with 88% versus 77% becoming MRD-negative in peripheral blood after cycle 2 (P = .02). Three-year OS was also higher in patients with a FLT3 mutation (64% v 54%; P = .047). Fewer transplants were performed in patients receiving FLAG-Ida + GO (238 v 278; P = .02). There was no difference in outcome according to the number of GO doses, although NPM1 MRD clearance was higher with two doses in the DA arm. Patients with core binding factor AML treated with DA and one dose of GO had a 3-year OS of 96% with no survival benefit from FLAG-Ida + GO.
Conclusion
Overall, FLAG-Ida + GO significantly reduced relapse without improving OS. However, exploratory analyses show that patients with NPM1 and FLT3 mutations had substantial improvements in OS. By contrast, in patients with core binding factor AML, outcomes were excellent with DA + GO with no FLAG-Ida benefit