The Cost of a Reductions Approach to Private Fair Optimization
We examine a reductions approach to fair optimization and learning where a
black-box optimizer is used to learn a fair model for classification or
regression [Alabi et al., 2018, Agarwal et al., 2018] and explore the creation
of such fair models that adhere to data privacy guarantees (specifically
differential privacy). For this approach, we consider two suites of use cases:
the first is for optimizing convex performance measures of the confusion matrix
(such as those derived from the $G$-mean and $H$-mean); the second is for
satisfying statistical definitions of algorithmic fairness (such as equalized
odds, demographic parity, and the Gini index of inequality). The reductions
approach to fair optimization can be abstracted as the constrained
group-objective optimization problem where we aim to optimize an objective that
is a function of losses of individual groups, subject to some constraints. We
present two generic differentially private algorithms to solve this problem: an
$(\epsilon, 0)$-differentially private exponential sampling algorithm and an
$(\epsilon, \delta)$-differentially private algorithm that uses an approximate
linear optimizer to incrementally move
toward the best decision. Compared to a previous method for ensuring
differential privacy subject to a relaxed form of the equalized odds fairness
constraint, the $(\epsilon, 0)$-differentially private algorithm we
present provides asymptotically better sample complexity guarantees in certain
parameter regimes. The technique of using an approximate linear optimizer
oracle to achieve privacy might be applicable to other problems not considered
in this paper. Finally, we show an algorithm-agnostic information-theoretic
lower bound on the excess risk (or equivalently, the sample complexity) of any
solution to the problem of $(\epsilon, 0)$ or $(\epsilon, \delta)$ private
constrained group-objective optimization.
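As a rough illustration of the exponential-sampling idea, the sketch below privately selects a threshold classifier from a finite pool by scoring each candidate with a group-objective utility (the negative of the worst per-group error) and sampling via the exponential mechanism. The toy data, the threshold pool, and the sensitivity bound are all assumptions made for illustration; this is not the paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature x, binary label y, sensitive group g in {0, 1}.
x = rng.normal(size=200)
y = (x + 0.3 * rng.normal(size=200) > 0).astype(int)
g = rng.integers(0, 2, size=200)

# Finite decision set: threshold classifiers "predict 1 iff x > t".
thresholds = np.linspace(-1.0, 1.0, 21)

def neg_worst_group_error(t):
    # Group-objective utility: negative of the worst per-group error rate.
    errs = [np.mean((x[g == k] > t).astype(int) != y[g == k]) for k in (0, 1)]
    return -max(errs)

def exponential_mechanism(utilities, epsilon, sensitivity):
    # Pick index i with probability proportional to exp(eps * u_i / (2 * sensitivity)).
    logits = epsilon * np.asarray(utilities) / (2.0 * sensitivity)
    logits -= logits.max()                  # numerical stability only
    probs = np.exp(logits)
    return rng.choice(len(utilities), p=probs / probs.sum())

utilities = [neg_worst_group_error(t) for t in thresholds]
# One changed record moves a per-group error rate by at most ~1/min_group_size.
t_private = thresholds[exponential_mechanism(utilities, epsilon=1.0, sensitivity=1.0 / 50)]
print("privately selected threshold:", t_private)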
Fair Regression: Quantitative Definitions and Reduction-based Algorithms
In this paper, we study the prediction of a real-valued target, such as a
risk score or recidivism rate, while guaranteeing a quantitative notion of
fairness with respect to a protected attribute such as gender or race. We call
this class of problems \emph{fair regression}. We propose general schemes for
fair regression under two notions of fairness: (1) statistical parity, which
asks that the prediction be statistically independent of the protected
attribute, and (2) bounded group loss, which asks that the prediction error
restricted to any protected group remain below some pre-determined level. While
we only study these two notions of fairness, our schemes are applicable to
arbitrary Lipschitz-continuous losses, and so they encompass least-squares
regression, logistic regression, quantile regression, and many other tasks. Our
schemes only require access to standard risk minimization algorithms (such as
standard classification or least-squares regression) while providing
theoretical guarantees on the optimality and fairness of the obtained
solutions. In addition to analyzing theoretical properties of our schemes, we
empirically demonstrate their ability to uncover fairness--accuracy frontiers
on several standard datasets.
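The bounded-group-loss notion can be made concrete with a small sketch: ordinary least squares plus a penalty that activates whenever any protected group's loss exceeds the pre-determined level. This is only a penalty-method illustration of the constraint, not the paper's reduction (which calls a standard risk minimizer as a black box); all names are hypothetical.

import numpy as np

def fit_bounded_group_loss(X, y, group, level, lam=10.0, lr=0.01, iters=2000):
    # Linear least squares with a penalty that activates whenever any
    # protected group's MSE exceeds the pre-determined `level`.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        resid = X @ w - y
        grad = X.T @ resid / len(y)                        # gradient of overall MSE / 2
        for k in np.unique(group):
            m = group == k
            if np.mean(resid[m] ** 2) > level:             # group-k constraint violated
                grad += lam * X[m].T @ resid[m] / m.sum()  # push group-k loss back down
        w -= lr * grad
    return w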
Computational Social Choice and Computational Complexity: BFFs?
We discuss the connection between computational social choice (comsoc) and
computational complexity. We stress the work so far on, and urge continued
focus on, two less-recognized aspects of this connection. Firstly, this is very
much a two-way street: Everyone knows complexity classification is used in
comsoc, but we also highlight benefits to complexity that have arisen from its
use in comsoc. Secondly, and more subtly, less-known complexity tools can often be
used very productively in comsoc.
Comment: A version of this paper will appear in AAAI-18
Average Individual Fairness: Algorithms, Generalization and Experiments
We propose a new family of fairness definitions for classification problems
that combine some of the best properties of both statistical and individual
notions of fairness. We posit not only a distribution over individuals, but
also a distribution over (or collection of) classification tasks. We then ask
that standard statistics (such as error or false positive/negative rates) be
(approximately) equalized across individuals, where the rate is defined as an
expectation over the classification tasks. Because we are no longer averaging
over coarse groups (such as race or gender), this is a semantically meaningful
individual-level constraint. Given a sample of individuals and classification
problems, we design an oracle-efficient algorithm (i.e. one that is given
access to any standard, fairness-free learning heuristic) for the fair
empirical risk minimization task. We also show that given sufficiently many
samples, the ERM solution generalizes in two directions: both to new
individuals, and to new classification tasks, drawn from their corresponding
distributions. Finally we implement our algorithm and empirically verify its
effectiveness.
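The statistic being equalized is easy to state in code. The sketch below computes each individual's error rate as an average over tasks, and the resulting constraint violation; the paper's actual algorithm, an oracle-efficient game between a learner and an auditor, is not reproduced here, and the names are hypothetical.

import numpy as np

def individual_error_rates(preds, labels):
    # preds, labels: (n_individuals, n_tasks) - each row is one person's
    # predictions/labels across the sampled classification tasks.
    return (preds != labels).mean(axis=1)

def fairness_violation(preds, labels, alpha):
    # How far the worst-off individual's cross-task error rate sits from the
    # population average; the constraint asks this to stay below alpha.
    rates = individual_error_rates(preds, labels)
    return max(0.0, np.abs(rates - rates.mean()).max() - alpha)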
What's in a Name? Reducing Bias in Bios without Access to Protected Attributes
There is a growing body of work that proposes methods for mitigating bias in
machine learning systems. These methods typically rely on access to protected
attributes such as race, gender, or age. However, this raises two significant
challenges: (1) protected attributes may not be available or it may not be
legal to use them, and (2) it is often desirable to simultaneously consider
multiple protected attributes, as well as their intersections. In the context
of mitigating bias in occupation classification, we propose a method for
discouraging correlation between the predicted probability of an individual's
true occupation and a word embedding of their name. This method leverages the
societal biases that are encoded in word embeddings, eliminating the need for
access to protected attributes. Crucially, it only requires access to
individuals' names at training time and not at deployment time. We evaluate two
variations of our proposed method using a large-scale dataset of online
biographies. We find that both variations simultaneously reduce race and gender
biases, with almost no reduction in the classifier's overall true positive
rate.
Comment: Accepted at NAACL 2019; Best Thematic Paper
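A minimal sketch of the decorrelation idea, assuming a covariance-style penalty (the paper proposes two concrete variants, neither reproduced verbatim here; lam is a hypothetical trade-off weight):

import numpy as np

def decorrelation_penalty(p_true, name_emb):
    # p_true:   (n,)   model probability assigned to each person's true occupation
    # name_emb: (n, d) word embedding of each person's name
    # Sum of squared covariances between p_true and each embedding coordinate.
    p_c = p_true - p_true.mean()
    e_c = name_emb - name_emb.mean(axis=0)
    cov = e_c.T @ p_c / len(p_true)
    return float(cov @ cov)

# Training-time use only (names are not needed at deployment):
#   loss = cross_entropy + lam * decorrelation_penalty(p_true, name_emb)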
Taking Advantage of Multitask Learning for Fair Classification
A central goal of algorithmic fairness is to reduce bias in automated
decision making. An unavoidable tension exists between accuracy gains obtained
by using sensitive information (e.g., gender or ethnic group) as part of a
statistical model, and any commitment to protect these characteristics. Often,
due to biases present in the data, using the sensitive information in the
functional form of a classifier improves classification accuracy. In this paper
we show how it is possible to get the best of both worlds: optimize model
accuracy and fairness without explicitly using the sensitive feature in the
functional form of the model, thereby treating different individuals equally.
Our method is based on two key ideas. On the one hand, we propose to use
Multitask Learning (MTL), enhanced with fairness constraints, to jointly learn
group-specific classifiers that leverage information between sensitive groups.
On the other hand, since learning group-specific models might not be permitted,
we propose to first predict the sensitive features by any learning method and
then to use the predicted sensitive feature to train MTL with fairness
constraints. This enables us to tackle fairness with a three-pronged approach,
that is, by increasing accuracy on each group, enforcing measures of fairness
during training, and protecting sensitive information during testing.
Experimental results on two real datasets support our proposal, showing
substantial improvements in both accuracy and fairness.
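A toy sketch of the two key ideas for the linear case: each group's predictor is a shared part plus a shrunk group-specific part, and a fairness term pulls the groups' average scores together. The demographic-parity-style pull is a stand-in rather than the paper's exact constraint, and all names are hypothetical; `group` may itself be a predicted sensitive feature, as the abstract describes.

import numpy as np

def fit_fair_mtl(X, y, group, lam_share=1.0, lam_fair=1.0, lr=0.01, iters=2000):
    # Two-group linear multitask model: predictor for group k is w0 + v[k].
    # Shrinking v[k] toward zero shares information across groups; the fairness
    # term penalizes the gap between the groups' average scores.
    d = X.shape[1]
    w0, v = np.zeros(d), {0: np.zeros(d), 1: np.zeros(d)}
    for _ in range(iters):
        scores = {k: X[group == k] @ (w0 + v[k]) for k in (0, 1)}
        gap = scores[0].mean() - scores[1].mean()
        for k in (0, 1):
            m = group == k
            g = X[m].T @ (X[m] @ (w0 + v[k]) - y[m]) / m.sum()   # squared-loss gradient
            g += lam_fair * gap * (1 if k == 0 else -1) * X[m].mean(axis=0)
            v[k] -= lr * (g + lam_share * v[k])                  # group-specific part
            w0 -= lr * g / 2.0                                   # shared part
    return w0, v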
Fair Regression for Health Care Spending
The distribution of health care payments to insurance plans has substantial
consequences for social policy. Risk adjustment formulas predict spending in
health insurance markets in order to provide fair benefits and health care
coverage for all enrollees, regardless of their health status. Unfortunately,
current risk adjustment formulas are known to underpredict spending for
specific groups of enrollees, leading to undercompensated payments to health
insurers. This incentivizes insurers to design their plans such that
individuals in undercompensated groups will be less likely to enroll, impacting
access to health care for these groups. To improve risk adjustment formulas for
undercompensated groups, we expand on concepts from the statistics, computer
science, and health economics literature to develop new fair regression methods
for continuous outcomes by building fairness considerations directly into the
objective function. We additionally propose a novel measure of fairness while
asserting that a suite of metrics is necessary in order to evaluate risk
adjustment formulas more fully. Our data application using the IBM MarketScan
Research Databases and simulation studies demonstrate that these new fair
regression methods may lead to massive improvements in group fairness (e.g.,
98%) with only small reductions in overall fit (e.g., 4%).
Comment: 30 pages, 3 figures
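Building fairness directly into the objective function can be sketched as a penalized regression, assuming a penalty on the mean residual of an undercompensated group (a positive mean residual means the formula underpredicts that group's spending). The form below is hypothetical; the paper's estimators and its proposed fairness measure differ in detail.

import numpy as np

def fair_risk_adjustment_objective(w, X, y, undercomp, lam):
    # Overall squared error plus a penalty on the *mean residual* of an
    # undercompensated group: mean residual > 0 means the formula
    # systematically underpredicts that group's spending.
    resid = y - X @ w
    return np.mean(resid ** 2) + lam * np.mean(resid[undercomp]) ** 2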
Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness
The most prevalent notions of fairness in machine learning are statistical
definitions: they fix a small collection of pre-defined groups, and then ask
for parity of some statistic of the classifier across these groups. Constraints
of this form are susceptible to intentional or inadvertent "fairness
gerrymandering", in which a classifier appears to be fair on each individual
group, but badly violates the fairness constraint on one or more structured
subgroups defined over the protected attributes. We propose instead to demand
statistical notions of fairness across exponentially (or infinitely) many
subgroups, defined by a structured class of functions over the protected
attributes. This interpolates between statistical definitions of fairness and
recently proposed individual notions of fairness, but raises several
computational challenges. It is no longer clear how to audit a fixed classifier
to see if it satisfies such a strong definition of fairness. We prove that the
computational problem of auditing subgroup fairness for both equality of false
positive rates and statistical parity is equivalent to the problem of weak
agnostic learning, which means it is computationally hard in the worst case,
even for simple structured subclasses.
We then derive two algorithms that provably converge to the best fair
classifier, given access to oracles which can solve the agnostic learning
problem. The algorithms are based on a formulation of subgroup fairness as a
two-player zero-sum game between a Learner and an Auditor. Our first algorithm
provably converges in a polynomial number of steps. Our second algorithm enjoys
only provably asymptotic convergence, but has the merit of simplicity and
faster per-step computation. We implement the simpler algorithm using linear
regression as a heuristic oracle, and show that we can effectively both audit
and learn fair classifiers on real datasets.
Comment: Added new experimental results and a slightly modified fairness definition
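The audit step can be sketched with the same heuristic the experiments use, a linear-regression oracle: on the true negatives, regress the classifier's false positives on the protected attributes and threshold the fit to propose a subgroup. The interface below is hypothetical; the paper formalizes auditing as weak agnostic learning over a class of subgroup functions.

import numpy as np
from sklearn.linear_model import LinearRegression

def audit_false_positive_rate(protected, y, yhat):
    # On the true negatives, regress the classifier's false positives on the
    # protected attributes; thresholding the fit proposes a subgroup, whose
    # false-positive rate is then compared with the overall one.
    neg = y == 0
    fp = (yhat[neg] == 1).astype(float)
    base = fp.mean()
    oracle = LinearRegression().fit(protected[neg], fp - base)
    in_group = oracle.predict(protected[neg]) > 0          # candidate subgroup
    if not in_group.any():
        return None, 0.0
    return oracle, fp[in_group].mean() - base              # subgroup FP disparity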
AccurateML: Information-aggregation-based Approximate Processing for Fast and Accurate Machine Learning on MapReduce
The growing demand for processing massive datasets has driven the trend of
running machine learning applications on MapReduce. When processing large input
data, it is often more valuable to produce fast, sufficiently accurate
approximate results than slow exact results. Existing techniques produce
approximate results by processing parts of the input data, thus incurring large
accuracy losses when using short job execution times, because all the skipped
input data potentially contributes to result accuracy. We address this
limitation by proposing AccurateML, which aggregates information from the input
data in each map task to create a small number of aggregated data points. These
aggregated points enable all map tasks to produce initial outputs quickly,
saving computation time and shrinking the outputs' size to reduce communication
time. Our approach further identifies the parts of the input data most relevant
to result accuracy and uses these parts first to refine the produced outputs and
minimize accuracy losses. We evaluated AccurateML using real machine learning
applications and datasets. The results show: (i) it reduces execution times by
30 times with small accuracy losses compared to exact results; (ii) when using
the same execution times, it achieves a 2.71 times reduction in accuracy losses
compared to existing approximate processing techniques.
Comment: 9 pages, 9 figures
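The aggregation step can be sketched as follows, with random buckets standing in for whatever grouping AccurateML actually uses (the abstract does not specify it); names are hypothetical.

import numpy as np

def aggregate_map_input(X, y, n_agg, rng):
    # Collapse a map task's input into n_agg weighted "aggregated data points"
    # (here simply random-bucket means), so a first-pass model can be produced
    # quickly; the buckets most relevant to accuracy can later be re-expanded.
    bucket = rng.integers(0, n_agg, size=len(X))
    Xa, ya, w = [], [], []
    for b in range(n_agg):
        m = bucket == b
        if m.any():
            Xa.append(X[m].mean(axis=0))   # one aggregated point per bucket
            ya.append(y[m].mean())
            w.append(m.sum())              # number of raw points it represents
    return np.array(Xa), np.array(ya), np.array(w)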
Fair Resource Allocation in Federated Learning
Federated learning involves training statistical models in massive,
heterogeneous networks. Naively minimizing an aggregate loss function in such a
network may disproportionately advantage or disadvantage some of the devices.
In this work, we propose q-Fair Federated Learning (q-FFL), a novel
optimization objective inspired by fair resource allocation in wireless
networks that encourages a more fair (specifically, a more uniform) accuracy
distribution across devices in federated networks. To solve q-FFL, we devise a
communication-efficient method, q-FedAvg, that is suited to federated networks.
We validate both the effectiveness of q-FFL and the efficiency of q-FedAvg on a
suite of federated datasets with both convex and non-convex models, and show
that q-FFL (along with q-FedAvg) outperforms existing baselines in terms of the
resulting fairness, flexibility, and efficiency.
Comment: ICLR 2020
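The q-FFL objective itself is compact enough to state directly: $f_q(w) = \sum_k \frac{p_k}{q+1} F_k(w)^{q+1}$, where $F_k$ is device $k$'s local empirical loss and $p_k$ its weight. The sketch below evaluates it; $q = 0$ recovers the usual weighted average, while larger $q$ emphasizes high-loss devices.

import numpy as np

def qffl_objective(device_losses, device_weights, q):
    # f_q(w) = sum_k p_k / (q + 1) * F_k(w)^(q + 1); q = 0 recovers the usual
    # weighted average, while larger q emphasizes high-loss devices and pushes
    # the accuracy distribution toward uniformity.
    F = np.asarray(device_losses, dtype=float)
    p = np.asarray(device_weights, dtype=float)
    return float(np.sum(p / (q + 1.0) * F ** (q + 1.0)))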