Balanced Filtering via Non-Disclosive Proxies
We study the problem of non-disclosively collecting a sample of data that is
balanced with respect to sensitive groups when group membership is unavailable
or prohibited from use at collection time. Specifically, our collection
mechanism does not reveal significantly more about group membership of any
individual sample than can be ascertained from base rates alone. To do this, we
adopt a fairness pipeline perspective, in which a learner can use a small set
of labeled data to train a proxy function that can later be used for this
filtering task. We then associate the range of the proxy function with sampling
probabilities; given a new candidate, we classify it using our proxy function,
and then select it for our sample with probability proportional to the sampling
probability corresponding to its proxy classification. Importantly, we require
that the proxy classification itself not reveal significant information about
the sensitive group membership of any individual sample (i.e., it should be
sufficiently non-disclosive). We show that, under modest algorithmic
assumptions, such a proxy can be found in a sample- and oracle-efficient
manner. Finally, we evaluate our algorithm experimentally and analyze its
generalization properties.
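As an illustration, here is a minimal sketch of the filtering mechanism described above, assuming a proxy classifier has already been trained on the small labeled set; the names proxy_fn and sampling_prob are illustrative, not from the paper:

```python
import random

def balanced_filter(candidates, proxy_fn, sampling_prob):
    """Collect a sample by accepting each candidate with the probability
    assigned to its proxy class.

    proxy_fn: maps a candidate to an element of the proxy's range,
        trained beforehand on a small labeled dataset.
    sampling_prob: dict mapping each proxy class to an acceptance
        probability in [0, 1].
    """
    sample = []
    for x in candidates:
        k = proxy_fn(x)  # proxy classification of the new candidate
        if random.random() < sampling_prob[k]:  # accept w.p. sampling_prob[k]
            sample.append(x)
    return sample
```

Because acceptance depends on the candidate only through its proxy class, the non-disclosiveness requirement reduces to a condition on the proxy function itself.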
Estimating and Controlling for Fairness via Sensitive Attribute Predictors
The responsible use of machine learning tools in real world high-stakes
decision making demands that we audit and control for potential biases against
underrepresented groups. This process naturally requires access to the
sensitive attribute one desires to control, such as demographics, gender, or
other potentially sensitive features. Unfortunately, this information is often
unavailable. In this work we demonstrate that one can still reliably estimate,
and ultimately control for, fairness by using proxy sensitive attributes
derived from a sensitive-attribute predictor. Specifically, we first show
that, with just a little knowledge of the complete data distribution, one may
use a sensitive-attribute predictor to obtain bounds on the classifier's true
fairness metric. Second, we demonstrate how one can provably control a
classifier's worst-case fairness violation with respect to the true sensitive
attribute by controlling for fairness with respect to the proxy sensitive
attribute. Our results hold under assumptions that are significantly milder
than previous works, and we illustrate these results with experiments on
synthetic and real datasets.
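As an illustration of the estimation step, here is a hedged sketch of measuring a demographic-parity gap with proxy group labels in place of the unobserved sensitive attribute; the two-group binary setup and the function name are our own assumptions for this example, not details from the paper:

```python
import numpy as np

def proxy_parity_gap(y_pred, a_proxy):
    """Estimate the demographic-parity gap |P(Y=1 | A=0) - P(Y=1 | A=1)|
    using proxy group labels from a sensitive-attribute predictor.

    y_pred:  binary classifier outputs, shape (n,)
    a_proxy: binary proxy group labels, shape (n,)
    """
    y_pred = np.asarray(y_pred)
    a_proxy = np.asarray(a_proxy)
    rate0 = y_pred[a_proxy == 0].mean()  # positive rate in proxy group 0
    rate1 = y_pred[a_proxy == 1].mean()  # positive rate in proxy group 1
    return abs(rate0 - rate1)
```

The paper's contribution is to bound the gap with respect to the true attribute in terms of this proxy-based estimate, under mild distributional knowledge.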
Distributionally Robust Data Join
Suppose we are given two datasets: a labeled dataset and an unlabeled dataset, which also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor?
The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels whose Wasserstein distance is at most r_1 from the empirical distribution over the labeled dataset and at most r_2 from that of the unlabeled dataset. This can be thought of as a generalization of distributionally robust optimization (DRO) that allows for two data sources, one of which is unlabeled and may contain auxiliary features.
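In symbols, the objective can be written roughly as follows, where P̂_1 and P̂_2 denote the empirical distributions of the labeled and unlabeled datasets and W the Wasserstein distance; the notation is ours, and we simplify by assuming each distance is taken after projecting Q onto the features observed in the corresponding dataset:

```latex
\min_{f} \;\;
\max_{Q \,:\; W(Q,\,\hat{P}_1) \le r_1,\;\; W(Q,\,\hat{P}_2) \le r_2}
\;\; \mathbb{E}_{(x,\,x',\,y) \sim Q}\big[\ell(f(x, x'),\, y)\big]
```

Here x are the original features, x' the auxiliary features present only in the unlabeled dataset, and y the binary label.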
BISG: When inferring race or ethnicity, does it matter that people often live near their relatives?
Bayesian Improved Surname Geocoding (BISG) is a ubiquitous tool for
predicting race and ethnicity using an individual's geolocation and surname.
BISG assumes that in the United States population, surname and geolocation are
independent given a particular race or ethnicity. This assumption appears to
contradict conventional wisdom, including the observation that people often
live near their relatives (who share their surname and race). We demonstrate that this
independence assumption results in systematic biases for minority
subpopulations and we introduce a simple alternative to BISG. Our raking-based
prediction algorithm offers a significant improvement over BISG and we validate
our algorithm on states' voter registration lists that contain self-identified
race/ethnicity. The proposed improvement and the inaccuracies of BISG
generalize to applications in election law, health care, finance, tech, law
enforcement, and many other fields.
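For reference, here is a minimal sketch of the baseline BISG posterior computation under the independence assumption that the paper challenges; the probability tables are illustrative inputs, and the paper's raking-based alternative replaces this product form:

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """P(race | surname, geolocation) under the BISG assumption that
    surname and geolocation are independent given race.

    p_race_given_surname: dict race -> P(race | surname), e.g. from the
        Census surname list.
    p_geo_given_race: dict race -> P(geolocation | race), e.g. from
        Census block-group composition data.
    """
    # Bayes' rule with the independence assumption:
    # P(r | surname, geo) is proportional to P(r | surname) * P(geo | r)
    unnormalized = {r: p_race_given_surname[r] * p_geo_given_race[r]
                    for r in p_race_given_surname}
    total = sum(unnormalized.values())
    return {r: v / total for r, v in unnormalized.items()}
```

When relatives with the same surname cluster geographically, the factorization above double-counts shared information, which is the source of the systematic biases the paper documents.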
Federated Fairness without Access to Sensitive Groups
Current approaches to group fairness in federated learning assume the
existence of predefined and labeled sensitive groups during training. However,
due to factors ranging from emerging regulations to dynamics and
location-dependency of protected groups, this assumption may be unsuitable in
many real-world scenarios. In this work, we propose a new approach to guarantee
group fairness that does not rely on any predefined definition of sensitive
groups or additional labels. Our objective allows the federation to learn a
Pareto-efficient global model that ensures worst-case group fairness, and it
enables, via a single hyper-parameter, trade-offs between fairness and utility,
subject only to a group size constraint. This implies that any sufficiently
large subset of the population is guaranteed to receive at least a minimum
level of utility performance from the model. The proposed objective encompasses
existing approaches as special cases, such as empirical risk minimization and
subgroup robustness objectives from centralized machine learning. We provide an
algorithm to solve this problem in federation that enjoys convergence and
excess risk guarantees. Our empirical results indicate that the proposed
approach can effectively improve the worst-performing group that may be
present without unnecessarily hurting average performance, exhibits
performance superior or comparable to that of relevant baselines, and achieves
a large set of solutions with different fairness-utility trade-offs.
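The group-size constraint has a well-known centralized analogue: the worst-case average loss over any subgroup containing at least an alpha-fraction of the population equals the conditional value at risk (CVaR) of the per-example losses. A minimal sketch of that quantity, offered as an illustration of the objective rather than as the paper's federated algorithm:

```python
import numpy as np

def worst_case_group_loss(losses, alpha):
    """Average loss of the worst-off subgroup containing at least an
    alpha-fraction of examples, i.e. CVaR_alpha of the losses.

    losses: per-example losses, shape (n,)
    alpha:  minimum group size as a fraction of the population, in (0, 1]
    """
    losses = np.sort(np.asarray(losses))[::-1]  # largest losses first
    k = int(np.ceil(alpha * len(losses)))       # smallest admissible subgroup
    return losses[:k].mean()                    # mean over the k worst examples
```

Taking exactly the k largest losses maximizes the subgroup average, since enlarging the subgroup can only add smaller losses; this is why the constraint guarantees a minimum utility level to every sufficiently large subset.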
Robust Classification via Support Vector Machines
Classification models are very sensitive to data uncertainty, and finding robust classifiers that are less sensitive to it has attracted great interest in the machine learning literature. This paper constructs robust support vector machine classifiers under feature data uncertainty via two probabilistic arguments. The first classifier, Single Perturbation, reduces the local effect of data uncertainty with respect to one given feature and acts as a local test that can confirm or refute the presence of significant data uncertainty for that particular feature. The second classifier, Extreme Empirical Loss, aims to reduce the aggregate effect of data uncertainty with respect to all features, which is possible via a trade-off between the number of prediction model violations and the size of these violations. Both methodologies are computationally efficient, and our extensive numerical investigation highlights the advantages and possible limitations of the two robust classifiers on synthetic and real-life insurance claims and mortgage lending data, as well as the fairness of automated decisions based on our classifiers.
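As a hedged illustration of the single-feature robustness idea (a standard worst-case hinge-loss calculation under our own assumptions, not necessarily the paper's exact probabilistic construction): if feature j of each example may be perturbed by up to eps in absolute value, the worst case is obtained by shrinking every margin by eps * |w[j]|:

```python
import numpy as np

def robust_hinge_loss(w, b, X, y, j, eps):
    """Worst-case average hinge loss of a linear classifier when feature j
    of each example may be perturbed by at most eps in absolute value.

    For |delta| <= eps, the adversary minimizes the margin
    y * (w @ (x + delta * e_j) + b), whose minimum over delta is
    y * (w @ x + b) - eps * |w[j]|.
    """
    margins = y * (X @ w + b) - eps * abs(w[j])  # adversarially shrunk margins
    return np.maximum(0.0, 1.0 - margins).mean()
```

Minimizing this loss over (w, b) penalizes reliance on the uncertain feature, which matches the intuition of reducing the local effect of data uncertainty for that feature.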