Fairness and Privacy in Machine Learning Algorithms
Roughly 2.5 quintillion bytes of data are generated daily in this digital era. Manually processing such huge amounts of data to extract useful information is nearly impossible, but machine learning algorithms, with their ability to process enormous data in a fast, cost-effective, and scalable way, have become a preferred choice for gleaning useful insights and solving business problems in many domains. With this widespread use of machine learning algorithms, there have always been concerns about the ethical issues that may arise from the use of this modern technology. While achieving high accuracies, accomplishing trustable and fair machine learning has been challenging. Maintaining data fairness and privacy is one of the top challenges faced by the industry as organizations employ various machine learning algorithms to automatically make decisions based on trends from previously collected data. A protected group or attribute refers to the group of individuals towards whom the system has some preconceived reservations and is hence discriminatory. Discrimination is the unjustified treatment of a particular category of people based on their race, age, gender, religion, sexual orientation, or disability. If we use data with preconceived reservations or inbuilt discrimination towards a certain group, then a model trained on such data will also be discriminatory towards those individuals.
Achieving Differential Privacy and Fairness in Machine Learning
Machine learning algorithms are used to make decisions in various applications, such as recruiting, lending, and policing. These algorithms rely on large amounts of sensitive individual information to work properly. Hence, there are sociological concerns about machine learning algorithms on matters like privacy and fairness. Currently, many studies focus only on protecting individual privacy or on ensuring fairness of algorithms separately, without taking their connection into consideration. However, new challenges are arising in privacy preserving and fairness-aware machine learning. On one hand, there is fairness within the private model, i.e., how to meet both privacy and fairness requirements simultaneously in machine learning algorithms. On the other hand, there is fairness between the private model and the non-private model, i.e., how to ensure the utility loss due to differential privacy is the same for each group.
The goal of this dissertation is to address challenging issues in privacy preserving and fairness-aware machine learning: achieving differential privacy with satisfactory utility and efficiency in complex and emerging tasks, using generative models to generate fair data and to assist fair classification, achieving both differential privacy and fairness simultaneously within the same model, and achieving equal utility loss w.r.t. each group between the private model and the non-private model.
In this dissertation, we develop the following algorithms to address the above challenges.
(1) We develop PrivPC and DPNE algorithms to achieve differential privacy in complex and emerging tasks of causal graph discovery and network embedding, respectively.
(2) We develop the fair generative adversarial neural networks framework and three algorithms (FairGAN, FairGAN+ and CFGAN) to achieve fair data generation and classification through generative models based on different association-based and causation-based fairness notions.
(3) We develop PFLR and PFLR* algorithms to simultaneously achieve both differential privacy and fairness in logistic regression.
(4) We develop a DPSGD-F algorithm to remove the disparate impact of differential privacy on model accuracy w.r.t. each group.
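The dissertation's PFLR and DPSGD-F algorithms are not reproduced here, but the underlying DP-SGD mechanism they build on can be sketched: clip each per-example gradient to an L2 norm bound, then add Gaussian noise calibrated to that bound before the update. Below is a minimal NumPy sketch for logistic regression; the function name, hyperparameters, and structure are illustrative, not taken from the dissertation, and no privacy accounting is included.

```python
import numpy as np

def dp_sgd_logreg(X, y, epochs=20, lr=0.1, clip=1.0, noise_mult=1.0, seed=0):
    """Logistic regression trained with DP-SGD-style updates:
    per-example gradient clipping plus Gaussian noise on the summed gradient."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))            # predicted probabilities
        per_ex = (p - y)[:, None] * X               # per-example gradients, shape (n, d)
        norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
        per_ex = per_ex / np.maximum(1.0, norms / clip)  # clip each L2 norm to <= clip
        noise = rng.normal(0.0, noise_mult * clip, size=d)
        w -= lr * (per_ex.sum(axis=0) + noise) / n  # noisy average-gradient step
    return w
```

DPSGD-F's contribution, per the abstract, is adjusting this mechanism so the accuracy cost of clipping and noise does not fall disproportionately on one group; that group-adaptive part is not shown here.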
Causal Fairness-Guided Dataset Reweighting using Neural Networks
The importance of achieving fairness in machine learning models cannot be
overstated. Recent research has pointed out that fairness should be examined
from a causal perspective, and several fairness notions based on Pearl's
causal framework have been proposed. In this paper, we construct a reweighting
scheme of datasets to address causal fairness. Our approach aims at mitigating
bias by considering the causal relationships among variables and incorporating
them into the reweighting process. The proposed method adopts two neural
networks, whose structures are intentionally used to reflect the structures of
a causal graph and of an interventional graph. The two neural networks can
approximate the causal model of the data, and the causal model of
interventions. Furthermore, reweighting guided by a discriminator is applied to
achieve various fairness notions. Experiments on real-world datasets show that
our method can achieve causal fairness on the data while remaining close to the
original data for downstream tasks.
Comment: To be published in the proceedings of the 2023 IEEE International Conference on Big Data (IEEE BigData 2023).
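The paper's neural-network reweighting guided by causal and interventional graphs is not reproduced here; as a point of contrast, the simpler association-based reweighing baseline it generalizes can be sketched in a few lines: give each (group, label) cell the weight P(g)P(y)/P(g, y), so that the group and the label become independent under the weighted distribution. The function name is illustrative.

```python
import numpy as np

def reweigh(groups, labels):
    """Association-based reweighing baseline: weight each (group, label)
    cell by P(g)P(y) / P(g, y), making group and label independent
    under the weighted empirical distribution."""
    groups, labels = np.asarray(groups), np.asarray(labels)
    w = np.empty(len(groups), dtype=float)
    for g in np.unique(groups):
        for y in np.unique(labels):
            mask = (groups == g) & (labels == y)
            if mask.any():
                # joint frequency of this cell vs. product of marginals
                w[mask] = (groups == g).mean() * (labels == y).mean() / mask.mean()
    return w
```

Under these weights, the weighted positive rate is identical across groups; the paper's approach replaces this purely associational correction with one that respects the causal relationships among variables.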
A survey in fairness in classification based machine learning
Abstract. As the usage and impact of machine learning applications increase, it is increasingly important to ensure that the systems in use are beneficial to their users and the larger society around them. One of the steps to ensure this is limiting the unfairness that an algorithm might have. Existing machine learning applications have sometimes been shown to disadvantage certain minorities, and to combat this we need to define what fairness means and how we can increase it in our machine learning applications. The survey is done as a literature review with the goal of presenting an overview of fairness in classification-based machine learning. It briefly motivates fairness through philosophical background and examples of unfairness, and goes through the most popular fairness definitions in machine learning. The paper then lists some of the most important methods for restricting unfairness, splitting them into pre-, in-, and post-processing methods.
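Two of the fairness definitions such surveys typically cover, demographic parity and equalized odds, reduce to simple statistics over predictions; a minimal sketch for a binary group attribute follows (function names are illustrative, and the metrics assume both groups and both outcome classes are present in the data):

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """|P(yhat=1 | g=0) - P(yhat=1 | g=1)| for a binary group attribute."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equalized_odds_diff(y_true, y_pred, group):
    """Largest between-group gap in true-positive rate and false-positive rate."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = []
    for y in (1, 0):  # y=1 gives the TPR gap, y=0 the FPR gap
        rates = [y_pred[(group == g) & (y_true == y)].mean() for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)
```

Both metrics are 0 for a perfectly fair classifier under the respective definition and 1 in the worst case, which is why they are convenient targets for the pre-, in-, and post-processing methods the survey categorizes.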
Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive Survey
Data quality is the key factor for the development of trustworthy AI in
healthcare. A large volume of curated datasets with controlled confounding
factors can help improve the accuracy, robustness and privacy of downstream AI
algorithms. However, access to good quality datasets is limited by the
technical difficulty of data acquisition and large-scale sharing of healthcare
data is hindered by strict ethical restrictions. Data synthesis algorithms,
which generate data with a similar distribution as real clinical data, can
serve as a potential solution to address the scarcity of good quality data
during the development of trustworthy AI. However, state-of-the-art data
synthesis algorithms, especially deep learning algorithms, focus more on
imaging data while neglecting the synthesis of non-imaging healthcare data,
including clinical measurements, medical signals and waveforms, and electronic
healthcare records (EHRs). Thus, in this paper, we will review the synthesis
algorithms, particularly for non-imaging medical data, with the aim of
providing trustworthy AI in this domain. This tutorial-styled review paper will
provide comprehensive descriptions of non-imaging medical data synthesis on
aspects including algorithms, evaluations, limitations and future research
directions.
Comment: 35 pages, Submitted to ACM Computing Surveys.
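The survey covers a range of synthesis families; as a toy illustration of the core idea, "generate data with a similar distribution as real data", here is a deliberately naive baseline that fits a single multivariate Gaussian to a numeric table and samples from it. This is not any algorithm from the survey, and real clinical tables (mixed types, missingness, EHR structure) need far more than this.

```python
import numpy as np

def fit_and_sample_gaussian(real, n_samples, seed=0):
    """Naive tabular synthesizer: estimate the mean and covariance of the
    real numeric table, then draw synthetic rows from that Gaussian."""
    real = np.asarray(real, dtype=float)
    mu = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mu, cov, size=n_samples)
```

Even this baseline preserves first- and second-order statistics (means, variances, correlations), which is the minimum bar the survey's evaluation sections measure deep generative models against.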
Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling
To capture the relationship between samples and labels, conditional
generative models often inherit spurious correlations from the training
dataset. This can result in label-conditional distributions that are imbalanced
with respect to another latent attribute. To mitigate this issue, which we call
spurious causality of conditional generation, we propose a general two-step
strategy. (a) Fairness Intervention (FI): emphasize the minority samples that
are hard to generate due to the spurious correlation in the training dataset.
(b) Corrective Sampling (CS): explicitly filter the generated samples and
ensure that they follow the desired latent attribute distribution. We have
designed the fairness intervention to work for various degrees of supervision
on the spurious attribute, including unsupervised, weakly-supervised, and
semi-supervised scenarios. Our experimental results demonstrate that FICS can
effectively resolve spurious causality of conditional generation across various
datasets.
Comment: TMLR 202
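The filtering step (b) can be illustrated with a simple rejection scheme: given generated samples and their (predicted) latent attribute, keep a subset whose attribute frequency matches a target distribution. This sketch is an illustrative simplification, not the paper's CS procedure, and the names are made up for the example.

```python
import numpy as np

def corrective_sample(samples, attrs, target_frac, seed=0):
    """Keep a subset of generated `samples` so that the binary latent
    attribute (`attrs` in {0, 1}) appears with frequency `target_frac`.
    The scarcer side is kept whole; the other side is subsampled."""
    samples, attrs = np.asarray(samples), np.asarray(attrs)
    rng = np.random.default_rng(seed)
    idx1 = np.flatnonzero(attrs == 1)
    idx0 = np.flatnonzero(attrs == 0)
    # largest total size achievable while honoring both side constraints
    n_total = int(min(len(idx1) / target_frac, len(idx0) / (1 - target_frac)))
    keep1 = rng.choice(idx1, size=int(round(n_total * target_frac)), replace=False)
    keep0 = rng.choice(idx0, size=n_total - len(keep1), replace=False)
    keep = np.sort(np.concatenate([keep1, keep0]))
    return samples[keep], attrs[keep]
```

In the FICS pipeline this filtering follows the fairness intervention of step (a), which first makes the minority attribute sufficiently represented among the generated samples for such filtering to be cheap.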
Counterfactual Fairness for Predictions using Generative Adversarial Networks
Fairness in predictions is of direct importance in practice due to legal,
ethical, and societal reasons. It is often achieved through counterfactual
fairness, which ensures that the prediction for an individual is the same as
that in a counterfactual world under a different sensitive attribute. However,
achieving counterfactual fairness is challenging as counterfactuals are
unobservable. In this paper, we develop a novel deep neural network called
Generative Counterfactual Fairness Network (GCFN) for making predictions under
counterfactual fairness. Specifically, we leverage a tailored generative
adversarial network to directly learn the counterfactual distribution of the
descendants of the sensitive attribute, which we then use to enforce fair
predictions through a novel counterfactual mediator regularization. If the
counterfactual distribution is learned sufficiently well, our method is
mathematically guaranteed to ensure the notion of counterfactual fairness.
Thereby, our GCFN addresses key shortcomings of existing baselines that are
based on inferring latent variables, yet which (a) are potentially correlated
with the sensitive attributes and thus lead to bias, and (b) have weak
capability in constructing latent representations and thus low prediction
performance. Across various experiments, our method achieves state-of-the-art
performance. Using a real-world case study from recidivism prediction, we
further demonstrate that our method makes meaningful predictions in practice.
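GCFN's counterfactual mediator regularization is not reproduced here, but the general shape of a counterfactual-fairness penalty can be sketched: given factual inputs and their generated counterfactual counterparts (e.g., from a GAN as in the paper), penalize the gap between the model's predictions on the two. The function below is a generic illustration of that idea, not the paper's regularizer.

```python
import numpy as np

def cf_fairness_penalty(predict, x_factual, x_counterfactual):
    """Counterfactual-fairness-style penalty: mean squared gap between
    predictions on factual inputs and on their generated counterfactuals.
    Added to the task loss, it pushes predictions toward invariance to the
    sensitive attribute's downstream effects."""
    p_f = predict(np.asarray(x_factual))
    p_cf = predict(np.asarray(x_counterfactual))
    return float(np.mean((p_f - p_cf) ** 2))
```

A penalty of zero means the predictor gives each individual the same output in the factual and counterfactual worlds, which is exactly the notion of counterfactual fairness the abstract describes; the quality of the guarantee then rests on how well the counterfactual distribution is learned.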