Achieving Differential Privacy and Fairness in Machine Learning
Machine learning algorithms are used to make decisions in applications such as recruiting, lending, and policing. Because these algorithms rely on large amounts of sensitive individual information, they raise societal concerns about privacy and fairness. Many existing studies focus on protecting individual privacy or on ensuring algorithmic fairness separately, without considering the connection between the two. However, new challenges arise at the intersection of privacy-preserving and fairness-aware machine learning. On one hand, there is fairness within the private model, i.e., how to meet both privacy and fairness requirements simultaneously in a machine learning algorithm. On the other hand, there is fairness between the private model and the non-private model, i.e., how to ensure that the utility loss due to differential privacy is the same for each group.
The goal of this dissertation is to address challenging issues in privacy preserving and fairness-aware machine learning: achieving differential privacy with satisfactory utility and efficiency in complex and emerging tasks, using generative models to generate fair data and to assist fair classification, achieving both differential privacy and fairness simultaneously within the same model, and achieving equal utility loss w.r.t. each group between the private model and the non-private model.
In this dissertation, we develop the following algorithms to address the above challenges.
(1) We develop PrivPC and DPNE algorithms to achieve differential privacy in complex and emerging tasks of causal graph discovery and network embedding, respectively.
(2) We develop the fair generative adversarial neural networks framework and three algorithms (FairGAN, FairGAN+ and CFGAN) to achieve fair data generation and classification through generative models based on different association-based and causation-based fairness notions.
(3) We develop PFLR and PFLR* algorithms to simultaneously achieve both differential privacy and fairness in logistic regression.
(4) We develop a DPSGD-F algorithm to remove the disparate impact of differential privacy on model accuracy w.r.t. each group.
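The dissertation's PFLR/PFLR* algorithms are not spelled out in this abstract. As a rough illustration of the general recipe behind item (3), the sketch below combines a fairness-penalized logistic-regression objective (a decision-boundary-covariance penalty) with output perturbation for differential privacy. The function name, the penalty form, and the use of the 2/(n·λ) sensitivity bound for strongly convex ERM are illustrative assumptions, not the dissertation's actual PFLR mechanism.

```python
import numpy as np

def dp_fair_logreg(X, y, s, eps=1.0, lam=0.1, fair_w=1.0,
                   lr=0.1, iters=500, seed=0):
    """Sketch: L2-regularized logistic regression with
    (a) a decision-boundary-covariance fairness penalty and
    (b) output perturbation for eps-differential privacy.
    Assumes rows of X have L2 norm <= 1 and y, s are 0/1 arrays."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    s_c = s - s.mean()                       # centered sensitive attribute
    for _ in range(iters):
        z = X @ theta
        p = 1.0 / (1.0 + np.exp(-z))
        grad = X.T @ (p - y) / n + lam * theta
        # fairness penalty: squared covariance between s and the margin
        cov = s_c @ z / n
        grad += fair_w * 2.0 * cov * (X.T @ s_c / n)
        theta -= lr * grad
    # output perturbation: Laplace noise scaled to the 2/(n*lam)
    # sensitivity of strongly convex ERM (an assumption here, since the
    # fairness penalty changes the objective)
    sens = 2.0 / (n * lam)
    theta += rng.laplace(scale=sens / eps, size=d)
    return theta
```

A smaller eps adds more noise to the released coefficients, trading accuracy for privacy.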
FairDP: Certified Fairness with Differential Privacy
This paper introduces FairDP, a novel mechanism designed to achieve certified
fairness with differential privacy (DP). FairDP independently trains models for
distinct individual groups, using group-specific clipping terms to assess and
bound the disparate impacts of DP. Throughout the training process, the
mechanism progressively integrates knowledge from group models to formulate a
comprehensive model that balances privacy, utility, and fairness in downstream
tasks. Extensive theoretical and empirical analyses validate the efficacy of
FairDP, demonstrating improved trade-offs between model utility, privacy, and
fairness compared with existing methods.
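FairDP's exact mechanism is not given in the abstract. The sketch below only illustrates the ingredients it names: per-group models, group-specific clipping terms, Gaussian noise, and integration of group models into one comprehensive model. The function names, the noise calibration, and the simple averaging step are assumptions for illustration, not the paper's certified construction.

```python
import numpy as np

def clip(g, c):
    """Clip gradient g to L2 norm at most c."""
    norm = np.linalg.norm(g)
    return g * min(1.0, c / max(norm, 1e-12))

def group_dpsgd_step(thetas, grads_by_group, clip_by_group, sigma, lr, rng):
    """One step of per-group DP-SGD with group-specific clipping, followed
    by integrating the group models into a shared model (here: a plain
    average, an illustrative stand-in for FairDP's integration scheme).
    `grads_by_group[k]` is a list of per-example gradients for group k."""
    for k, grads in grads_by_group.items():
        c = clip_by_group[k]
        g = sum(clip(gi, c) for gi in grads) / len(grads)
        # Gaussian noise calibrated to this group's clipping bound
        g = g + rng.normal(scale=sigma * c / len(grads), size=g.shape)
        thetas[k] = thetas[k] - lr * g
    shared = sum(thetas.values()) / len(thetas)
    return thetas, shared
```

Using a separate clipping term per group is what lets the disparate impact of DP be assessed and bounded group by group.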
Stochastic Differentially Private and Fair Learning
Machine learning models are increasingly used in high-stakes decision-making
systems. In such applications, a major concern is that these models sometimes
discriminate against certain demographic groups such as individuals with
certain race, gender, or age. Another major concern in these applications is
the violation of the privacy of users. While fair learning algorithms have been
developed to mitigate discrimination issues, these algorithms can still leak
sensitive information, such as individuals' health or financial records.
Utilizing the notion of differential privacy (DP), prior works aimed at
developing learning algorithms that are both private and fair. However,
existing algorithms for DP fair learning are either not guaranteed to converge
or require a full batch of data in each iteration of the algorithm to converge.
In this paper, we provide the first stochastic differentially private algorithm
for fair learning that is guaranteed to converge. Here, the term "stochastic"
refers to the fact that our proposed algorithm converges even when minibatches
of data are used at each iteration (i.e. stochastic optimization). Our
framework is flexible enough to permit different fairness notions, including
demographic parity and equalized odds. In addition, our algorithm can be
applied to non-binary classification tasks with multiple (non-binary) sensitive
attributes. As a byproduct of our convergence analysis, we provide the first
utility guarantee for a DP algorithm for solving nonconvex-strongly concave
min-max problems. Our numerical experiments show that the proposed algorithm
consistently offers significant performance gains over the state-of-the-art
baselines, and can be applied to larger scale problems with non-binary
target/sensitive attributes.
Comment: ICLR 2023
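The fairness notions named in this abstract can be made concrete. Below are minimal implementations of the demographic parity gap and the equalized odds gap for binary predictions and a binary sensitive attribute; the paper's own min-max training objective is not reproduced here, and these helper names are illustrative.

```python
import numpy as np

def demographic_parity_gap(y_pred, s):
    """|P(yhat=1 | s=1) - P(yhat=1 | s=0)| for binary predictions."""
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

def equalized_odds_gap(y_pred, y_true, s):
    """Max over y in {0,1} of the gap in P(yhat=1 | y, s) across groups.
    Assumes every (y, s) cell is non-empty."""
    gaps = []
    for y in (0, 1):
        m = y_true == y
        gaps.append(abs(y_pred[m & (s == 1)].mean()
                        - y_pred[m & (s == 0)].mean()))
    return max(gaps)
```

A fair learning algorithm drives these gaps toward zero while maintaining accuracy; the DP constraint then determines how noisily that objective can be optimized.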
Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent
Differentially private stochastic gradient descent (DP-SGD) is the workhorse
algorithm for recent advances in private deep learning. It provides a single
privacy guarantee to all datapoints in the dataset. We propose output-specific
(ε, δ)-DP to characterize privacy guarantees for individual
examples when releasing models trained by DP-SGD. We also design an efficient
algorithm to investigate individual privacy across a number of datasets. We
find that most examples enjoy stronger privacy guarantees than the worst-case
bound. We further discover that the training loss and the privacy parameter of
an example are well-correlated. This implies groups that are underserved in
terms of model utility simultaneously experience weaker privacy guarantees. For
example, on CIFAR-10, the average ε of the class with the lowest
test accuracy is 44.2% higher than that of the class with the highest
accuracy.
Comment: Published in Transactions on Machine Learning Research (TMLR)
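One way to see why individual guarantees can beat the worst-case bound: for a single Gaussian-mechanism release (ignoring subsampling and composition, which the paper's accountant handles), an example whose gradient norm stays below the clipping bound faces proportionally smaller effective sensitivity. The sketch below expresses this in μ-GDP terms; the helper names are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def per_example_clip_norms(per_example_grads, clip_norm):
    """L2 norm of each example's clipped gradient contribution.
    Examples whose gradients stay below clip_norm contribute less than
    the worst case, which output-specific accounting can exploit."""
    norms = np.linalg.norm(per_example_grads, axis=1)
    return np.minimum(norms, clip_norm)

def individual_gdp(grad_norm, clip_norm, sigma):
    """mu-GDP parameter of one Gaussian-mechanism release for an example
    with gradient norm grad_norm (the worst case is clip_norm / sigma)."""
    return min(grad_norm, clip_norm) / sigma
```

An example with a persistently small gradient (e.g. an easy, well-fit point) thus accumulates less privacy loss than a hard, high-loss one, matching the abstract's observed correlation between training loss and the privacy parameter.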
Directional Privacy for Deep Learning
Differentially Private Stochastic Gradient Descent (DP-SGD) is a key method
for applying privacy in the training of deep learning models. DP-SGD applies
isotropic Gaussian noise to gradients during training, which can perturb these
gradients in any direction, damaging utility. Metric DP, however, can provide
alternative mechanisms based on arbitrary metrics that might be more suitable.
In this paper we apply \textit{directional privacy}, via a mechanism based on
the von Mises-Fisher (VMF) distribution, to perturb gradients in terms of
\textit{angular distance} so that gradient direction is broadly preserved. We
show that this provides εd-privacy for deep learning training, rather
than the (ε, δ)-privacy of the Gaussian mechanism; and that
experimentally, on key datasets, the VMF mechanism can outperform the Gaussian
in the utility-privacy trade-off
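The angular-distance view the paper takes can be sketched numerically: isotropic Gaussian noise rotates a gradient by an angle that grows with the noise scale, which is the utility damage a direction-preserving mechanism targets. The sketch below shows only this measurement; sampling the von Mises-Fisher distribution itself (e.g. via Wood's algorithm) is beyond this illustration, and the function names are assumptions.

```python
import numpy as np

def angular_distance(u, v):
    """Angle in radians between nonzero vectors u and v."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def gaussian_perturb(g, sigma, rng):
    """DP-SGD-style isotropic Gaussian perturbation of a gradient."""
    return g + rng.normal(scale=sigma, size=g.shape)
```

Measuring angular_distance(g, gaussian_perturb(g, sigma, rng)) over a range of sigma values shows the direction drifting as noise grows; a VMF-based mechanism instead concentrates the perturbed gradient around the true direction.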
The Interplay Between Privacy and Fairness in Learning and Decision Making Problems
The availability of large datasets and computational resources has driven significant progress in Artificial Intelligence (AI) and, especially, Machine Learning (ML). These advances have rendered AI systems instrumental in many decision-making and policy operations involving individuals: they include assistance in legal decisions, lending, and hiring, as well as determinations of resources and benefits, all of which have profound social and economic impacts. While data-driven systems have been successful in an increasing number of tasks, the use of rich datasets, combined with the adoption of black-box algorithms, has sparked concerns about how these systems operate. How much information these systems leak about the individuals whose data they use as input, and how they handle biases and fairness issues, are two of these critical concerns. While some argue that privacy and fairness are in alignment, the majority believe they are contrasting objectives. This thesis first studies the interaction between privacy and fairness in machine learning and decision problems. It focuses on the scenario in which fairness and privacy are at odds and investigates the factors that can explain such behavior. It then proposes effective and efficient mitigation solutions to improve fairness under privacy constraints. In the second part, it analyzes the connection between fairness and other machine learning concepts such as model compression and adversarial robustness. Finally, it introduces a novel privacy concept, together with an initial implementation, to protect users' privacy at inference time.