5,603 research outputs found
Advances in privacy-preserving machine learning
Building useful predictive models often involves learning from personal data. For instance, companies use customer data to target advertisements, online education platforms collect student data to recommend content and improve user engagement, and medical researchers fit diagnostic models to patient data. A recent line of research aims to design learning algorithms that provide rigorous privacy guarantees for user data, in the sense that their outputs (models or predictions) leak as little information as possible about individuals in the training data. The goal of this dissertation is to design private learning algorithms with performance comparable to the best possible non-private ones. We quantify privacy using differential privacy, a well-studied privacy notion that limits how much information is leaked about an individual by the output of an algorithm. Training a model using a differentially private algorithm prevents an adversary from confidently determining whether a specific person's data was used for training the model.
We begin by presenting a technique for practical differentially private convex optimization that can leverage any off-the-shelf optimizer as a black box. We also perform an extensive empirical evaluation of the state-of-the-art algorithms on a range of publicly available datasets, as well as in an industry application.
Next, we present a learning algorithm that outputs a private classifier when given black-box access to a non-private learner and a limited amount of unlabeled public data. We prove that the accuracy guarantee of our private algorithm in the PAC model of learning is comparable to that of the underlying non-private learner. Such a guarantee is not possible, in general, without public data.
Lastly, we consider building recommendation systems, which we model using matrix completion. We present the first algorithm for matrix completion with provable user-level privacy and accuracy guarantees. Our algorithm consistently outperforms the state-of-the-art private algorithms on a suite of datasets. Along the way, we give an optimal algorithm for differentially private singular vector computation, which leads to significant savings in space and time when operating on sparse matrices. It can also be used for private low-rank approximation.
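To make the differential privacy notion used in this abstract concrete, here is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is a textbook illustration, not one of the dissertation's algorithms. The names `laplace_noise` and `private_count` are hypothetical, the sketch assumes neighboring data sets differ in one record (so the count has sensitivity 1), and it ignores floating-point subtleties of noise sampling.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records satisfying `predicate`.

    Assumption: adding or removing one record changes the true count
    by at most 1, so Laplace noise of scale 1/epsilon is used.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 61, 19]
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5))
```

Smaller values of `epsilon` add more noise, which makes it harder to tell whether any one person's record was present, at the cost of a less accurate count.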
Differentially private model personalization
We study personalization of supervised learning with user-level differential privacy. Consider a setting with many users, each of whom has a training data set drawn from their own distribution P_i. Assuming some shared structure among the problems P_i, can users collectively learn the shared structure, and solve their tasks better than they could individually, while preserving the privacy of their data? We formulate this question using joint, user-level differential privacy; that is, we control what is leaked about each user's entire data set. We provide algorithms that exploit popular non-private approaches in this domain like the Almost-No-Inner-Loop (ANIL) method, and give strong user-level privacy guarantees for our general approach. When the problems P_i are linear regression problems with each user's regression vector lying in a common, unknown low-dimensional subspace, we show that our efficient algorithms satisfy nearly optimal estimation error guarantees. We also establish a general, information-theoretic upper bound via an exponential mechanism-based algorithm.
https://proceedings.neurips.cc/paper/2021/hash/f8580959e35cb0934479bb007fb241c2-Abstract.htm
A Unifying Framework for Differentially Private Sums under Continual Observation
We study the problem of maintaining a differentially private decaying sum
under continual observation. We give a unifying framework and an efficient
algorithm for this problem for any sufficiently smooth function. Our
algorithm is the first differentially private algorithm that does not have a
multiplicative error for polynomially-decaying weights. Our algorithm improves
on all prior works on differentially private decaying sums under continual
observation and recovers exactly the additive error for the special case of
continual counting from Henzinger et al. (SODA 2023) as a corollary.
Our algorithm is a variant of the factorization mechanism whose error depends
on the $\gamma_2$ and $\gamma_F$ norms of the underlying matrix. We give a
constructive proof for an almost exact upper bound on the $\gamma_2$ and
$\gamma_F$ norms and an almost tight lower bound on the $\gamma_2$ norm for a
large class of lower-triangular matrices. This is the first non-trivial lower
bound for lower-triangular matrices whose non-zero entries are not all the
same. It includes matrices for all continual decaying sums problems, resulting
in an upper bound on the additive error of any differentially private decaying
sums algorithm under continual observation.
We also explore some implications of our result in discrepancy theory and
operator algebra. Given the importance of the $\gamma_2$ norm in computer
science and the extensive work in mathematics, we believe our result will have
further applications.
Comment: 32 page
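The factorization mechanism mentioned in this abstract can be sketched for the special case of continual counting (all weights equal to 1). The code below is an illustrative sketch, not the paper's algorithm for general decaying sums. It assumes stream entries lie in [0, 1], that neighboring streams differ in one entry, and it uses the classic Gaussian mechanism calibration, which is stated for epsilon < 1. The function names are hypothetical. The all-ones lower-triangular matrix M is written as M = A A, where A is the lower-triangular Toeplitz matrix whose entries are the power-series coefficients of (1 - x)^(-1/2), and the mechanism releases A(Ax + z) = Mx + Az.

```python
import math
import random


def sqrt_coefficients(T):
    """First T coefficients of (1 - x)^(-1/2): 1, 1/2, 3/8, 5/16, ..."""
    coeffs = [1.0]
    for k in range(1, T):
        coeffs.append(coeffs[-1] * (2 * k - 1) / (2 * k))
    return coeffs


def toeplitz_apply(coeffs, v):
    """Multiply the lower-triangular Toeplitz matrix built from coeffs by v."""
    return [sum(coeffs[t - s] * v[s] for s in range(t + 1)) for t in range(len(v))]


def private_prefix_sums(stream, epsilon, delta, rng=None):
    """Release all prefix sums of `stream` with Gaussian noise."""
    if rng is None:
        rng = random.Random()
    coeffs = sqrt_coefficients(len(stream))
    # Changing one entry by at most 1 changes A x by at most the largest
    # column norm of A, which is the norm of its first column.
    sensitivity = math.sqrt(sum(c * c for c in coeffs))
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    noisy = [y + rng.gauss(0.0, sigma) for y in toeplitz_apply(coeffs, stream)]
    # Post-processing by the left factor turns A x + z into M x + A z.
    return toeplitz_apply(coeffs, noisy)


if __name__ == "__main__":
    print(private_prefix_sums([1, 0, 1, 1, 0, 1], epsilon=0.5, delta=1e-6))
```

For simplicity the sketch processes the whole stream at once with quadratic-time products. Because both factors are lower triangular, the t-th output depends only on the first t inputs and noise values, so the same computation could be carried out one step at a time.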
Private Learning with Public Features
We study a class of private learning problems in which the data is a join of
private and public features. This is often the case in private personalization
tasks such as recommendation or ad prediction, in which features related to
individuals are sensitive, while features related to items (the movies or songs
to be recommended, or the ads to be shown to users) are publicly available and
do not require protection. A natural question is whether private algorithms can
achieve higher utility in the presence of public features. We give a positive
answer for multi-encoder models where one of the encoders operates on public
features. We develop new algorithms that take advantage of this separation by
only protecting certain sufficient statistics (instead of adding noise to the
gradient). This method has a guaranteed utility improvement for linear
regression, and importantly, achieves the state of the art on two standard
private recommendation benchmarks, demonstrating the importance of methods that
adapt to the private-public feature separation.
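The idea of protecting sufficient statistics rather than gradients can be illustrated with a generic sufficient-statistics perturbation for ridge regression. This is a sketch of the general technique, not the paper's algorithm, and it does not model the split between private and public features. The function names, the `sigma` and `ridge` parameters, and the assumption that rows are clipped to bounded norm are all hypothetical; calibrating `sigma` to a target privacy level is left to the caller.

```python
import random


def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def noisy_ridge_regression(X, y, sigma, ridge=1.0, rng=None):
    """Fit ridge regression from noisy versions of X^T X and X^T y.

    Assumption: each row of X and each label has been clipped to bounded
    norm, so one record has bounded influence on both statistics.
    """
    if rng is None:
        rng = random.Random()
    d = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    moment = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    for i in range(d):
        for j in range(i, d):
            noise = rng.gauss(0.0, sigma)  # one draw per symmetric pair
            gram[i][j] += noise
            if j != i:
                gram[j][i] += noise
        gram[i][i] += ridge  # regularization keeps the system well conditioned
        moment[i] += rng.gauss(0.0, sigma)
    # Only the noisy statistics are used from here on (post-processing).
    return solve_linear_system(gram, moment)


if __name__ == "__main__":
    X = [[0.6, 0.1], [0.2, 0.7], [0.5, 0.5], [0.1, 0.3]]
    y = [0.8, 0.9, 1.0, 0.4]
    print(noisy_ridge_regression(X, y, sigma=0.05))
```

Noise is added once to the two statistics, not at every optimization step as in noisy gradient methods. The noisy Gram matrix may be poorly conditioned when `sigma` is large relative to the data, which is why the sketch adds a ridge term.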
Privacy-preserving Non-negative Matrix Factorization with Outliers
Non-negative matrix factorization is a popular unsupervised machine learning
algorithm for extracting meaningful features from data which are inherently
non-negative. However, such data sets may often contain privacy-sensitive user
data, and therefore, we may need to take necessary steps to ensure the privacy
of the users while analyzing the data. In this work, we focus on developing a
non-negative matrix factorization algorithm in the privacy-preserving
framework. More specifically, we propose a novel privacy-preserving algorithm
for non-negative matrix factorization capable of operating on private data,
while achieving results comparable to those of the non-private algorithm. We
design the framework so that one can select the degree of privacy guarantee
based on the utility gap. We evaluate the proposed framework on six real data
sets. The experimental results show that the proposed method achieves
performance very close to that of the non-private algorithm in some parameter
regimes, while ensuring strict privacy.
Comment: 15 pages, 11 figures; additional explanations (in blue colours
- …