1,641 research outputs found
MVG Mechanism: Differential Privacy under Matrix-Valued Query
Differential privacy mechanism design has traditionally been tailored for a
scalar-valued query function. Although many mechanisms such as the Laplace and
Gaussian mechanisms can be extended to a matrix-valued query function by adding
i.i.d. noise to each element of the matrix, this method is often suboptimal as
it forfeits an opportunity to exploit the structural characteristics typically
associated with matrix analysis. To address this challenge, we propose a novel
differential privacy mechanism called the Matrix-Variate Gaussian (MVG)
mechanism, which adds a matrix-valued noise drawn from a matrix-variate
Gaussian distribution, and we rigorously prove that the MVG mechanism preserves
$(\epsilon,\delta)$-differential privacy. Furthermore, we introduce the concept
of directional noise made possible by the design of the MVG mechanism.
Directional noise allows the impact of the noise on the utility of the
matrix-valued query function to be moderated. Finally, we experimentally
demonstrate the performance of our mechanism using three matrix-valued queries
on three privacy-sensitive datasets. We find that the MVG mechanism notably
outperforms four previous state-of-the-art approaches, and provides comparable
utility to the non-private baseline. Comment: Appeared in CCS'1
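The contrast the abstract draws between i.i.d. noise and matrix-variate Gaussian noise can be illustrated with a minimal sketch. It assumes the standard construction Z = A X B^T, where X has i.i.d. standard normal entries, A A^T is the row covariance and B B^T is the column covariance. The helper names (`cholesky`, `sample_mvg_noise`, `add_noise`) and the covariance values are hypothetical, chosen only for illustration. The sketch does not calibrate the covariances to any privacy level, so it makes no differential privacy guarantee and is not the paper's mechanism.

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sample_mvg_noise(row_cov, col_cov, rng):
    """Draw one zero-mean matrix-variate Gaussian sample Z = A X B^T."""
    A = cholesky(row_cov)  # A A^T = row_cov
    B = cholesky(col_cov)  # B B^T = col_cov
    m, n = len(row_cov), len(col_cov)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    return matmul(matmul(A, X), transpose(B))

def add_noise(query_output, noise):
    """Element-wise sum of the query output and the noise matrix."""
    return [[q + z for q, z in zip(q_row, z_row)]
            for q_row, z_row in zip(query_output, noise)]

# Illustrative values only: the first row direction receives more noise
# than the second, and the two columns are correlated.
rng = random.Random(0)
row_cov = [[4.0, 0.0], [0.0, 0.25]]
col_cov = [[1.0, 0.5], [0.5, 1.0]]
query_output = [[1.0, 2.0], [3.0, 4.0]]
noisy = add_noise(query_output, sample_mvg_noise(row_cov, col_cov, rng))
```

Setting both covariances to scaled identity matrices recovers the i.i.d. Gaussian case. Unequal entries in `row_cov` are one simple way to picture noise that is heavier in some directions than in others.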
The Geometry of Differential Privacy: the Sparse and Approximate Cases
In this work, we study trade-offs between accuracy and privacy in the context
of linear queries over histograms. This is a rich class of queries that
includes contingency tables and range queries, and has been a focus of a long
line of work. For a set of linear queries over a database, we
seek to find the differentially private mechanism that has the minimum mean
squared error. For pure differential privacy, an approximation to
the optimal mechanism is known. Our first contribution is to give an
approximation guarantee for the case of $(\epsilon,\delta)$-differential
privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise
to the answers. We prove its approximation guarantee relative to the hereditary
discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex
geometry.
We next consider this question in the case when the number of queries exceeds
the number of individuals in the database. It is known that better
mechanisms exist in this setting. Our second
main contribution is to give an $(\epsilon,\delta)$-differentially private
mechanism which is optimal up to a $\mathrm{polylog}(d,N)$ factor for any given query
set and any given upper bound on the database size. This approximation is
achieved by coupling the Gaussian noise addition approach with a linear
regression step. We give an analogous result for the $\epsilon$-differential
privacy setting. We also improve on the mean squared error upper bound for
answering counting queries on a database of a given size by Blum, Ligett, and Roth,
and match the lower bound implied by the work of Dinur and Nissim up to
logarithmic factors.
The connection between hereditary discrepancy and the privacy mechanism
enables us to derive the first polylogarithmic approximation to the hereditary
discrepancy of a matrix.
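The idea of coupling Gaussian noise addition with a linear regression step can be pictured with a small sketch. It rests on several assumptions: the noise here is i.i.d. with an arbitrary scale that is not calibrated to any privacy parameters, the regression is plain unconstrained least squares rather than the estimator analysed in the paper, and the query matrix and all helper names (`solve`, `least_squares_project`) are hypothetical. The sketch only shows why projecting noisy answers back onto the set of answers consistent with some database cannot increase the squared error when there are more queries than histogram cells.

```python
import random

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

def least_squares_project(A, y):
    """Return A x_hat, where x_hat minimises ||A x - y||^2.

    Assumes A has full column rank so the normal equations are solvable.
    """
    At = [list(col) for col in zip(*A)]
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in At] for ci in At]
    x_hat = solve(gram, matvec(At, y))
    return matvec(A, x_hat)

# Four counting queries over a two-cell histogram (illustrative values).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
x_true = [3.0, 5.0]
true_answers = matvec(A, x_true)

rng = random.Random(0)
sigma = 1.0  # arbitrary noise scale, NOT calibrated to a privacy budget
noisy = [t + rng.gauss(0.0, sigma) for t in true_answers]
projected = least_squares_project(A, noisy)
```

Because the true answers lie in the column span of `A`, the projection removes the noise component orthogonal to that span, so `projected` is never farther from `true_answers` than `noisy` is.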
Minimax Optimality In High-Dimensional Classification, Clustering, And Privacy
The age of “Big Data” features large volumes of massive, high-dimensional datasets, leading to the rapid emergence of new algorithms as well as new concerns such as privacy and fairness. To compare algorithms with or without these new constraints, minimax decision theory provides a principled framework for quantifying the optimality of algorithms and investigating the fundamental difficulty of statistical problems. Under the framework of minimax theory, this thesis addresses the following four problems:
1. The first part of this thesis aims to develop an optimality theory for linear discriminant analysis in the high-dimensional setting. In addition, we consider classification with incomplete data under the missing completely at random (MCR) model.
2. In the second part, we study high-dimensional sparse Quadratic Discriminant Analysis (QDA) and aim to establish the optimal convergence rates.
3. In the third part, we study the optimality of high-dimensional clustering in the unsupervised setting under the Gaussian mixture model. We propose an EM-based procedure with the optimal rate of convergence for the excess mis-clustering error.
4. In the fourth part, we investigate minimax optimality under the privacy constraint for mean estimation and linear regression models, in both the classical low-dimensional and modern high-dimensional settings.
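As a concrete picture of mean estimation under a privacy constraint, the following minimal sketch uses the classical Gaussian mechanism on a clipped sample mean. It is not the thesis's estimator and says nothing about minimax rates. The function names, the clipping bound, and the replace-one-record notion of neighbouring datasets are assumptions made for illustration. The noise scale is the textbook one, sqrt(2 ln(1.25/δ)) · Δ / ε, which is valid for 0 < ε < 1.

```python
import math
import random

def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale, valid for 0 < epsilon < 1."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def private_mean(values, clip, epsilon, delta, rng):
    """Clipped sample mean plus Gaussian noise.

    Assumes neighbouring datasets differ by replacing one record, so the
    clipped mean changes by at most 2 * clip / n.
    """
    n = len(values)
    clipped = [max(-clip, min(clip, v)) for v in values]
    sensitivity = 2.0 * clip / n
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return sum(clipped) / n + rng.gauss(0.0, sigma)

# Illustrative usage with made-up data and parameters.
rng = random.Random(0)
data = [0.5] * 10000
estimate = private_mean(data, clip=1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

The noise standard deviation shrinks like 1/n, which is why the privacy cost becomes negligible relative to sampling error in large low-dimensional samples.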
Differentially Private Model Selection with Penalized and Constrained Likelihood
In statistical disclosure control, the goal of data analysis is twofold: The
released information must provide accurate and useful statistics about the
underlying population of interest, while minimizing the potential for an
individual record to be identified. In recent years, the notion of differential
privacy has received much attention in theoretical computer science, machine
learning, and statistics. It provides a rigorous and strong notion of
protection for individuals' sensitive information. A fundamental question is
how to incorporate differential privacy into traditional statistical inference
procedures. In this paper we study model selection in multivariate linear
regression under the constraint of differential privacy. We show that model
selection procedures based on penalized least squares or likelihood can be made
differentially private by a combination of regularization and randomization,
and propose two algorithms to do so. We show that our private procedures are
consistent under essentially the same conditions as the corresponding
non-private procedures. We also find that under differential privacy, the
procedure becomes more sensitive to the tuning parameters. We illustrate and
evaluate our method using simulation studies and two real data examples.
A Knowledge Transfer Framework for Differentially Private Sparse Learning
We study the problem of estimating high dimensional models with underlying
sparse structures while preserving the privacy of each training example. We
develop a differentially private high-dimensional sparse learning framework
using the idea of knowledge transfer. More specifically, we propose to distill
the knowledge from a "teacher" estimator trained on a private dataset, by
creating a new dataset from auxiliary features, and then train a differentially
private "student" estimator using this new dataset. In addition, we establish
the linear convergence rate as well as the utility guarantee for our proposed
method. For sparse linear regression and sparse logistic regression, our method
achieves improved utility guarantees compared with the best known results
(Kifer et al., 2012; Wang and Gu, 2019). We further demonstrate the superiority
of our framework through both synthetic and real-world data experiments. Comment: 24 pages, 2 figures, 3 tables