Wishart Mechanism for Differentially Private Principal Components Analysis
We propose a new input perturbation mechanism for publishing a covariance
matrix to achieve $(\epsilon, 0)$-differential privacy. Our mechanism uses a
Wishart distribution to generate matrix noise. In particular, we apply this
mechanism to principal component analysis. Our mechanism preserves the
positive semi-definiteness of the published covariance matrix. Thus, our
approach gives rise to a general publishing framework for input perturbation of
a symmetric positive semidefinite matrix. Moreover, compared with the classic
Laplace mechanism, our method has a better utility guarantee. To the best of our
knowledge, the Wishart mechanism is the best input perturbation approach for
$(\epsilon, 0)$-differentially private PCA. We also compare our work with
previous exponential mechanism algorithms in the literature and provide a near
optimal bound while having more flexibility and lower computational cost.

Comment: A full version with technical proofs. Accepted to AAAI-16
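As a rough illustration of the input-perturbation idea described in this abstract (not the paper's exact noise calibration), a minimal numpy/scipy sketch: add a Wishart-distributed PSD matrix to the sample covariance and run ordinary PCA on the published result. The parameters `df` and `scale_c` are placeholders for the calibration the paper derives from the privacy budget.

```python
import numpy as np
from scipy.stats import wishart

def wishart_perturbed_pca(X, df, scale_c, k, rng=None):
    """Input perturbation via Wishart noise, then ordinary PCA.

    X       : (n, d) data matrix
    df      : Wishart degrees of freedom (scipy requires df > d - 1);
              the paper calibrates this from the privacy budget, so the
              value passed here is an illustrative assumption
    scale_c : scalar for the Wishart scale matrix C = scale_c * I,
              also a stand-in for the paper's calibration
    k       : number of principal components to keep
    """
    n, d = X.shape
    cov = X.T @ X / n                        # sample covariance (PSD)
    noise = wishart.rvs(df=df, scale=scale_c * np.eye(d), random_state=rng)
    noisy_cov = cov + noise                  # sum of PSD matrices stays PSD
    eigvals, eigvecs = np.linalg.eigh(noisy_cov)
    return eigvecs[:, ::-1][:, :k]           # top-k principal directions
```

Because Wishart samples are PSD by construction, the published matrix remains a valid covariance matrix, which is the property the abstract highlights over Laplace noise.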
A Survey on Differential Privacy with Machine Learning and Future Outlook
Nowadays, machine learning models and applications have become increasingly
pervasive. With this rapid increase in the development and deployment of
machine learning models, concerns regarding privacy have arisen. Thus, there is
a legitimate need to protect the data from leakage and from attacks. One of
the strongest and most prevalent privacy models that can be used to protect
machine learning models from attacks and vulnerabilities is differential
privacy (DP). DP is a strict and rigorous definition of privacy that
guarantees an adversary cannot reliably infer whether a specific
participant is included in the dataset. It works by injecting noise into
the data, whether into the inputs, the outputs, the ground-truth labels, the
objective function, or even the gradients, to alleviate the privacy issue
and protect the data. To this end, this survey paper presents different
differentially private machine learning algorithms categorized into two main
categories (traditional machine learning models vs. deep learning models).
Moreover, future research directions for differential privacy with machine
learning algorithms are outlined.

Comment: 12 pages, 3 figures
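As a concrete instance of the noise-injection recipe this survey describes, a minimal Laplace-mechanism sketch for output perturbation; the function name and the example query are ours, for illustration only.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value plus Laplace noise with scale sensitivity/epsilon.

    The same recipe applies at the other injection points the survey
    lists (inputs, labels, objective, gradients), with the sensitivity
    computed for that point.
    """
    rng = np.random.default_rng() if rng is None else rng
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query has L1 sensitivity 1, so a single Laplace
# draw with scale 1/epsilon makes the released count epsilon-DP.
private_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5)
```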
Differentially private low-dimensional representation of high-dimensional data
Differentially private synthetic data provide a powerful mechanism to enable
data analysis while protecting sensitive information about individuals.
However, when the data lie in a high-dimensional space, the accuracy of the
synthetic data suffers from the curse of dimensionality. In this paper, we
propose a differentially private algorithm to generate low-dimensional
synthetic data efficiently from a high-dimensional dataset with a utility
guarantee with respect to the Wasserstein distance. A key step of our algorithm
is a private principal component analysis (PCA) procedure with a near-optimal
accuracy bound that circumvents the curse of dimensionality. Unlike the
standard perturbation analysis based on the Davis-Kahan theorem, our analysis
of private PCA works without assuming a spectral gap for the sample covariance
matrix.

Comment: 21 pages
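The private PCA step can be pictured with the generic covariance-perturbation sketch below. This is a stand-in using symmetric Gaussian noise, not the paper's algorithm, and `sigma` is a placeholder for a calibration derived from the privacy parameters.

```python
import numpy as np

def private_low_dim_representation(X, k, sigma, rng=None):
    """Stand-in private PCA: perturb the sample covariance with a
    symmetric Gaussian matrix, then project the data onto the top-k
    noisy eigenvectors to obtain a k-dimensional representation.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    cov = X.T @ X / n
    G = rng.normal(scale=sigma, size=(d, d))
    noisy_cov = cov + (G + G.T) / 2          # symmetrized noise keeps symmetry
    eigvals, eigvecs = np.linalg.eigh(noisy_cov)
    V_k = eigvecs[:, ::-1][:, :k]            # top-k noisy principal directions
    # Note: end-to-end privacy also requires noise on the projected
    # points, as in the paper's synthetic-data generation step.
    return X @ V_k
```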
Less is More: Revisiting Gaussian Mechanism for Differential Privacy
In this paper, we identify that the classic Gaussian mechanism and its
variants for differential privacy all suffer from \textbf{the curse of
full-rank covariance matrices}, and hence the expected accuracy losses of these
mechanisms applied to high-dimensional query results, e.g., in $\mathbb{R}^M$,
all increase linearly with $M$.
To lift this curse, we design a Rank-1 Singular Multivariate Gaussian
Mechanism (R1SMG). It achieves $(\epsilon, \delta)$-DP on query results in
$\mathbb{R}^M$ by perturbing the results with noise following a singular
multivariate Gaussian distribution, whose covariance matrix is a
\textbf{randomly} generated rank-1 positive semi-definite matrix. In contrast,
the classic Gaussian mechanism and its variants all consider
\textbf{deterministic} full-rank covariance matrices. Our idea is motivated by
a clue from Dwork et al.'s work on the Gaussian mechanism that has been ignored
in the literature: when projecting multivariate Gaussian noise with a full-rank
covariance matrix onto an orthonormal basis of $\mathbb{R}^M$, only the
coefficient of a single basis vector can contribute to the privacy guarantee.
This paper makes the following technical contributions.
(i) R1SMG achieves an $(\epsilon, \delta)$-DP guarantee on query results in
$\mathbb{R}^M$, while the magnitude of the additive noise decreases with $M$.
Therefore, \textbf{less is more}, i.e., a smaller amount of noise is able to
sanitize higher-dimensional query results. When $M \rightarrow \infty$, the
expected accuracy loss converges to $\frac{2(\Delta_2 f)^2}{\epsilon}$, where
$\Delta_2 f$ is the $L_2$ sensitivity of the query function $f$.
(ii) Compared with other mechanisms, R1SMG is less likely to generate noise
with large magnitude that overwhelms the query results, because the kurtosis
and skewness of the nondeterministic accuracy loss introduced by R1SMG are
larger than those introduced by other mechanisms.
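The rank-1 construction described above can be sketched in a few lines: draw a uniformly random direction on the unit sphere and place all the noise variance on it. Here `sigma` stands in for the calibration the paper derives from $(\epsilon, \delta)$ and $\Delta_2 f$.

```python
import numpy as np

def r1smg_perturb(query_result, sigma, rng=None):
    """Perturb an M-dimensional query result with noise drawn from the
    singular multivariate Gaussian N(0, sigma^2 * u u^T), where u is a
    uniformly random unit vector, i.e., the covariance is a randomly
    generated rank-1 PSD matrix.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(size=query_result.shape[0])
    u /= np.linalg.norm(u)                   # uniform direction on the sphere
    z = rng.normal(scale=sigma)              # single Gaussian coefficient
    return query_result + z * u              # noise lives in a random 1-D subspace
```

Conditioned on the drawn direction `u`, the added noise `z * u` has covariance `sigma**2 * u @ u.T`, which is exactly the random rank-1 PSD covariance the abstract contrasts with the deterministic full-rank covariance of the classic Gaussian mechanism.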