Wishart Mechanism for Differentially Private Principal Components Analysis

Authors: Jiang Wuxuan, Xie Cong, Zhang Zhihua
Publication date: 19/11/2015

We propose a new input perturbation mechanism for publishing a covariance matrix to achieve $(\epsilon,0)$-differential privacy. Our mechanism uses a Wishart distribution to generate matrix noise. In particular, we apply this mechanism to principal component analysis. Our mechanism preserves the positive semi-definiteness of the published covariance matrix. Thus, our approach gives rise to a general publishing framework for input perturbation of a symmetric positive semi-definite matrix. Moreover, compared with the classic Laplace mechanism, our method has a better utility guarantee. To the best of our knowledge, the Wishart mechanism is the best input perturbation approach for $(\epsilon,0)$-differentially private PCA. We also compare our work with previous exponential mechanism algorithms in the literature and provide a near-optimal bound while having more flexibility and less computational intractability.

Comment: A full version with technical proofs. Accepted to AAAI-1

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
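
As a rough illustration of the input perturbation idea in the abstract above, the following Python sketch adds Wishart-distributed noise to a sample covariance matrix and then runs ordinary PCA on the result. The function names, the degrees of freedom `df`, and the noise `scale` are illustrative assumptions: the abstract does not state how these parameters must be calibrated to $\epsilon$, so the values used here carry no privacy guarantee by themselves.

```python
import numpy as np

def wishart_noise(d, df, scale, rng):
    """Sample W ~ Wishart_d(df, scale * I) as scale * G G^T with G ~ N(0, 1) entries."""
    g = rng.standard_normal((d, df))
    return scale * (g @ g.T)

def private_covariance(x, df, scale, rng):
    """Publish the sample second-moment matrix plus Wishart noise.

    x is an (n, d) data matrix whose rows are assumed to have norm <= 1.
    df and scale are placeholders (assumption), not calibrated to epsilon.
    """
    n, d = x.shape
    cov = x.T @ x / n
    return cov + wishart_noise(d, df, scale, rng)

def top_components(sym_matrix, k):
    """Return the k eigenvectors with the largest eigenvalues as columns."""
    eigvals, eigvecs = np.linalg.eigh(sym_matrix)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 5))
# Scale down any row whose norm exceeds 1 so that every row has norm <= 1.
x /= np.maximum(1.0, np.linalg.norm(x, axis=1, keepdims=True))

noisy_cov = private_covariance(x, df=6, scale=0.01, rng=rng)
components = top_components(noisy_cov, k=2)
```

Because both the sample covariance and the Wishart sample are positive semi-definite, their sum is as well, which is the property the abstract highlights. Adding symmetric Laplace noise instead would not in general preserve it.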

A Survey on Differential Privacy with Machine Learning and Future Outlook

Authors: Baraheem Samah, Yao Zhongmei
Publication date: 19/11/2022

Machine learning models and applications have become increasingly pervasive. With this rapid increase in the development and deployment of machine learning models, concerns about privacy have risen. Thus, there is a legitimate need to protect the data from leaks and attacks. One of the strongest and most prevalent privacy models that can be used to protect machine learning models from attacks and vulnerabilities is differential privacy (DP). DP is a strict and rigid definition of privacy: it guarantees that an adversary cannot reliably predict whether a specific participant is included in the dataset. It works by injecting noise into the inputs, the outputs, the ground truth labels, the objective functions, or even the gradients, to alleviate the privacy issue and protect the data. To this end, this survey paper presents different differentially private machine learning algorithms, grouped into two main categories (traditional machine learning models vs. deep learning models). Moreover, future research directions for differential privacy with machine learning algorithms are outlined.

Comment: 12 pages, 3 figures

arXiv.org e-Print Archive
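
To make the noise injection idea in the abstract above concrete, here is a minimal Python sketch of output perturbation using the standard Laplace mechanism on a mean query. It is not taken from the survey. The function name `private_mean`, the clipping bounds, and the example data are illustrative assumptions, and the sensitivity formula assumes neighbouring datasets differ by replacing one record while the dataset size stays fixed.

```python
import numpy as np

def private_mean(values, epsilon, lower, upper, rng):
    """Release an epsilon-DP estimate of the mean via the Laplace mechanism.

    Values are clipped to [lower, upper], so replacing one record changes
    the mean by at most (upper - lower) / n (the L1 sensitivity).
    """
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

rng = np.random.default_rng(0)
ages = [23, 35, 41, 29, 52]  # hypothetical data
estimate = private_mean(ages, epsilon=0.5, lower=0, upper=100, rng=rng)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy. The same calibration pattern applies when the noise is added to gradients or objective functions, although the sensitivity analysis differs in each case.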

Differentially private low-dimensional representation of high-dimensional data

Authors: He Yiyun, Strohmer Thomas, Vershynin Roman, Zhu Yizhe
Publication date: 25/05/2023

Differentially private synthetic data provide a powerful mechanism to enable data analysis while protecting sensitive information about individuals. However, when the data lie in a high-dimensional space, the accuracy of the synthetic data suffers from the curse of dimensionality. In this paper, we propose a differentially private algorithm to generate low-dimensional synthetic data efficiently from a high-dimensional dataset, with a utility guarantee with respect to the Wasserstein distance. A key step of our algorithm is a private principal component analysis (PCA) procedure with a near-optimal accuracy bound that circumvents the curse of dimensionality. Unlike the standard perturbation analysis using the Davis-Kahan theorem, our analysis of private PCA works without assuming a spectral gap for the sample covariance matrix.

Comment: 21 pages

arXiv.org e-Print Archive

Less is More: Revisiting Gaussian Mechanism for Differential Privacy

Authors: Ji Tianxi, Li Pan
Publication date: 04/06/2023

In this paper, we identify that the classic Gaussian mechanism and its variants for differential privacy all suffer from the curse of full-rank covariance matrices, and hence the expected accuracy losses of these mechanisms applied to high-dimensional query results, e.g., in $\mathbb{R}^M$, all increase linearly with $M$. To lift this curse, we design a Rank-1 Singular Multivariate Gaussian Mechanism (R1SMG). It achieves $(\epsilon,\delta)$-DP on query results in $\mathbb{R}^M$ by perturbing the results with noise following a singular multivariate Gaussian distribution, whose covariance matrix is a randomly generated rank-1 positive semi-definite matrix. In contrast, the classic Gaussian mechanism and its variants all consider deterministic full-rank covariance matrices. Our idea is motivated by a clue from Dwork et al.'s work on the Gaussian mechanism that has been ignored in the literature: when projecting multivariate Gaussian noise with a full-rank covariance matrix onto an orthonormal basis of $\mathbb{R}^M$, only the coefficient of a single basis vector can contribute to the privacy guarantee. This paper makes the following technical contributions. (i) R1SMG achieves an $(\epsilon,\delta)$-DP guarantee on query results in $\mathbb{R}^M$, while the magnitude of the additive noise decreases with $M$. Therefore, less is more, i.e., a smaller amount of noise is able to sanitize higher-dimensional query results. When $M\rightarrow \infty$, the expected accuracy loss converges to ${2(\Delta_2f)^2}/{\epsilon}$, where $\Delta_2f$ is the $l_2$ sensitivity of the query function $f$. (ii) Compared with other mechanisms, R1SMG is less likely to generate noise with large magnitude that overwhelms the query results, because the kurtosis and skewness of the nondeterministic accuracy loss introduced by R1SMG are larger than those introduced by other mechanisms.

arXiv.org e-Print Archive
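
The structural difference between full-rank and rank-1 Gaussian noise described in the abstract above can be sketched in a few lines of Python. This is an illustration of the noise shapes only, not the authors' mechanism: the function names are hypothetical, and the abstract does not say how the noise parameter (called `sigma_star` here, an assumed name) is calibrated to $(\epsilon,\delta)$, so the values below are placeholders without any privacy guarantee.

```python
import numpy as np

def rank1_gaussian_noise(m, sigma_star, rng):
    """Draw noise from N(0, sigma_star * v v^T) with v a random unit vector.

    Returns the noise vector and its (singular, rank-1) covariance matrix.
    """
    v = rng.standard_normal(m)
    v /= np.linalg.norm(v)           # uniformly random direction on the unit sphere
    z = rng.standard_normal()        # single scalar Gaussian coefficient
    noise = np.sqrt(sigma_star) * z * v
    cov = sigma_star * np.outer(v, v)
    return noise, cov

def isotropic_gaussian_noise(m, sigma, rng):
    """Classic full-rank Gaussian noise N(0, sigma^2 * I) for comparison."""
    return sigma * rng.standard_normal(m)

rng = np.random.default_rng(0)
m, sigma_star, sigma = 50, 2.0, 1.0   # placeholder values (assumption)
noise, cov = rank1_gaussian_noise(m, sigma_star, rng)
full_rank_noise = isotropic_gaussian_noise(m, sigma, rng)
```

For a fixed `sigma`, the expected squared norm of the isotropic noise is `m * sigma**2`, which grows linearly with the dimension, whereas the rank-1 noise has expected squared norm `sigma_star` regardless of `m`. How `sigma_star` itself must scale with the dimension and the privacy parameters is the subject of the paper and is not reproduced here.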