1,652 research outputs found
Buying Private Data without Verification
We consider the problem of designing a survey to aggregate non-verifiable
information from a privacy-sensitive population: an analyst wants to compute
some aggregate statistic from the private bits held by each member of a
population, but cannot verify the correctness of the bits reported by
participants in his survey. Individuals in the population are strategic agents
with a cost for privacy, \ie, they not only account for the payments they
expect to receive from the mechanism, but also their privacy costs from any
information revealed about them by the mechanism's outcome---the computed
statistic as well as the payments---to determine their utilities. How can the
analyst design payments to obtain an accurate estimate of the population
statistic when individuals strategically decide both whether to participate and
whether to truthfully report their sensitive information?
We design a differentially private peer-prediction mechanism that supports
accurate estimation of the population statistic as a Bayes-Nash equilibrium in
settings where agents have explicit preferences for privacy. The mechanism
requires knowledge of the marginal prior distribution on bits , but does
not need full knowledge of the marginal distribution on the costs ,
instead requiring only an approximate upper bound. Our mechanism guarantees
-differential privacy to each agent against any adversary who can
observe the statistical estimate output by the mechanism, as well as the
payments made to the other agents . Finally, we show that with
slightly more structured assumptions on the privacy cost functions of each
agent, the cost of running the survey goes to as the number of agents
diverges.Comment: Appears in EC 201
Detecting Communities under Differential Privacy
Complex networks usually expose community structure with groups of nodes
sharing many links with the other nodes in the same group and relatively few
with the nodes of the rest. This feature captures valuable information about
the organization and even the evolution of the network. Over the last decade, a
great number of algorithms for community detection have been proposed to deal
with the increasingly complex networks. However, the problem of doing this in a
private manner is rarely considered. In this paper, we solve this problem under
differential privacy, a prominent privacy concept for releasing private data.
We analyze the major challenges behind the problem and propose several schemes
to tackle them from two perspectives: input perturbation and algorithm
perturbation. We choose Louvain method as the back-end community detection for
input perturbation schemes and propose the method LouvainDP which runs Louvain
algorithm on a noisy super-graph. For algorithm perturbation, we design
ModDivisive using exponential mechanism with the modularity as the score. We
have thoroughly evaluated our techniques on real graphs of different sizes and
verified their outperformance over the state-of-the-art
Mining Frequent Graph Patterns with Differential Privacy
Discovering frequent graph patterns in a graph database offers valuable
information in a variety of applications. However, if the graph dataset
contains sensitive data of individuals such as mobile phone-call graphs and
web-click graphs, releasing discovered frequent patterns may present a threat
to the privacy of individuals. {\em Differential privacy} has recently emerged
as the {\em de facto} standard for private data analysis due to its provable
privacy guarantee. In this paper we propose the first differentially private
algorithm for mining frequent graph patterns.
We first show that previous techniques on differentially private discovery of
frequent {\em itemsets} cannot apply in mining frequent graph patterns due to
the inherent complexity of handling structural information in graphs. We then
address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling
based algorithm. Unlike previous work on frequent itemset mining, our
techniques do not rely on the output of a non-private mining algorithm.
Instead, we observe that both frequent graph pattern mining and the guarantee
of differential privacy can be unified into an MCMC sampling framework. In
addition, we establish the privacy and utility guarantee of our algorithm and
propose an efficient neighboring pattern counting technique as well.
Experimental results show that the proposed algorithm is able to output
frequent patterns with good precision
Take it or Leave it: Running a Survey when Privacy Comes at a Cost
In this paper, we consider the problem of estimating a potentially sensitive (individually stigmatizing) statistic on a population. In our model, individuals are concerned about their privacy, and experience some cost as a function of their privacy loss. Nevertheless, they would be willing to participate in the survey if they were compensated for their privacy cost. These cost functions are not publicly known, however, nor do we make Bayesian assumptions about their form or distribution. Individuals are rational and will misreport their costs for privacy if doing so is in their best interest. Ghosh and Roth recently showed in this setting, when costs for privacy loss may be correlated with private types, if individuals value differential privacy, no individually rational direct revelation mechanism can compute any non-trivial estimate of the population statistic. In this paper, we circumvent this impossibility result by proposing a modified notion of how individuals experience cost as a function of their privacy loss, and by giving a mechanism which does not operate by direct revelation. Instead, our mechanism has the ability to randomly approach individuals from a population and offer them a take-it-or-leave-it offer. This is intended to model the abilities of a surveyor who may stand on a street corner and approach passers-by
SoK: Chasing Accuracy and Privacy, and Catching Both in Differentially Private Histogram Publication
Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, containing no obvious sensitive attributes, can result in privacy leakage. To enable data analysis, a strong notion of privacy is required to avoid risking unintended privacy violations.Such a strong notion of privacy is differential privacy, a statistical notion of privacy that makes privacy leakage quantifiable. The caveat regarding differential privacy is that while it has strong guarantees for privacy, privacy comes at a cost of accuracy. Despite this trade-off being a central and important issue in the adoption of differential privacy, there exists a gap in the literature regarding providing an understanding of the trade-off and how to address it appropriately. Through a systematic literature review (SLR), we investigate the state-of-the-art within accuracy improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is two-fold: 1) we identify trends and connections in the contributions to the field of differential privacy for histograms and synthetic data and 2) we provide an understanding of the privacy/accuracy trade-off challenge by crystallizing different dimensions to accuracy improvement. Accordingly, we position and visualize the ideas in relation to each other and external work, and deconstruct each algorithm to examine the building blocks separately with the aim of pinpointing which dimension of accuracy improvement each technique/approach is targeting. Hence, this systematization of knowledge (SoK) provides an understanding of in which dimensions and how accuracy improvement can be pursued without sacrificing privacy
A Theory of Pricing Private Data
Personal data has value to both its owner and to institutions who would like
to analyze it. Privacy mechanisms protect the owner's data while releasing to
analysts noisy versions of aggregate query results. But such strict protections
of individual's data have not yet found wide use in practice. Instead, Internet
companies, for example, commonly provide free services in return for valuable
sensitive information from users, which they exploit and sometimes sell to
third parties.
As the awareness of the value of the personal data increases, so has the
drive to compensate the end user for her private information. The idea of
monetizing private data can improve over the narrower view of hiding private
data, since it empowers individuals to control their data through financial
means.
In this paper we propose a theoretical framework for assigning prices to
noisy query answers, as a function of their accuracy, and for dividing the
price amongst data owners who deserve compensation for their loss of privacy.
Our framework adopts and extends key principles from both differential privacy
and query pricing in data markets. We identify essential properties of the
price function and micro-payments, and characterize valid solutions.Comment: 25 pages, 2 figures. Best Paper Award, to appear in the 16th
International Conference on Database Theory (ICDT), 201
- …