Conducting Truthful Surveys, Cheaply
We consider the problem of conducting a survey with the goal of obtaining an
unbiased estimator of some population statistic when individuals have unknown
costs (drawn from a known prior) for participating in the survey. Individuals
must be compensated for their participation and are strategic agents, and so
the payment scheme must incentivize truthful behavior. We derive optimal
truthful mechanisms for this problem for two goals: minimizing the variance of the estimator given a fixed budget, and minimizing the expected cost of the survey given a fixed variance goal.
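The mechanics behind such a survey can be illustrated with a simple posted-price scheme (a hypothetical sketch, not the paper's optimal mechanism): individuals whose private cost is below the posted price participate, and inverse-probability weighting keeps the estimator unbiased.

```python
def posted_price_survey(population, price, participation_prob):
    """Survey via a single posted price.

    population: list of (value, private_cost) pairs.
    participation_prob: Pr[cost <= price] under the known cost prior.
    Each individual accepts iff her cost is at most the price; weighting
    every collected value by 1/Pr[participation] keeps the estimate of
    the population mean unbiased.
    """
    total, spent = 0.0, 0.0
    for value, cost in population:
        if cost <= price:                        # take-it-or-leave-it offer
            total += value / participation_prob  # inverse-probability weight
            spent += price                       # participant is paid the posted price
    return total / len(population), spent
```

Averaging the weighted sum over the whole population, participants and non-participants alike, is what makes the estimator unbiased; the budget-versus-variance tradeoff then comes from the choice of price.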
Redrawing the Boundaries on Purchasing Data from Privacy-Sensitive Individuals
We prove new positive and negative results concerning the existence of
truthful and individually rational mechanisms for purchasing private data from
individuals with unbounded and sensitive privacy preferences. We strengthen the
impossibility result of Ghosh and Roth (EC 2011) by extending it to a much
wider class of privacy valuations. In particular, these include privacy
valuations that are based on (ε, δ)-differentially private mechanisms for non-zero δ, ones where the privacy costs are measured in
a per-database manner (rather than taking the worst case), and ones that do not
depend on the payments made to players (which might not be observable to an
adversary). To bypass this impossibility result, we study a natural special
setting where individuals have monotonic privacy valuations, which captures
common contexts where certain values for private data are expected to lead to
higher valuations for privacy (e.g. having a particular disease). We give new
mechanisms that are individually rational for all players with monotonic
privacy valuations, truthful for all players whose privacy valuations are not
too large, and accurate if there are not too many players with too-large
privacy valuations. We also prove matching lower bounds showing that in some
respects our mechanism cannot be improved significantly.
Buying Private Data without Verification
We consider the problem of designing a survey to aggregate non-verifiable
information from a privacy-sensitive population: an analyst wants to compute
some aggregate statistic from the private bits held by each member of a
population, but cannot verify the correctness of the bits reported by
participants in his survey. Individuals in the population are strategic agents
with a cost for privacy, i.e., they not only account for the payments they
expect to receive from the mechanism, but also their privacy costs from any
information revealed about them by the mechanism's outcome---the computed
statistic as well as the payments---to determine their utilities. How can the
analyst design payments to obtain an accurate estimate of the population
statistic when individuals strategically decide both whether to participate and
whether to truthfully report their sensitive information?
We design a differentially private peer-prediction mechanism that supports
accurate estimation of the population statistic as a Bayes-Nash equilibrium in
settings where agents have explicit preferences for privacy. The mechanism
requires knowledge of the marginal prior distribution on the bits, but does not need full knowledge of the marginal distribution on the costs, instead requiring only an approximate upper bound. Our mechanism guarantees ε-differential privacy to each agent against any adversary who can observe the statistical estimate output by the mechanism, as well as the payments made to the other agents. Finally, we show that with slightly more structured assumptions on the privacy cost functions of each agent, the cost of running the survey goes to 0 as the number of agents diverges.
Comment: Appears in EC 201
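The kind of privacy guarantee invoked here can be made concrete with the classic randomized-response primitive, which releases a single bit with ε-differential privacy (a far simpler device than the paper's peer-prediction mechanism, shown only to illustrate the guarantee):

```python
import math
import random

def randomized_response(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    The ratio of report probabilities under bit=0 vs. bit=1 is at most e^eps,
    which is exactly eps-differential privacy for a single bit."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

def debias_mean(reports, epsilon):
    """Invert the known flipping probability to recover an unbiased
    estimate of the population mean of the true bits."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    raw = sum(reports) / len(reports)
    return (raw - (1.0 - p)) / (2.0 * p - 1.0)
```

Stronger privacy (smaller ε) flips more bits, which inflates the variance of the debiased estimate; that accuracy-versus-privacy tension is what the payment scheme in the paper has to compensate for.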
A Theory of Pricing Private Data
Personal data has value to both its owner and to institutions who would like
to analyze it. Privacy mechanisms protect the owner's data while releasing to
analysts noisy versions of aggregate query results. But such strict protections
of individuals' data have not yet found wide use in practice. Instead, Internet
companies, for example, commonly provide free services in return for valuable
sensitive information from users, which they exploit and sometimes sell to
third parties.
As awareness of the value of personal data has increased, so has the drive to compensate the end user for her private information. The idea of
monetizing private data can improve over the narrower view of hiding private
data, since it empowers individuals to control their data through financial
means.
In this paper we propose a theoretical framework for assigning prices to
noisy query answers, as a function of their accuracy, and for dividing the
price amongst data owners who deserve compensation for their loss of privacy.
Our framework adopts and extends key principles from both differential privacy
and query pricing in data markets. We identify essential properties of the
price function and micro-payments, and characterize valid solutions.
Comment: 25 pages, 2 figures. Best Paper Award, to appear in the 16th International Conference on Database Theory (ICDT), 201
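As a toy illustration of the two ingredients (a price that depends on the answer's accuracy, and micro-payments that split it among data owners), consider the following sketch; the functional forms here are hypothetical and are not the paper's characterization:

```python
def query_price(base_price, variance):
    """Price a noisy query answer: lower variance (higher accuracy)
    costs more.  Inverse-proportional pricing is one simple choice
    consistent with pricing as a function of accuracy."""
    return base_price / variance

def micro_payments(total_price, privacy_losses):
    """Split the price among data owners in proportion to each
    owner's privacy loss from answering the query."""
    total_loss = sum(privacy_losses)
    return [total_price * loss / total_loss for loss in privacy_losses]
```

A real framework must additionally rule out arbitrage (a buyer should not be able to combine cheap noisy answers into an accurate one for less than its posted price), which is what constrains the valid price functions.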
Take it or Leave it: Running a Survey when Privacy Comes at a Cost
In this paper, we consider the problem of estimating a potentially sensitive (individually stigmatizing) statistic on a population. In our model, individuals are concerned about their privacy, and experience some cost as a function of their privacy loss. Nevertheless, they would be willing to participate in the survey if they were compensated for their privacy cost. These cost functions are not publicly known, however, nor do we make Bayesian assumptions about their form or distribution. Individuals are rational and will misreport their costs for privacy if doing so is in their best interest. Ghosh and Roth recently showed that in this setting, when costs for privacy loss may be correlated with private types and individuals value differential privacy, no individually rational direct revelation mechanism can compute any non-trivial estimate of the population statistic.
In this paper, we circumvent this impossibility result by proposing a modified notion of how individuals experience cost as a function of their privacy loss, and by giving a mechanism which does not operate by direct revelation. Instead, our mechanism has the ability to randomly approach individuals from a population and make them a take-it-or-leave-it offer. This is intended to model the abilities of a surveyor who may stand on a street corner and approach passers-by.
Low-Cost Learning via Active Data Procurement
We design mechanisms for online procurement of data held by strategic agents
for machine learning tasks. The challenge is to use past data to actively price
future data and give learning guarantees even when an agent's cost for
revealing her data may depend arbitrarily on the data itself. We achieve this
goal by showing how to convert a large class of no-regret algorithms into
online posted-price and learning mechanisms. Our results in a sense parallel
classic sample complexity guarantees, but with the key resource being money
rather than quantity of data: With a budget constraint B, we give robust risk (predictive error) bounds on the order of 1/√B. Because we use an
active approach, we can often guarantee to do significantly better by
leveraging correlations between costs and data.
Our algorithms and analysis go through a model of no-regret learning with T arriving pairs (cost, data) and a budget constraint of B. Our regret bounds for this model are on the order of T/√B and we give lower bounds on the
same order.
Comment: Full version of EC 2015 paper. Color recommended for figures but nonessential. 36 pages, of which 12 appendix
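A minimal version of the conversion from no-regret learning to posted-price procurement can be sketched with an EXP3-style bandit over a grid of prices (a toy stand-in for the paper's reduction; the reward model of "data value 1 minus payment" is an assumption made for illustration):

```python
import math
import random

def exp3_posted_prices(prices, rounds, respond, eta=0.1, rng=None):
    """Bandit posted pricing: each round, post one price from the grid,
    observe only whether the arriving seller accepts, and treat
    (value of one data point) minus (payment) as the reward of the
    chosen arm.  Importance weighting handles the one-arm feedback."""
    rng = rng or random.Random()
    weights = [1.0] * len(prices)
    acquired, spent = 0, 0.0
    for _ in range(rounds):
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(len(prices)), weights=probs)[0]
        if respond(prices[arm]):            # seller accepts iff her cost <= price
            reward = 1.0 - prices[arm]
            acquired += 1
            spent += prices[arm]
        else:
            reward = 0.0
        # exponential-weights update on the played arm only (bandit feedback)
        weights[arm] *= math.exp(eta * reward / probs[arm])
    return acquired, spent
```

Because the learner only sees accept/reject for the price it actually posted, the update divides by the arm's selection probability, the standard EXP3 trick for keeping reward estimates unbiased.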
Optimal Data Acquisition for Statistical Estimation
We consider a data analyst's problem of purchasing data from strategic agents
to compute an unbiased estimate of a statistic of interest. Agents incur
private costs to reveal their data and the costs can be arbitrarily correlated
with their data. Once revealed, data are verifiable. This paper focuses on
linear unbiased estimators. We design an individually rational and incentive
compatible mechanism that optimizes the worst-case mean-squared error of the
estimation, where the worst-case is over the unknown correlation between costs
and data, subject to a budget constraint in expectation. We characterize the optimal mechanism in closed form. We further extend our results to
acquiring data for estimating a parameter in regression analysis, where private
costs can correlate with the values of the dependent variable but not with the
values of the independent variables.
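One concrete instance of this budgeted tradeoff is a water-filling allocation (a sketch under simplifying assumptions: costs are known and positive, and the estimator's variance is proportional to the sum of inverse selection probabilities, which is not the paper's worst-case setting):

```python
import math

def optimal_selection_probs(costs, budget):
    """Minimize sum(1/pi_i), a proxy for the variance of an
    inverse-probability-weighted estimator, subject to the expected
    spend sum(c_i * pi_i) <= budget and pi_i <= 1.  The Lagrangian
    first-order condition gives pi_i proportional to 1/sqrt(c_i),
    capped at 1, with the cap resolved iteratively."""
    probs = [None] * len(costs)
    active = list(range(len(costs)))
    remaining = budget
    while active:
        denom = sum(math.sqrt(costs[i]) for i in active)
        lam = remaining / denom          # pi_i = lam/sqrt(c_i) spends exactly `remaining`
        capped = [i for i in active if lam / math.sqrt(costs[i]) >= 1.0]
        if not capped:
            for i in active:
                probs[i] = lam / math.sqrt(costs[i])
            break
        for i in capped:                 # these agents are sampled with certainty
            probs[i] = 1.0
            remaining -= costs[i]
        active = [i for i in active if i not in capped]
    return probs
```

Cheap data points are sampled with probability 1; expensive ones are down-sampled in proportion to 1/√cost so that the marginal variance reduction per dollar is equalized across agents.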
A Game-Theoretic Study on Non-Monetary Incentives in Data Analytics Projects with Privacy Implications
The amount of personal information contributed by individuals to digital
repositories such as social network sites has grown substantially. The
existence of this data offers unprecedented opportunities for data analytics
research in various domains of societal importance including medicine and
public policy. The results of these analyses can be considered a public good
which benefits data contributors as well as individuals who are not making
their data available. At the same time, the release of personal information
carries perceived and actual privacy risks to the contributors. Our research
addresses this problem area. In our work, we study a game-theoretic model in
which individuals take control over participation in data analytics projects in
two ways: 1) individuals can contribute data at a self-chosen level of
precision, and 2) individuals can decide whether they want to contribute at all. From the analyst's perspective, we investigate to what degree the research analyst has flexibility to set requirements for data precision, so
that individuals are still willing to contribute to the project, and the
quality of the estimation improves. We study this tradeoff scenario for
populations of homogeneous and heterogeneous individuals, and determine Nash
equilibria that reflect the optimal level of participation and precision of
contributions. We further prove that the analyst can substantially increase the
accuracy of the analysis by imposing a lower bound on the precision of the data
that users can reveal.
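The flavor of such equilibria can be seen in a stylized homogeneous instance (the utility form below, an accuracy benefit of -1/(total precision) minus a linear privacy cost, is an assumption made for illustration, not the paper's model):

```python
import math

def best_response(others_total, c):
    """Agent utility: -1/(own + others' precision) - c * own precision.
    The first-order condition puts total precision at 1/sqrt(c), so the
    best response tops the pool up to that target (or contributes 0)."""
    return max(0.0, 1.0 / math.sqrt(c) - others_total)

def iterated_best_response(n, c, rounds=100):
    """Sequentially let each of n agents best-respond.  Any nonnegative
    profile whose precisions sum to 1/sqrt(c) is a Nash equilibrium
    of this stylized game."""
    p = [1.0] * n
    for _ in range(rounds):
        for i in range(n):
            p[i] = best_response(sum(p) - p[i], c)
    return p
```

The public-good structure shows up as free-riding: total equilibrium precision is pinned at 1/√c no matter how many agents there are, which is why a precision floor imposed by the analyst can raise overall accuracy.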