Differential Privacy and the Fat-Shattering Dimension of Linear Queries
In this paper, we consider the task of answering linear queries under the
constraint of differential privacy. This is a general and well-studied class of
queries that captures other commonly studied classes, including predicate
queries and histogram queries. We show that the accuracy to which a set of
linear queries can be answered is closely related to its fat-shattering
dimension, a property that characterizes the learnability of real-valued
functions in the agnostic-learning setting. Comment: Appears in APPROX 201
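For reference, the two notions the abstract relates admit compact standard definitions. The notation below (class $\mathcal{F}$, margin $\gamma$, thresholds $t_i$) is ours; this is a sketch of the usual textbook definitions, not necessarily the paper's exact formulation.

```latex
% Linear query: for a database D in U^n and a function q : U -> [0,1],
\[
  q(D) \;=\; \frac{1}{n} \sum_{x \in D} q(x).
\]
% Fat-shattering dimension: a class F of [0,1]-valued functions
% gamma-shatters x_1, ..., x_d if there exist thresholds t_1, ..., t_d
% such that for every b in {0,1}^d some f in F satisfies
% f(x_i) >= t_i + gamma when b_i = 1 and f(x_i) <= t_i - gamma when b_i = 0;
% fat_gamma(F) is the largest such d.
```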
On the relation between Differential Privacy and Quantitative Information Flow
Differential privacy is a notion that has emerged in the community of
statistical databases, as a response to the problem of protecting the privacy
of the database's participants when performing statistical queries. The idea is
that a randomized query satisfies differential privacy if the likelihood of
obtaining a certain answer for a database is not too different from the
likelihood of obtaining the same answer on adjacent databases, i.e. databases
which differ from it in only one individual. Information flow is an area of
Security concerned with the problem of controlling the leakage of confidential
information in programs and protocols. Nowadays, one of the most established
approaches to quantify and to reason about leakage is based on the R\'enyi min
entropy version of information theory. In this paper, we analyze critically the
notion of differential privacy in light of the conceptual framework provided by
the R\'enyi min information theory. We show that there is a close relation
between differential privacy and leakage, due to the graph symmetries induced
by the adjacency relation. Furthermore, we consider the utility of the
randomized answer, which measures its expected degree of accuracy. We focus on
certain kinds of utility functions called "binary", which have a close
correspondence with the R\'enyi min mutual information. Again, it turns out
that there can be a tight correspondence between differential privacy and
utility, depending on the symmetries induced by the adjacency relation and by
the query. Depending on these symmetries we can also build an optimal-utility
randomization mechanism while preserving the required level of differential
privacy. Our main contribution is a study of the kind of structures that can be
induced by the adjacency relation and the query, and how to use them to derive
bounds on the leakage and achieve the optimal utility.
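As a reference point for the relation discussed above, the two standard definitions involved can be stated as follows (our notation; a sketch, not the paper's exact formulation):

```latex
% eps-differential privacy: for all adjacent databases D ~ D' and all
% sets S of possible answers,
\[
  \Pr[\mathcal{K}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{K}(D') \in S].
\]
% Renyi min-entropy leakage of a channel from secrets X to observations Y
% (Smith's formulation; the closed form assumes a uniform prior on X):
\[
  \mathcal{L} \;=\; H_\infty(X) - H_\infty(X \mid Y)
             \;=\; \log_2 \sum_{y} \max_{x} \Pr[y \mid x].
\]
```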
Distributed Private Heavy Hitters
In this paper, we give efficient algorithms and lower bounds for solving the
heavy hitters problem while preserving differential privacy in the fully
distributed local model. In this model, there are n parties, each of which
possesses a single element from a universe of size N. The heavy hitters problem
is to find the identity of the most common element shared amongst the n
parties. In the local model, there is no trusted database administrator, and so
the algorithm must interact with each of the parties separately, using a
differentially private protocol. We give tight information-theoretic upper and
lower bounds on the accuracy to which this problem can be solved in the local
model (giving a separation between the local model and the more common
centralized model of privacy), as well as computationally efficient algorithms
even in the case where the data universe N may be exponentially large.
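To make the local model concrete, here is a minimal sketch of one standard approach: each party applies per-bit randomized response to a one-hot encoding of its element, and the untrusted aggregator debiases the sums. This illustrates the model only; it is not the algorithm of this paper, the names and parameters are ours, and the end-to-end privacy accounting of per-bit randomized response is simplified away.

```python
import numpy as np

def local_randomize(item, N, eps, rng):
    """Party-side: one-hot encode the item, keep each bit with
    probability e^eps / (1 + e^eps) and flip it otherwise."""
    p_keep = np.exp(eps) / (1.0 + np.exp(eps))
    bits = np.zeros(N)
    bits[item] = 1.0
    flips = rng.random(N) > p_keep
    return np.where(flips, 1.0 - bits, bits)

def estimate_heavy_hitter(reports, eps):
    """Aggregator-side: debias the summed reports, using
    E[report_bit] = (1 - p) + bit * (2p - 1), then take the argmax."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    reports = np.asarray(reports)
    n = reports.shape[0]
    counts = (reports.sum(axis=0) - n * (1.0 - p)) / (2.0 * p - 1.0)
    return int(np.argmax(counts))
```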
Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds
"Concentrated differential privacy" was recently introduced by Dwork and
Rothblum as a relaxation of differential privacy, which permits sharper
analyses of many privacy-preserving computations. We present an alternative
formulation of the concept of concentrated differential privacy in terms of the
Renyi divergence between the distributions obtained by running an algorithm on
neighboring inputs. With this reformulation in hand, we prove sharper
quantitative results, establish lower bounds, and raise a few new questions. We
also unify this approach with approximate differential privacy by giving an
appropriate definition of "approximate concentrated differential privacy".
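The Rényi-divergence reformulation is compact enough to state; the following is the zero-concentrated variant introduced in this line of work, in our notation:

```latex
% M satisfies rho-zCDP if for all adjacent inputs x ~ x' and all
% orders alpha in (1, infinity),
\[
  D_{\alpha}\big(M(x) \,\big\|\, M(x')\big) \;\le\; \rho\,\alpha,
\]
% where D_alpha is the Renyi divergence of order alpha. Larger alpha
% probes the tails of the privacy-loss distribution more aggressively.
```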
An Improved Private Mechanism for Small Databases
We study the problem of answering a workload of linear queries $\mathcal{Q}$,
on a database of size at most $n$ drawn from a universe $\mathcal{U}$
under the constraint of (approximate) differential privacy.
Nikolov, Talwar, and Zhang~\cite{NTZ} proposed an efficient mechanism that, for
any given $\mathcal{Q}$ and $n$, answers the queries with average error that is
at most a factor polynomial in $\log |\mathcal{Q}|$ and $\log |\mathcal{U}|$
worse than the best possible. Here we improve on this guarantee and give a
mechanism whose competitiveness ratio is at most polynomial in $\log n$ and
$\log |\mathcal{U}|$, and has no dependence on $|\mathcal{Q}|$. Our mechanism
is based on the projection mechanism of Nikolov, Talwar, and Zhang, but in
place of an ad-hoc noise distribution, we use a distribution which is in a
sense optimal for the projection mechanism, and analyze it using convex duality
and the restricted invertibility principle. Comment: To appear in ICALP 2015, Track
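For intuition, here is a heavily simplified sketch of the projection idea: perturb the exact answers with Gaussian noise, then post-process by projecting onto the set of answer vectors achievable by some nonnegative pseudo-database. The noise distribution, its scale, and the projection below are illustrative stand-ins, not the paper's optimal construction.

```python
import numpy as np
from scipy.optimize import nnls

def projection_mechanism(Q, hist, noise_scale, rng):
    """Q: (k x |U|) query matrix; hist: true histogram over the universe.
    Returns noisy answers projected back onto {Q h : h >= 0}."""
    noisy = Q @ hist + rng.normal(scale=noise_scale, size=Q.shape[0])
    h, _ = nnls(Q, noisy)  # nonnegative least squares: the projection step
    return Q @ h           # consistent, post-processed answers
```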
An Automated Social Graph De-anonymization Technique
We present a generic and automated approach to re-identifying nodes in
anonymized social networks which enables novel anonymization techniques to be
quickly evaluated. It uses machine learning (decision forests) to match
pairs of nodes in disparate anonymized sub-graphs. The technique uncovers
artefacts and invariants of any black-box anonymization scheme from a small set
of examples. Despite a high degree of automation, classification succeeds with
significant true positive rates even when small false positive rates are
sought. Our evaluation uses publicly available real world datasets to study the
performance of our approach against real-world anonymization strategies, namely
the schemes used to protect datasets of The Data for Development (D4D)
Challenge. We show that the technique is effective even when only small numbers
of samples are used for training. Further, since it detects weaknesses in the
black-box anonymization scheme it can re-identify nodes in one social network
when trained on another.
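A hedged sketch of the pair-matching step: train a decision forest on features of candidate node pairs (one node from each anonymized sub-graph), labeled by whether they are the same identity. The feature choices below (degrees and top neighbor degrees) are illustrative guesses, not the paper's feature set.

```python
import numpy as np
import networkx as nx
from sklearn.ensemble import RandomForestClassifier

def pair_features(g1, u, g2, v, k=5):
    """Features for a candidate pair (u in g1, v in g2)."""
    d1, d2 = g1.degree(u), g2.degree(v)
    n1 = sorted((g1.degree(w) for w in g1[u]), reverse=True)[:k]
    n2 = sorted((g2.degree(w) for w in g2[v]), reverse=True)[:k]
    n1 += [0] * (k - len(n1))
    n2 += [0] * (k - len(n2))
    return [d1, d2, abs(d1 - d2)] + n1 + n2

# Given labeled training pairs (X, y), rank candidate pairs by score:
# clf = RandomForestClassifier(n_estimators=100).fit(X, y)
# scores = clf.predict_proba(X_candidates)[:, 1]
```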
Heavy Hitters and the Structure of Local Privacy
We present a new locally differentially private algorithm for the heavy
hitters problem which achieves optimal worst-case error as a function of all
standardly considered parameters. Prior work obtained error rates which depend
optimally on the number of users, the size of the domain, and the privacy
parameter, but depend sub-optimally on the failure probability.
We strengthen existing lower bounds on the error to incorporate the failure
probability, and show that our new upper bound is tight with respect to this
parameter as well. Our lower bound is based on a new understanding of the
structure of locally private protocols. We further develop these ideas to
obtain the following general results beyond heavy hitters.
Advanced Grouposition: In the local model, group privacy for $k$
users degrades proportionally to $\sqrt{k}$, instead of linearly in $k$
as in the central model. Stronger group privacy yields improved max-information
guarantees, as well as stronger lower bounds (via "packing arguments"), over
the central model.
Building on a transformation of Bassily and Smith (STOC 2015), we
give a generic transformation from any non-interactive approximate-private
local protocol into a pure-private local protocol. Again in contrast with the
central model, this shows that we cannot obtain more accurate algorithms by
moving from pure to approximate local privacy.
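The group-privacy contrast can be summarized as follows (our paraphrase; constants and the additive delta term are elided):

```latex
% Central model: if M is eps-DP, a group of k users enjoys (k * eps)-DP,
% i.e. linear degradation. Local model, per the result above:
\[
  \varepsilon_{\text{group}}(k) \;=\; O\!\big(\sqrt{k}\,\varepsilon\big),
\]
% up to an additive delta term; this sub-linear degradation is what
% powers the stronger packing lower bounds and max-information bounds.
```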
Differentially Private Billing with Rebates
A number of established and novel business models are based on fine-grained billing, including pay-per-view, mobile messaging, voice calls, pay-as-you-drive insurance, smart metering for utility provision, private computing clouds and hosted services. These models apply fine-grained tariffs dependent on time-of-use or place-of-use to readings to compute a bill. We extend previously proposed billing protocols to strengthen their privacy in two key ways. First, we study the monetary amount a customer should add to their bill in order to provably hide their activities, within the differential privacy framework. Second, we propose a cryptographic protocol for oblivious billing that ensures any additional expenditure, aimed at protecting privacy, can be tracked and reclaimed in the future, thus minimising its cost. Our proposals can be used together or separately and are backed by provable guarantees of security.
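A minimal sketch of the first idea, assuming a two-sided geometric (discrete Laplace) noise term shifted and clamped so the customer never underpays. The distribution choice, the shift, and the clamping policy are our assumptions, not the paper's exact protocol.

```python
import numpy as np

def dp_bill(true_bill, eps, sensitivity, shift, rng):
    """Add shifted discrete-Laplace noise to a bill. `shift` trades
    money for privacy: a larger shift makes clamping at zero rarer."""
    alpha = np.exp(-eps / sensitivity)
    # Difference of two i.i.d. geometrics is a two-sided geometric
    # (the discrete analogue of Laplace noise).
    noise = int(rng.geometric(1.0 - alpha)) - int(rng.geometric(1.0 - alpha))
    overpayment = max(0, shift + noise)  # never underpay; reclaim later
    return true_bill + overpayment
```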
Testing probability distributions underlying aggregated data
In this paper, we analyze and study a hybrid model for testing and learning
probability distributions. Here, in addition to samples, the testing algorithm
is provided with one of two different types of oracles to the unknown
distribution over $[n]$. More precisely, we define both the dual and
cumulative dual access models, in which the algorithm can both sample from $D$
and respectively, for any $i \in [n]$,
- query the probability mass $D(i)$ (query access); or
- get the total mass of $\{1,\dots,i\}$, i.e. $\sum_{j=1}^{i} D(j)$ (cumulative
access).
These two models, by generalizing the previously studied sampling and query
oracle models, allow us to bypass the strong lower bounds established for a
number of problems in these settings, while capturing several interesting
aspects of these problems -- and providing new insight on the limitations of
the models. Finally, we show that while the testing algorithms can in most
cases be strictly more efficient, some tasks remain hard even with this
additional power.
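The two access models are easy to picture as an interface; the wrapper below (names ours) shows what the testing algorithm is given in each case, on top of ordinary sampling:

```python
import bisect, random

class DualAccess:
    """Sampling plus pmf queries D(i): the dual access model."""
    def __init__(self, probs):            # probs[i] = D(i), i in [n]
        self.probs = probs
        self.cum = []
        total = 0.0
        for p in probs:
            total += p
            self.cum.append(total)
    def sample(self):
        i = bisect.bisect_left(self.cum, random.random())
        return min(i, len(self.probs) - 1)
    def pmf(self, i):                     # query access: D(i)
        return self.probs[i]

class CumulativeDualAccess(DualAccess):
    """Sampling plus cumulative queries: the cumulative dual model."""
    def cdf(self, i):                     # sum of D(j) for j <= i
        return self.cum[i]
```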
Differentially Private Exponential Random Graphs
We propose methods to release and analyze synthetic graphs in order to
protect privacy of individual relationships captured by the social network.
Proposed techniques aim at fitting and estimating a wide class of exponential
random graph models (ERGMs) in a differentially private manner, and thus offer
rigorous privacy guarantees. More specifically, we use the randomized response
mechanism to release networks under $\epsilon$-edge differential privacy. To
maintain utility for statistical inference, treating the original graph as
missing, we propose a way to use likelihood based inference and Markov chain
Monte Carlo (MCMC) techniques to fit ERGMs to the produced synthetic networks.
We demonstrate the usefulness of the proposed techniques on a real data
example.
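A minimal sketch of the release step, assuming standard randomized response applied independently to each potential edge; the paper's exact parameterization may differ.

```python
import numpy as np

def randomized_response_graph(adj, eps, rng):
    """adj: symmetric 0/1 adjacency matrix. Each potential edge is kept
    with probability e^eps / (1 + e^eps) and flipped otherwise, which
    gives eps-edge differential privacy."""
    p_keep = np.exp(eps) / (1.0 + np.exp(eps))
    n = adj.shape[0]
    flips = rng.random((n, n)) > p_keep
    flips = np.triu(flips, 1)        # randomize each pair once, no self-loops
    flips = flips | flips.T          # keep the released graph symmetric
    return np.where(flips, 1 - adj, adj)
```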
