New Program Abstractions for Privacy
Static program analysis, once seen primarily as a tool for optimising programs, is now increasingly important as a means to provide quality guarantees about programs. One measure of quality is the extent to which programs respect the privacy of user data. Differential privacy is a rigorous quantified definition of privacy which guarantees a bound on the loss of privacy due to the release of statistical queries. Among the benefits enjoyed by the definition of differential privacy are compositionality properties that allow differentially private analyses to be built from pieces and combined in various ways. This has led to the development of frameworks for the construction of differentially private program analyses which are private-by-construction. Past frameworks assume that the sensitive data is collected centrally, and processed by a trusted curator. However, the main examples of differential privacy applied in practice (for example, Google Chrome’s collection of browsing statistics, or Apple’s training of predictive messaging in iOS 10) use a purely local mechanism applied at the data source, thus avoiding the collection of sensitive data altogether. While this is a benefit of the local approach, with systems like Apple’s, users are required to completely trust that the analysis running on their system has the claimed privacy properties.
In this position paper we outline some key challenges in developing static analyses for analysing differential privacy, and propose novel abstractions for describing the behaviour of probabilistic programs not previously used in static analyses.
Differential Privacy for Relational Algebra: Improving the Sensitivity Bounds via Constraint Systems
Differential privacy is a modern approach in privacy-preserving data analysis
to control the amount of information that can be inferred about an individual
by querying a database. The most common techniques are based on the
introduction of probabilistic noise, often drawn from a Laplace distribution parameterized by
the sensitivity of the query. In order to maximize the utility of the query, it
is crucial to estimate the sensitivity as precisely as possible.
In this paper we consider relational algebra, the classical language for
queries in relational databases, and we propose a method for computing a bound
on the sensitivity of queries in an intuitive and compositional way. We use
constraint-based techniques to accumulate the information on the possible
values for attributes provided by the various components of the query, thus
making it possible to compute tight bounds on the sensitivity.
Comment: In Proceedings QAPL 2012, arXiv:1207.055
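The role of the sensitivity bound in the abstract above can be illustrated with the standard Laplace mechanism, whose noise scale is the sensitivity divided by the privacy parameter. The following is a minimal sketch, not the method of the paper: the function names and the parameter `epsilon` are assumptions made for illustration, and naive floating-point sampling like this is not safe for production use.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale of the Laplace mechanism: sensitivity / epsilon."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("need sensitivity >= 0 and epsilon > 0")
    return sensitivity / epsilon


def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Return true_answer plus Laplace noise (hypothetical helper)."""
    scale = laplace_scale(sensitivity, epsilon)
    if scale == 0:
        return true_answer
    # The difference of two independent exponential variables with mean
    # `scale` follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_answer + noise


if __name__ == "__main__":
    rng = random.Random(0)
    # A loose sensitivity bound of 10 and a tight bound of 1 for the same query.
    print(laplace_mechanism(100.0, sensitivity=10.0, epsilon=0.5, rng=rng))
    print(laplace_mechanism(100.0, sensitivity=1.0, epsilon=0.5, rng=rng))
```

Because the expected absolute error equals the scale, a bound that is ten times tighter gives answers that are ten times more accurate on average at the same privacy level.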
Differential Privacy and the Fat-Shattering Dimension of Linear Queries
In this paper, we consider the task of answering linear queries under the
constraint of differential privacy. This is a general and well-studied class of
queries that captures other commonly studied classes, including predicate
queries and histogram queries. We show that the accuracy to which a set of
linear queries can be answered is closely related to its fat-shattering
dimension, a property that characterizes the learnability of real-valued
functions in the agnostic-learning setting.
Comment: Appears in APPROX 201
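For readers unfamiliar with the query class named in the abstract above, a linear query can be sketched as the average of a bounded per-row value. This is only an illustration under assumed names (`linear_query`, `weight`), using the normalised form; some papers define the query as a sum instead. Predicate and histogram queries arise when the weight takes only the values 0 and 1.

```python
def linear_query(database, weight):
    """Average of weight(row) over all rows; weight should map a row into [0, 1]."""
    if not database:
        raise ValueError("database must be non-empty")
    return sum(weight(row) for row in database) / len(database)


if __name__ == "__main__":
    ages = [25, 40, 67, 31]  # made-up example data
    # Predicate query: fraction of rows satisfying a condition.
    print(linear_query(ages, lambda age: 1.0 if age > 30 else 0.0))
    # Histogram-bucket query: fraction of rows falling into one bucket.
    print(linear_query(ages, lambda age: 1.0 if 20 <= age < 30 else 0.0))
    # General real-valued linear query: mean age rescaled into [0, 1].
    print(linear_query(ages, lambda age: min(age, 100) / 100))
```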
Lower bounds in differential privacy
This is a paper about private data analysis, in which a trusted curator
holding a confidential database responds to real vector-valued queries. A
common approach to ensuring privacy for the database elements is to add
appropriately generated random noise to the answers, releasing only these
noisy responses. In this paper, we investigate various lower bounds on the
noise required to maintain different kinds of privacy guarantees.
Comment: Corrected some minor errors and typos. To appear in Theory of
Cryptography Conference (TCC) 201
Distributed Private Heavy Hitters
In this paper, we give efficient algorithms and lower bounds for solving the
heavy hitters problem while preserving differential privacy in the fully
distributed local model. In this model, there are n parties, each of which
possesses a single element from a universe of size N. The heavy hitters problem
is to find the identity of the most common element shared amongst the n
parties. In the local model, there is no trusted database administrator, and so
the algorithm must interact with each of the parties separately, using a
differentially private protocol. We give tight information-theoretic upper and
lower bounds on the accuracy to which this problem can be solved in the local
model (giving a separation between the local model and the more common
centralized model of privacy), as well as computationally efficient algorithms
even in the case where the data universe N may be exponentially large
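The local model described in the abstract above can be illustrated with a simple baseline, k-ary randomized response, in which every party perturbs its own element before reporting it. This is a sketch under stated assumptions and not the algorithm of the paper: the function names are hypothetical, the universe is assumed to have at least two elements, and the running time is linear in the universe size, so it does not address the exponentially large universes the abstract mentions.

```python
import math
import random
from collections import Counter


def randomize(value, universe, epsilon, rng):
    """Report the true value with probability e^eps / (e^eps + k - 1),
    otherwise a uniformly random different element of the universe."""
    k = len(universe)
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_truth:
        return value
    return rng.choice([u for u in universe if u != value])


def estimate_counts(reports, universe, epsilon):
    """Debias the observed report counts into estimates of the true counts."""
    n, k = len(reports), len(universe)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # P(report u | holds u)
    q = 1.0 / (math.exp(epsilon) + k - 1)                # P(report u | holds other)
    observed = Counter(reports)
    return {u: (observed[u] - n * q) / (p - q) for u in universe}


if __name__ == "__main__":
    rng = random.Random(1)
    universe = ["a", "b", "c", "d"]
    # Made-up population in which "a" is the heavy hitter.
    data = ["a"] * 12000 + ["b"] * 3000 + ["c"] * 3000 + ["d"] * 2000
    reports = [randomize(v, universe, 1.0, rng) for v in data]
    estimates = estimate_counts(reports, universe, 1.0)
    print(max(estimates, key=estimates.get))
```

Each report has likelihood ratio at most e^epsilon between any two possible inputs, so no trusted curator is needed; the price is extra noise in the debiased estimates compared with the centralized model.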
On the relation between Differential Privacy and Quantitative Information Flow
Differential privacy is a notion that has emerged in the community of
statistical databases, as a response to the problem of protecting the privacy
of the database's participants when performing statistical queries. The idea is
that a randomized query satisfies differential privacy if the likelihood of
obtaining a certain answer for a database is not too different from the
likelihood of obtaining the same answer on adjacent databases, i.e. databases
which differ in only one individual. Information flow is an area of
Security concerned with the problem of controlling the leakage of confidential
information in programs and protocols. Nowadays, one of the most established
approaches to quantify and to reason about leakage is based on the Rényi min
entropy version of information theory. In this paper, we analyze critically the
notion of differential privacy in light of the conceptual framework provided by
the Rényi min information theory. We show that there is a close relation
between differential privacy and leakage, due to the graph symmetries induced
by the adjacency relation. Furthermore, we consider the utility of the
randomized answer, which measures its expected degree of accuracy. We focus on
certain kinds of utility functions called "binary", which have a close
correspondence with the Rényi min mutual information. Again, it turns out
that there can be a tight correspondence between differential privacy and
utility, depending on the symmetries induced by the adjacency relation and by
the query. Depending on these symmetries we can also build an optimal-utility
randomization mechanism while preserving the required level of differential
privacy. Our main contribution is a study of the kind of structures that can be
induced by the adjacency relation and the query, and how to use them to derive
bounds on the leakage and achieve the optimal utility
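The informal description of differential privacy in the abstract above corresponds to the standard definition, written out here for reference. The symbols are assumptions chosen for this note and may differ from the paper's notation: a randomized query $\mathcal{K}$, a privacy parameter $\epsilon$, and an adjacency relation $\sim$ on databases.

```latex
% Standard epsilon-differential privacy; notation assumed for illustration.
\[
  \mathcal{K} \text{ is } \epsilon\text{-differentially private}
  \iff
  \Pr[\mathcal{K}(x) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{K}(x') \in S]
  \quad \text{for all } x \sim x' \text{ and all sets of answers } S .
\]
```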
Individual Fairness in Pipelines
It is well understood that a system built from individually fair components
may not itself be individually fair. In this work, we investigate individual
fairness under pipeline composition. Pipelines differ from ordinary sequential
or repeated composition in that individuals may drop out at any stage, and
classification in subsequent stages may depend on the remaining "cohort" of
individuals. As an example, a company might hire a team for a new project and
at a later point promote the highest performer on the team. Unlike other
repeated classification settings, where the degree of unfairness degrades
gracefully over multiple fair steps, the degree of unfairness in pipelines can
be arbitrary, even in a pipeline with just two stages.
Guided by a panoply of real-world examples, we provide a rigorous framework
for evaluating different types of fairness guarantees for pipelines. We show
that naïve auditing is unable to uncover systematic unfairness and that, in
order to ensure fairness, some form of dependence must exist between the design
of algorithms at different stages in the pipeline. Finally, we provide
constructions that permit flexibility at later stages, meaning that there is no
need to lock in the entire pipeline at the time that the early stage is
constructed
An Improved Private Mechanism for Small Databases
We study the problem of answering a workload of linear queries ,
on a database of size at most drawn from a universe
under the constraint of (approximate) differential privacy.
Nikolov, Talwar, and Zhang [NTZ] proposed an efficient mechanism that, for
any given and , answers the queries with average error that is
at most a factor polynomial in and
worse than the best possible. Here we improve on this guarantee and give a
mechanism whose competitiveness ratio is at most polynomial in and
, and has no dependence on . Our mechanism
is based on the projection mechanism of Nikolov, Talwar, and Zhang, but in
place of an ad-hoc noise distribution, we use a distribution which is in a
sense optimal for the projection mechanism, and analyze it using convex duality
and the restricted invertibility principle.
Comment: To appear in ICALP 2015, Track
Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds
"Concentrated differential privacy" was recently introduced by Dwork and
Rothblum as a relaxation of differential privacy, which permits sharper
analyses of many privacy-preserving computations. We present an alternative
formulation of the concept of concentrated differential privacy in terms of the
Rényi divergence between the distributions obtained by running an algorithm on
neighboring inputs. With this reformulation in hand, we prove sharper
quantitative results, establish lower bounds, and raise a few new questions. We
also unify this approach with approximate differential privacy by giving an
appropriate definition of "approximate concentrated differential privacy."
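The reformulation mentioned in the abstract above can be stated compactly. The following sketch uses the usual notation, which is assumed here: a randomized algorithm $M$, neighboring inputs $x, x'$, a parameter $\rho$, and the Rényi divergence $D_\alpha$ of order $\alpha$.

```latex
% Renyi divergence of order alpha > 1 between distributions P and Q.
\[
  D_{\alpha}(P \,\|\, Q)
  \;=\; \frac{1}{\alpha - 1}
  \log \mathbb{E}_{y \sim Q}\!\left[ \left( \frac{P(y)}{Q(y)} \right)^{\alpha} \right]
\]
% Zero-concentrated differential privacy with parameter rho.
\[
  M \text{ is } \rho\text{-zCDP}
  \iff
  D_{\alpha}\bigl(M(x) \,\|\, M(x')\bigr) \;\le\; \rho\,\alpha
  \quad \text{for all neighboring } x, x' \text{ and all } \alpha \in (1, \infty) .
\]
```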