Boosting the Accuracy of Differentially-Private Histograms Through Consistency
We show that it is possible to significantly improve the accuracy of a
general class of histogram queries while satisfying differential privacy. Our
approach carefully chooses a set of queries to evaluate, and then exploits
consistency constraints that should hold over the noisy output. In a
post-processing phase, we compute the consistent input most likely to have
produced the noisy output. The final output is differentially-private and
consistent, but in addition, it is often much more accurate. We show, both
theoretically and experimentally, that these techniques can be used for
estimating the degree sequence of a graph very precisely, and for computing a
histogram that can support arbitrary range queries accurately.
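As a rough illustration of the consistency step, consider the degree-sequence case: the sorted degree sequence must be non-increasing, so the noisy answers can be projected back onto that constraint set as pure post-processing, at no extra privacy cost. The sketch below is a minimal Python rendering of this idea, assuming Laplace noise and an L2 projection computed by pool-adjacent-violators; the noise scale and function name are ours, not the paper's exact calibration.

```python
import numpy as np

def private_degree_sequence(degrees, epsilon):
    """Illustrative sketch: noisy sorted degrees, then a consistency
    projection onto non-increasing sequences (pool-adjacent-violators)."""
    sorted_deg = np.sort(np.asarray(degrees, dtype=float))[::-1]  # non-increasing
    # Laplace noise; the scale 2/epsilon is a placeholder, not the
    # paper's calibrated sensitivity analysis.
    noisy = sorted_deg + np.random.laplace(scale=2.0 / epsilon, size=sorted_deg.size)
    # L2 projection onto non-increasing sequences via PAVA:
    blocks = []  # each entry is [running sum, count] of a merged block
    for v in noisy:
        blocks.append([v, 1.0])
        # merge while a later block's mean exceeds the one before it
        while len(blocks) > 1 and blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(int(c), s / c) for s, c in blocks])
```

Because the projection only post-processes an already differentially private output, the consistency gain comes without spending additional privacy budget.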
Continuous Release of Data Streams under both Centralized and Local Differential Privacy
In this paper, we study the problem of publishing a stream of real-valued
data satisfying differential privacy (DP). One major challenge is that the
maximal possible value can be quite large; it is therefore necessary to
estimate a threshold above which values are truncated, reducing the amount of
noise that must be added to the data. The estimation must be done on the data
in a private fashion. We develop such a method that uses the Exponential
Mechanism with a quality function that approximates the utility goal well while
maintaining a low sensitivity. Given the threshold, we then propose a novel
online hierarchical method and several post-processing techniques.
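To make the threshold step concrete, here is a minimal sketch of selecting a truncation threshold with the Exponential Mechanism. The quality function used here (the negative count of values exceeding the candidate, which has sensitivity 1) is only a stand-in of our own; the paper's quality function is a closer approximation to the utility goal.

```python
import numpy as np

def choose_threshold(values, candidates, epsilon, rng=None):
    """Illustrative Exponential Mechanism over candidate thresholds.
    Quality = -(number of values truncated), sensitivity 1."""
    rng = rng or np.random.default_rng()
    values = np.asarray(values)
    quality = np.array([-(values > t).sum() for t in candidates], dtype=float)
    # P(t) proportional to exp(epsilon * q(t) / (2 * sensitivity))
    scores = epsilon * quality / 2.0
    scores -= scores.max()  # shift for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```

For example, choose_threshold(np.random.exponential(10, 1000), [1, 2, 4, 8, 16, 32], epsilon=1.0) tends to pick a candidate that truncates few points while keeping the Laplace noise scale small.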
Building on these ideas, we formalize the steps into a framework for private
publishing of stream data. Our framework consists of three components: a
threshold optimizer that privately estimates the threshold, a perturber that
adds calibrated noise to the stream, and a smoother that improves the result
using post-processing. Within our framework, we design an algorithm satisfying
the more stringent setting of DP called local DP (LDP). To our knowledge, this
is the first LDP algorithm for publishing streaming data. Using four real-world
datasets, we demonstrate that our mechanism outperforms the state of the art
by 6-10 orders of magnitude in terms of utility (measured by the mean squared
error of answering a random range query).
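A toy rendering of the perturber and smoother stages might look as follows; a trailing moving average stands in for the paper's online hierarchical method and post-processing, and the function and parameter names are ours.

```python
import numpy as np

def publish_stream(stream, threshold, epsilon, window=5):
    """Illustrative perturber + smoother: clip each value to the privately
    chosen threshold, add Laplace noise calibrated to that threshold, and
    smooth with a trailing moving average (a stand-in for the paper's
    hierarchical method and post-processing)."""
    noisy, released = [], []
    for x in stream:
        clipped = min(max(x, 0.0), threshold)  # truncate to [0, threshold]
        noisy.append(clipped + np.random.laplace(scale=threshold / epsilon))
        released.append(float(np.mean(noisy[-window:])))
    return released
```

The design intuition is the trade-off the abstract describes: a smaller threshold cuts the noise scale but biases large values downward, which is why the threshold must be optimized privately rather than fixed in advance.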
On the Differential Privacy of Bayesian Inference
We study how to communicate findings of Bayesian inference to third parties,
while preserving the strong guarantee of differential privacy. Our main
contributions are four different algorithms for private Bayesian inference on
probabilistic graphical models. These include two mechanisms for adding noise
to the Bayesian updates, either directly to the posterior parameters, or to
their Fourier transform so as to preserve update consistency. We also utilise a
recently introduced posterior sampling mechanism, for which we prove bounds for
the specific but general case of discrete Bayesian networks; and we introduce a
maximum-a-posteriori private mechanism. Our analysis includes utility and
privacy bounds, with a novel focus on the influence of graph structure on
privacy. Worked examples and experiments with Bayesian naïve Bayes and
Bayesian linear regression illustrate the application of our mechanisms.
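For a flavour of the first family of mechanisms (noise added directly to the posterior parameters), consider a Beta-Bernoulli model: flipping one user's bit changes each sufficient-statistic count by at most one, so the pair (successes, failures) has L1 sensitivity 2. The sketch below is our own and covers only this simplest conjugate case, not the graphical-model machinery of the paper.

```python
import numpy as np

def private_beta_posterior(data_bits, epsilon, prior=(1.0, 1.0)):
    """Illustrative sketch: release a Beta posterior whose parameters are
    perturbed with Laplace noise. One user's bit changes each count by at
    most 1, so the L1 sensitivity of (successes, failures) is 2 and
    Laplace(2/epsilon) noise suffices for epsilon-DP."""
    successes = float(np.sum(data_bits))
    failures = float(len(data_bits)) - successes
    alpha = prior[0] + successes + np.random.laplace(scale=2.0 / epsilon)
    beta = prior[1] + failures + np.random.laplace(scale=2.0 / epsilon)
    # clamp so the released parameters still define a valid Beta posterior
    return max(alpha, 1e-3), max(beta, 1e-3)
```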
Fine-grained Poisoning Attack to Local Differential Privacy Protocols for Mean and Variance Estimation
Although local differential privacy (LDP) protects individual users' data
from inference by an untrusted data curator, recent studies show that an
attacker can launch a data poisoning attack from the user side to inject
carefully crafted bogus data into LDP protocols in order to maximally skew
the final estimate computed by the data curator.
In this work, we further advance this knowledge by proposing a new
fine-grained attack that allows the attacker to fine-tune and simultaneously
manipulate the mean and variance estimates, two analytical tasks popular in
many real-world applications. To accomplish this goal, the attack leverages the
characteristics of LDP to inject fake data directly into the output domain of
the local randomizer. We call our attack the output poisoning attack (OPA). We observe
a security-privacy consistency where a small privacy loss enhances the security
of LDP, which contradicts the known security-privacy trade-off from prior work.
We further study the consistency and reveal a more holistic view of the threat
landscape of data poisoning attacks on LDP. We comprehensively evaluate our
attack against a baseline attack that simply supplies false input to the LDP protocol.
The experimental results show that OPA outperforms the baseline on three
real-world datasets. We also propose a novel defense method that can recover
the accuracy of results from a polluted data collection, and we offer insights
into secure LDP design.
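To see why poisoning the output domain is more powerful than poisoning inputs, compare the two on a simple one-bit LDP mean protocol (a Duchi-style randomizer, chosen here purely for illustration; the paper's protocols and attack construction differ in detail): an input-poisoning fake user still passes through the randomizer, while an output-poisoning fake user sends the most skewing report directly.

```python
import numpy as np

def duchi_1bit(x, epsilon, rng):
    """Honest local randomizer for x in [-1, 1] (Duchi-style one-bit report)."""
    p = 0.5 + 0.5 * x * (np.exp(epsilon) - 1) / (np.exp(epsilon) + 1)
    return 1.0 if rng.random() < p else -1.0

def estimate_mean(reports, epsilon):
    """Curator-side unbiased mean estimate from the one-bit reports."""
    scale = (np.exp(epsilon) + 1) / (np.exp(epsilon) - 1)
    return scale * float(np.mean(reports))

rng = np.random.default_rng(0)
epsilon, n_fake = 1.0, 1_000
honest = [duchi_1bit(x, epsilon, rng) for x in rng.uniform(-0.2, 0.2, 10_000)]
# baseline input poisoning: fake users feed x = 1 through the randomizer
input_poisoned = honest + [duchi_1bit(1.0, epsilon, rng) for _ in range(n_fake)]
# OPA-style output poisoning: fake users send the extreme report directly
output_poisoned = honest + [1.0] * n_fake
print(estimate_mean(input_poisoned, epsilon), estimate_mean(output_poisoned, epsilon))
```

Under these assumptions the output-poisoned estimate drifts further from the true mean, since each fake report contributes its full debiased weight rather than a noisy expectation.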