401 research outputs found
Boosting the Accuracy of Differentially-Private Histograms Through Consistency
We show that it is possible to significantly improve the accuracy of a
general class of histogram queries while satisfying differential privacy. Our
approach carefully chooses a set of queries to evaluate, and then exploits
consistency constraints that should hold over the noisy output. In a
post-processing phase, we compute the consistent input most likely to have
produced the noisy output. The final output is differentially-private and
consistent, but in addition, it is often much more accurate. We show, both
theoretically and experimentally, that these techniques can be used for
estimating the degree sequence of a graph very precisely, and for computing a
histogram that can support arbitrary range queries accurately.
Comment: 15 pages, 7 figures, minor revisions to previous version
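As an illustration of the consistency idea for a sorted sequence such as a degree sequence, the sketch below adds Laplace noise to sorted counts and then projects the noisy values back onto the set of non-decreasing sequences with pool-adjacent-violators (least-squares isotonic regression). It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the `sensitivity` parameter, and its default value are hypothetical, and the correct sensitivity depends on the neighboring-dataset definition in use.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def isotonic_fit(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def private_sorted_counts(counts, epsilon, rng, sensitivity=1.0):
    # Assumption: `sensitivity` bounds the L1 change of the sorted sequence
    # between neighboring inputs; it is a hypothetical parameter here.
    noisy = [c + laplace(sensitivity / epsilon, rng) for c in sorted(counts)]
    # Post-processing only: enforcing sortedness costs no extra privacy budget.
    return isotonic_fit(noisy)


if __name__ == "__main__":
    rng = random.Random(0)
    print(private_sorted_counts([3, 1, 4, 1, 5, 9, 2, 6], epsilon=1.0, rng=rng))
```

Because the projection uses only the noisy output, it is post-processing and leaves the privacy guarantee unchanged.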
Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation
Privacy is a central challenge for systems that learn from sensitive data
sets, especially when a system's outputs must be continuously updated to
reflect changing data. We consider the achievable error for differentially
private continual release of a basic statistic -- the number of distinct items
-- in a stream where items may be both inserted and deleted (the turnstile
model). With only insertions, existing algorithms have additive error just
polylogarithmic in the length of the stream $T$. We uncover a much richer
landscape in the turnstile model, even without considering memory restrictions.
We show that every differentially private mechanism that handles insertions and
deletions has worst-case additive error at least $T^{1/4}$ even under a
relatively weak, event-level privacy definition. Then, we identify a parameter
of the input stream, its maximum flippancy, that is low for natural data
streams and for which we give tight parameterized error guarantees.
Specifically, the maximum flippancy is the largest number of times that the
contribution of a single item to the distinct elements count changes over the
course of the stream. We present an item-level differentially private mechanism
that, for all turnstile streams with maximum flippancy $w$, continually outputs
the number of distinct elements with an $O(\sqrt{w} \cdot \mathrm{polylog}\, T)$ additive
error, without requiring prior knowledge of $w$. We prove that this is the best
achievable error bound that depends only on $w$, for a large range of values of
$w$. When $w$ is small, the error of our mechanism is similar to the
polylogarithmic-in-$T$ error in the insertion-only setting, bypassing the
hardness in the turnstile model.
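The definition of maximum flippancy can be made concrete with a short, non-private sketch. It assumes a hypothetical stream encoding as `(item, delta)` pairs with `delta` equal to `+1` (insert) or `-1` (delete), and treats an item as contributing to the distinct count whenever its running count is positive; both choices are assumptions for illustration and may differ from the paper's formal model.

```python
from collections import defaultdict


def max_flippancy(stream):
    """Largest number of times any single item's presence toggles."""
    count = defaultdict(int)   # running multiplicity of each item
    flips = defaultdict(int)   # number of presence changes per item
    for item, delta in stream:
        was_present = count[item] > 0
        count[item] += delta
        # A flip occurs when the item's contribution to the distinct
        # count changes (absent -> present or present -> absent).
        if (count[item] > 0) != was_present:
            flips[item] += 1
    return max(flips.values(), default=0)


if __name__ == "__main__":
    stream = [("a", +1), ("a", -1), ("a", +1), ("b", +1)]
    print(max_flippancy(stream))
```

Under these assumptions, an insertion-only stream has maximum flippancy at most 1, since each item can only switch from absent to present once.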
Continuous Release of Data Streams under both Centralized and Local Differential Privacy
In this paper, we study the problem of publishing a stream of real-valued
data satisfying differential privacy (DP). One major challenge is that the
maximal possible value can be quite large; thus it is necessary to estimate a
threshold so that numbers above it are truncated, reducing the amount of noise
that must be added to all the data. The estimation must be done based on the data
in a private fashion. We develop such a method that uses the Exponential
Mechanism with a quality function that closely approximates the utility goal while
maintaining a low sensitivity. Given the threshold, we then propose a novel
online hierarchical method and several post-processing techniques.
Building on these ideas, we formalize the steps into a framework for private
publishing of stream data. Our framework consists of three components: a
threshold optimizer that privately estimates the threshold, a perturber that
adds calibrated noise to the stream, and a smoother that improves the result
using post-processing. Within our framework, we design an algorithm satisfying
the more stringent setting of DP called local DP (LDP). To our knowledge, this
is the first LDP algorithm for publishing streaming data. Using four real-world
datasets, we demonstrate that our mechanism outperforms the state of the art by
6 to 10 orders of magnitude in terms of utility (measured by the mean
squared error of answering a random range query).
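To show why truncation reduces the required noise, the sketch below clips each value to a threshold and adds Laplace noise whose scale is proportional to that threshold. It is a simplified stand-in, not the paper's method: it uses independent per-value noise instead of the online hierarchical method, omits the smoother, and takes the threshold as a given input, whereas the abstract describes choosing it privately with the Exponential Mechanism. The function names are hypothetical, and the code assumes non-negative values with one value per timestep.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip(x, threshold):
    # Truncate to [0, threshold]; assumes the data are non-negative.
    return min(max(x, 0.0), threshold)


def truncate_and_perturb(stream, threshold, epsilon, rng):
    # After clipping, changing one value moves it by at most `threshold`,
    # so Laplace noise of scale threshold / epsilon suffices for that value.
    # A smaller threshold means less noise but more truncation bias.
    return [clip(x, threshold) + laplace(threshold / epsilon, rng)
            for x in stream]


if __name__ == "__main__":
    rng = random.Random(1)
    print(truncate_and_perturb([2.0, 7.5, 1000.0, 3.2],
                               threshold=10.0, epsilon=1.0, rng=rng))
```

Without clipping, the noise scale would have to track the largest possible value (1000.0 in this toy input), which would swamp the typical values.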