The price of differential privacy under continual observation
We study the accuracy of differentially private mechanisms in the continual release model. A continual
release mechanism receives a sensitive dataset as a stream of T inputs and produces, after receiving each input, an accurate output on the obtained inputs. In contrast, a batch algorithm receives the data as
one batch and produces a single output.
We provide the first strong lower bounds on the error of continual release mechanisms. In particular,
for two fundamental problems that are widely studied and used in the batch model, we show that
the worst case error of every continual release algorithm is Ω̃(T^{1/3}) times larger than that of the best
batch algorithm. Previous work shows only a polylogarithmic (in T) gap between the worst case error
achievable in these two models; further, for many problems, including the summation of binary attributes,
the polylogarithmic gap is tight (Dwork et al., 2010; Chan et al., 2010). Our results show that problems
closely related to summation, specifically those that require selecting the largest of a set of sums, are
fundamentally harder in the continual release model than in the batch model.
Our lower bounds assume only that privacy holds for streams fixed in advance (the "nonadaptive"
setting). However, we provide matching upper bounds that hold in a model where privacy is required
even for adaptively selected streams. This model may be of independent interest.
https://arxiv.org/abs/2112.0082
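The polylogarithmic upper bound for summation of binary attributes cited above is achieved by the classical binary tree mechanism of Dwork et al. (2010) and Chan et al. (2010). A minimal Python sketch, assuming a stream of 0/1 values and ε-DP via Laplace noise on dyadic partial sums (the function names and the even budget split across tree levels are illustrative choices, not taken from the paper):

```python
import math
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) noise via the inverse-CDF method."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def tree_mechanism(stream, eps, seed=0):
    """Continually release noisy prefix sums of a 0/1 stream.

    Each prefix [0, t) is covered by O(log T) dyadic intervals; each
    interval's sum receives Laplace noise once and is reused thereafter.
    Every stream element lies in at most levels + 1 intervals, so a
    per-node budget of eps / (levels + 1) gives eps-DP overall.
    """
    rng = random.Random(seed)
    T = len(stream)
    levels = max(1, math.ceil(math.log2(max(T, 2))))
    scale = (levels + 1) / eps  # Laplace scale per tree node
    noisy = {}  # (level, start) -> noisy sum of stream[start : start + 2**level]

    def node(level, start):
        if (level, start) not in noisy:
            true = sum(stream[start : start + 2 ** level])
            noisy[(level, start)] = true + laplace(scale, rng)
        return noisy[(level, start)]

    outputs = []
    for t in range(1, T + 1):
        # Greedy dyadic decomposition of the prefix [0, t):
        # take one aligned interval per set bit of t, largest first.
        est, pos = 0.0, 0
        for level in range(levels, -1, -1):
            if pos + 2 ** level <= t:
                est += node(level, pos)
                pos += 2 ** level
        outputs.append(est)
    return outputs
```

Because each released prefix touches only O(log T) noisy nodes, the additive error scales polylogarithmically in T rather than linearly, which is exactly the batch-versus-continual gap the lower bound in this paper shows cannot hold for selection-type problems.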
Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation
Privacy is a central challenge for systems that learn from sensitive data
sets, especially when a system's outputs must be continuously updated to
reflect changing data. We consider the achievable error for differentially
private continual release of a basic statistic -- the number of distinct items
-- in a stream where items may be both inserted and deleted (the turnstile
model). With only insertions, existing algorithms have additive error just
polylogarithmic in the length of the stream, T. We uncover a much richer
landscape in the turnstile model, even without considering memory restrictions.
We show that every differentially private mechanism that handles insertions and
deletions has worst-case additive error at least T^{1/4}, even under a
relatively weak, event-level privacy definition. Then, we identify a parameter
of the input stream, its maximum flippancy, that is low for natural data
streams and for which we give tight parameterized error guarantees.
Specifically, the maximum flippancy is the largest number of times that the
contribution of a single item to the distinct elements count changes over the
course of the stream. We present an item-level differentially private mechanism
that, for all turnstile streams with maximum flippancy w, continually outputs
the number of distinct elements with an O(sqrt(w) · polylog T) additive
error, without requiring prior knowledge of w. We prove that this is the best
achievable error bound that depends only on w, for a large range of values of
w. When w is small, the error of our mechanism is similar to the
polylogarithmic in T error in the insertion-only setting, bypassing the
hardness in the turnstile model.
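The flippancy parameter is straightforward to compute offline. A small sketch, assuming the stream is given as (item, +1/−1) update pairs (the function name and input encoding are illustrative):

```python
from collections import defaultdict

def max_flippancy(stream):
    """Largest number of times any single item's contribution to the
    distinct-elements count changes over the course of the stream.

    `stream` is a sequence of (item, delta) pairs with delta in {+1, -1}.
    An item's contribution flips whenever its running count moves between
    zero and nonzero.
    """
    counts = defaultdict(int)
    flips = defaultdict(int)
    for item, delta in stream:
        before = counts[item]
        counts[item] += delta
        after = counts[item]
        if (before == 0) != (after == 0):  # contribution to the count changed
            flips[item] += 1
    return max(flips.values(), default=0)
```

For example, the stream ("a",+1), ("a",−1), ("a",+1), ("b",+1) has maximum flippancy 3: item "a" enters, leaves, and re-enters the set of present items. Insertion-only streams always have flippancy at most 1, which matches the intuition that the turnstile hardness comes from repeated re-insertions.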
Lower Bounds for Differential Privacy Under Continual Observation and Online Threshold Queries
One of the most basic problems for studying the price of privacy over time is the so-called private counter problem, introduced by Dwork et al. (2010) and Chan et al. (2010). In this problem, we aim to track the number of events that occur over time, while hiding the existence of every single event. More specifically, in every time step t ∈ [T] we learn (in an online fashion) that Δ_t ≥ 0 new events have occurred, and must respond with an estimate n_t of the total number of events observed so far. The privacy requirement is that all of the outputs together, across all time steps, satisfy event-level differential privacy.
The main question here is how the error must depend on the total number of time steps T and the total number of events n. Dwork et al. (2015) showed an upper bound of O(log(T) + log^2(n)), and Henzinger et al. (2023) showed a lower bound of Ω(min{log T, log n}). We show a new lower bound of Ω(min{n, log T}), which is tight w.r.t. the dependence on T, and is tight in the sparse case where log^2 n = O(log T). Our lower bound has the following implications:
(1) We show that our lower bound extends to the online thresholds problem, where the goal is to privately answer many quantile queries when these queries are presented one-by-one. This resolves an open question of Bun et al. (2017).
(2) Our lower bound implies, for the first time, a separation between the number of mistakes obtainable by a private online learner and a non-private online learner. This partially resolves a COLT'22 open question published by Sanyal and Ramponi.
(3) Our lower bound also yields the first separation between the standard model of private online learning and a recently proposed relaxed variant of it, called private online prediction.
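For intuition on why the counter problem is nontrivial, consider the naive baseline that treats the T released prefix counts as T separate queries: each prefix has sensitivity 1 in a single event, so by basic composition each answer must be noised at scale T/ε, giving error linear in T; the tree-based counters of Dwork et al. and Chan et al. replace this with polylogarithmic error. A sketch of the baseline (illustrative only, not any paper's mechanism):

```python
import math
import random

def naive_counter(deltas, eps, seed=0):
    """Event-level eps-DP counter via basic composition.

    Each of the T released prefix sums changes by at most 1 when a single
    event is added or removed, so splitting the budget evenly across the
    T answers requires Laplace noise of scale T / eps per answer --
    additive error that grows linearly in T.
    """
    rng = random.Random(seed)
    T = len(deltas)
    scale = T / eps  # per-answer Laplace scale under basic composition
    total, outputs = 0, []
    for d in deltas:
        total += d
        u = rng.random() - 0.5
        noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        outputs.append(total + noise)
    return outputs
```

The gap between this linear-in-T baseline and the logarithmic upper and lower bounds quoted above is precisely what makes the fine-grained dependence on T and n interesting.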
Quantifying Differential Privacy in Continuous Data Release under Temporal Correlations
Differential Privacy (DP) has received increasing attention as a rigorous
privacy framework. Many existing studies employ traditional DP mechanisms
(e.g., the Laplace mechanism) as primitives to continuously release private
data for protecting privacy at each time point (i.e., event-level privacy),
which assume that the data at different time points are independent, or that
adversaries do not have knowledge of correlation between data. However,
continuously generated data tend to be temporally correlated, and such
correlations can be acquired by adversaries. In this paper, we investigate the
potential privacy loss of a traditional DP mechanism under temporal
correlations. First, we analyze the privacy leakage of a DP mechanism under
temporal correlations that can be modeled using a Markov chain. Our analysis
reveals that the event-level privacy loss of a DP mechanism may
increase over time. We call this unexpected privacy loss
temporal privacy leakage (TPL). Although TPL may increase over time,
we find that its supremum may exist in some cases. Second, we design efficient
algorithms for calculating TPL. Third, we propose data releasing mechanisms
that convert any existing DP mechanism into one against TPL. Experiments
confirm that our approach is efficient and effective.
Comment: accepted in TKDE special issue "Best of ICDE 2017". arXiv admin note: substantial text overlap with arXiv:1610.0754
Quantifying Differential Privacy under Temporal Correlations
Differential Privacy (DP) has received increased attention as a rigorous
privacy framework. Existing studies employ traditional DP mechanisms (e.g., the
Laplace mechanism) as primitives, which assume that the data are independent,
or that adversaries do not have knowledge of the data correlations. However,
continuously generated data in the real world tend to be temporally correlated,
and such correlations can be acquired by adversaries. In this paper, we
investigate the potential privacy loss of a traditional DP mechanism under
temporal correlations in the context of continuous data release. First, we
model the temporal correlations using a Markov model and analyze the privacy
leakage of a DP mechanism when adversaries have knowledge of such temporal
correlations. Our analysis reveals that the privacy leakage of a DP mechanism
may accumulate and increase over time. We call it temporal privacy leakage.
Second, to measure such privacy leakage, we design an efficient algorithm for
calculating it in polynomial time. Although the temporal privacy leakage may
increase over time, we also show that its supremum may exist in some cases.
Third, to bound the privacy loss, we propose mechanisms that convert any
existing DP mechanism into one against temporal privacy leakage. Experiments
with synthetic data confirm that our approach is efficient and effective.
Comment: appears at ICDE 201
Private Decayed Sum Estimation under Continual Observation
In monitoring applications, recent data is more important than distant data.
How does this affect privacy of data analysis? We study a general class of data
analyses - computing predicate sums - with privacy. Formally, we study the
problem of estimating predicate sums privately, for sliding windows (and
other well-known decay models of data, i.e. exponential and polynomial decay).
We extend the recently proposed continual privacy model of Dwork et al.
We present algorithms for decayed sums which are ε-differentially
private and accurate. For window and exponential decay sums, our
algorithms are accurate up to additive 1/ε and polylog terms in the range
of the computed function; for polynomial decay sums, which are technically more
challenging because partial solutions do not compose easily, our algorithms
incur additional relative error. Further, we show lower bounds, tight within
polylog factors and tight with respect to the dependence on the probability of
error.
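For reference, the three decay models studied here can be written down directly (without any privacy machinery); the parameter names W, g, and c below are illustrative:

```python
def window_sum(xs, t, W):
    """Sliding-window sum: total of the last W items up to time t (1-indexed)."""
    return sum(xs[max(0, t - W):t])

def exp_decayed_sum(xs, t, g):
    """Exponentially decayed sum at time t: sum over j <= t of g**(t - j) * x_j.

    Computed with the recurrence s <- g * s + x, which is why partial
    solutions for exponential decay compose easily.
    """
    s = 0.0
    for x in xs[:t]:
        s = g * s + x
    return s

def poly_decayed_sum(xs, t, c):
    """Polynomially decayed sum at time t: sum over j <= t of x_j / (t - j + 1)**c.

    No simple recurrence exists here: the weight of every past item
    changes at each step, which is the composition difficulty the
    abstract alludes to.
    """
    return sum(x / (t - j + 1) ** c for j, x in enumerate(xs[:t], start=1))
```

The contrast between the one-line recurrence for exponential decay and the full re-weighting for polynomial decay mirrors the abstract's distinction between sums whose partial solutions compose and those that incur additional relative error.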
Approximately Stable, School Optimal, and Student-Truthful Many-to-One Matchings (via Differential Privacy)
We present a mechanism for computing asymptotically stable school optimal
matchings, while guaranteeing that it is an asymptotic dominant strategy for
every student to report their true preferences to the mechanism. Our main tool
in this endeavor is differential privacy: we give an algorithm that coordinates
a stable matching using differentially private signals, which lead to our
truthfulness guarantee. This is the first setting in which it is known how to
achieve nontrivial truthfulness guarantees for students when computing school
optimal matchings, assuming worst-case preferences (for schools and students)
in large markets.
Optimal State Estimation with Measurements Corrupted by Laplace Noise
Optimal state estimation for linear discrete-time systems is considered.
Motivated by the literature on differential privacy, the measurements are
assumed to be corrupted by Laplace noise. The optimal least mean square error
estimate of the state is approximated using a randomized method. The method
relies on the fact that the Laplace noise can be rewritten as Gaussian noise
scaled by a Rayleigh random variable. The probability of the event that the
distance between the approximation and the best estimate is smaller than a
constant is determined as a function of the number of parallel Kalman filters
used in the randomized method. This estimator is then compared with the
optimal linear estimator, the maximum a posteriori (MAP) estimate of the
state, and the particle filter.
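The decomposition the method relies on is the standard Gaussian scale-mixture representation of the Laplace distribution: if Z ~ N(0, 1) and R is Rayleigh with scale b (equivalently, R² is exponential with mean 2b²), then R·Z ~ Laplace(0, b). A quick Monte Carlo check using only the standard library (function name illustrative):

```python
import math
import random
import statistics

def laplace_via_rayleigh(b, n, seed=0):
    """Sample n Laplace(0, b) variates as Rayleigh-scaled Gaussians.

    Drawing R**2 from an exponential distribution with mean 2 * b**2
    makes R Rayleigh with scale b; then R * Z with Z ~ N(0, 1) has the
    Laplace(0, b) distribution.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        r = math.sqrt(rng.expovariate(1.0 / (2.0 * b * b)))  # Rayleigh(b)
        out.append(r * rng.gauss(0.0, 1.0))
    return out

samples = laplace_via_rayleigh(b=1.0, n=100_000)
# Laplace(0, 1) has variance 2; the empirical variance should be close to it.
print(statistics.pvariance(samples))
```

This conditioning trick is what allows each parallel filter in the randomized method to be an ordinary Kalman filter: conditioned on the Rayleigh scales, the measurement noise is Gaussian.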