
    The price of differential privacy under continual observation

    We study the accuracy of differentially private mechanisms in the continual release model. A continual release mechanism receives a sensitive dataset as a stream of $T$ inputs and produces, after receiving each input, an accurate output on the inputs received so far. In contrast, a batch algorithm receives the data as one batch and produces a single output. We provide the first strong lower bounds on the error of continual release mechanisms. In particular, for two fundamental problems that are widely studied and used in the batch model, we show that the worst-case error of every continual release algorithm is $\tilde{\Omega}(T^{1/3})$ times larger than that of the best batch algorithm. Previous work shows only a polylogarithmic (in $T$) gap between the worst-case error achievable in these two models; further, for many problems, including the summation of binary attributes, the polylogarithmic gap is tight (Dwork et al., 2010; Chan et al., 2010). Our results show that problems closely related to summation (specifically, those that require selecting the largest of a set of sums) are fundamentally harder in the continual release model than in the batch model. Our lower bounds assume only that privacy holds for streams fixed in advance (the "nonadaptive" setting). However, we provide matching upper bounds that hold in a model where privacy is required even for adaptively selected streams. This model may be of independent interest. https://arxiv.org/abs/2112.0082
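
    For intuition about the batch side of this comparison, the sketch below selects the index of the largest column sum in one shot using report-noisy-max with Laplace noise. This is a standard batch selection technique, not necessarily the batch algorithm used in the paper; the function name `report_noisy_max`, the binary n x d input matrix, and the noise scale $2/\epsilon$ (a conservative choice for sensitivity-1 queries that need not be monotone) are our assumptions.

```python
import numpy as np


def report_noisy_max(data, epsilon, rng=None):
    """Batch selection of the (approximately) largest column sum.

    Hypothetical helper. `data` is an n x d 0/1 matrix with one row per
    individual, so changing one row moves each column sum by at most 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    sums = np.asarray(data).sum(axis=0)
    # Add Laplace noise of scale 2/epsilon to every sum; release only the argmax.
    noisy = sums + rng.laplace(0.0, 2.0 / epsilon, size=sums.shape)
    return int(np.argmax(noisy))


# Usage: column 1 has the largest sum (4 versus 1 and 1).
data = np.array([[0, 1, 0], [1, 1, 0], [0, 1, 1], [0, 1, 0]])
print(report_noisy_max(data, epsilon=1.0, rng=np.random.default_rng(0)))
```

    In the continual release model the same selection must be answered after every one of the $T$ arrivals, which is where the $\tilde{\Omega}(T^{1/3})$ separation stated above applies.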

    Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation

    Privacy is a central challenge for systems that learn from sensitive data sets, especially when a system's outputs must be continuously updated to reflect changing data. We consider the achievable error for differentially private continual release of a basic statistic, the number of distinct items, in a stream where items may be both inserted and deleted (the turnstile model). With only insertions, existing algorithms have additive error just polylogarithmic in the length of the stream $T$. We uncover a much richer landscape in the turnstile model, even without considering memory restrictions. We show that every differentially private mechanism that handles insertions and deletions has worst-case additive error at least $T^{1/4}$, even under a relatively weak, event-level privacy definition. Then, we identify a parameter of the input stream, its maximum flippancy, that is low for natural data streams and for which we give tight parameterized error guarantees. Specifically, the maximum flippancy is the largest number of times that the contribution of a single item to the distinct elements count changes over the course of the stream. We present an item-level differentially private mechanism that, for all turnstile streams with maximum flippancy $w$, continually outputs the number of distinct elements with an $O(\sqrt{w} \cdot \mathrm{poly}\log T)$ additive error, without requiring prior knowledge of $w$. We prove that this is the best achievable error bound that depends only on $w$, for a large range of values of $w$. When $w$ is small, the error of our mechanism is similar to the polylogarithmic in $T$ error in the insertion-only setting, bypassing the hardness in the turnstile model.
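
    The two quantities in this abstract, the distinct-elements count and the maximum flippancy $w$, can be computed exactly and without privacy, which helps pin down the definitions. The sketch below assumes a turnstile stream given as a list of (item, +1/-1) updates, with None marking a time step that has no update, and treats an item as contributing to the count when its net count is positive; the function name and input format are hypothetical.

```python
from collections import defaultdict


def distinct_counts_and_flippancy(stream):
    """Exact (non-private) distinct counts and maximum flippancy of a turnstile stream.

    Hypothetical helper: an item counts as present when its net count is positive.
    """
    counts = defaultdict(int)  # net count per item
    flips = defaultdict(int)   # how often each item's contribution switched 0 <-> 1
    distinct = 0
    outputs = []
    for update in stream:
        if update is not None:
            item, op = update  # op is +1 (insertion) or -1 (deletion)
            was_present = counts[item] > 0
            counts[item] += op
            is_present = counts[item] > 0
            if was_present != is_present:
                flips[item] += 1
                distinct += 1 if is_present else -1
        outputs.append(distinct)
    return outputs, max(flips.values(), default=0)


stream = [("a", +1), ("b", +1), ("a", -1), ("a", +1), None, ("b", -1)]
print(distinct_counts_and_flippancy(stream))  # ([1, 2, 1, 2, 2, 1], 3)
```

    Here item "a" switches its contribution three times, so $w = 3$; an insertion-only stream always has $w \le 1$.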

    Lower Bounds for Differential Privacy Under Continual Observation and Online Threshold Queries

    One of the most basic problems for studying the price of privacy over time is the so-called private counter problem, introduced by Dwork et al. (2010) and Chan et al. (2010). In this problem, we aim to track the number of events that occur over time, while hiding the existence of every single event. More specifically, in every time step $t\in[T]$ we learn (in an online fashion) that $\Delta_t\geq 0$ new events have occurred, and must respond with an estimate $n_t\approx\sum_{j=1}^t \Delta_j$. The privacy requirement is that all of the outputs together, across all time steps, satisfy event-level differential privacy. The main question here is how our error needs to depend on the total number of time steps $T$ and the total number of events $n$. Dwork et al. (2015) showed an upper bound of $O\left(\log(T)+\log^2(n)\right)$, and Henzinger et al. (2023) showed a lower bound of $\Omega\left(\min\{\log n, \log T\}\right)$. We show a new lower bound of $\Omega\left(\min\{n,\log T\}\right)$, which is tight w.r.t. the dependence on $T$, and is tight in the sparse case where $\log^2 n=O(\log T)$. Our lower bound has the following implications: (1) We show that our lower bound extends to the online thresholds problem, where the goal is to privately answer many quantile queries when these queries are presented one-by-one. This resolves an open question of Bun et al. (2017). (2) Our lower bound implies, for the first time, a separation between the number of mistakes obtainable by a private online learner and a non-private online learner. This partially resolves a COLT'22 open question published by Sanyal and Ramponi. (3) Our lower bound also yields the first separation between the standard model of private online learning and a recently proposed relaxed variant of it, called private online prediction.
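
    The private counter problem is commonly illustrated with the binary-tree mechanism of Dwork et al. (2010) and Chan et al. (2010): noisy partial sums are released over dyadic blocks of time, and each prefix sum $n_t$ is assembled from the blocks matching the set bits of $t$. The sketch below is our own minimal rendering of that idea, not the mechanism behind the Dwork et al. (2015) bound quoted above; the function name is hypothetical, and it assumes event-level neighbors differ by 1 in a single $\Delta_t$.

```python
import numpy as np


def binary_tree_counter(deltas, epsilon, rng=None):
    """Continual counting via a binary-tree (hierarchical) mechanism.

    Hypothetical helper. deltas[t-1] is Delta_t; neighboring streams differ by 1
    in one Delta_t. Returns a noisy estimate of the running sum after every step.
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = max(1, len(deltas).bit_length())  # tree height needed for T steps
    scale = levels / epsilon  # each Delta_t enters at most `levels` released nodes
    exact = [0.0] * levels    # exact partial sums (tree nodes)
    noisy = [0.0] * levels    # their noisy releases
    estimates = []
    for t, delta in enumerate(deltas, start=1):
        i = (t & -t).bit_length() - 1  # index of the lowest set bit of t
        # The new node at level i covers the last 2**i steps: merge lower levels plus delta.
        exact[i] = sum(exact[:i]) + delta
        for j in range(i):
            exact[j] = 0.0
            noisy[j] = 0.0
        noisy[i] = exact[i] + rng.laplace(0.0, scale)
        # The nodes for the set bits of t partition [1, t]; sum their noisy values.
        estimates.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return estimates


deltas = [1, 0, 2, 1, 0, 0, 3, 1]
print(binary_tree_counter(deltas, epsilon=1.0, rng=np.random.default_rng(0)))
```

    Each estimate is post-processing of the released nodes, and each sum uses at most `levels` noisy terms, which is the source of the polylogarithmic error of this approach.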

    Quantifying Differential Privacy in Continuous Data Release under Temporal Correlations

    Differential Privacy (DP) has received increasing attention as a rigorous privacy framework. Many existing studies employ traditional DP mechanisms (e.g., the Laplace mechanism) as primitives to continuously release private data for protecting privacy at each time point (i.e., event-level privacy); these mechanisms assume that the data at different time points are independent, or that adversaries have no knowledge of correlations between data. However, continuously generated data tend to be temporally correlated, and such correlations can be acquired by adversaries. In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations. First, we analyze the privacy leakage of a DP mechanism under temporal correlation that can be modeled using a Markov chain. Our analysis reveals that the event-level privacy loss of a DP mechanism may increase over time. We call this unexpected privacy loss temporal privacy leakage (TPL). Although TPL may increase over time, we find that its supremum may exist in some cases. Second, we design efficient algorithms for calculating TPL. Third, we propose data releasing mechanisms that convert any existing DP mechanism into one against TPL. Experiments confirm that our approach is efficient and effective.
    Comment: accepted in TKDE special issue "Best of ICDE 2017". arXiv admin note: substantial text overlap with arXiv:1610.0754
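
    A worked extreme case, not the paper's TPL algorithm, shows why leakage can grow over time. Assume the Markov chain's transition matrix is the identity, so a user's value never changes, and the mechanism releases that value with Laplace noise of scale $1/\epsilon$ (sensitivity 1) at each of $T$ time points. Each release is event-level $\epsilon$-DP, but an adversary who knows the correlation sees $T$ noisy copies of one value, so the log-likelihood ratio between two candidate values adds up across releases and can reach $T\epsilon$. The helper name and setup are our assumptions.

```python
import numpy as np


def laplace_log_ratio(releases, x, x_prime, epsilon):
    """log P(releases | x) - log P(releases | x_prime) for Laplace(1/epsilon) releases.

    Hypothetical helper: every release is assumed to be the same underlying
    value plus independent Laplace noise, as under an identity transition matrix.
    """
    b = 1.0 / epsilon
    r = np.asarray(releases, dtype=float)
    # Per-release Laplace density ratio is exp((|r - x'| - |r - x|) / b).
    return float(np.sum(np.abs(r - x_prime) - np.abs(r - x)) / b)


T, eps = 10, 0.5
releases = [5.0] * T  # every release lands above both candidate values
print(laplace_log_ratio(releases, x=1, x_prime=0, epsilon=eps))  # 5.0, i.e. T * eps
```

    With independent data, only one release would depend on the user's value at a given time, and the ratio would stay within $\epsilon$.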

    Quantifying Differential Privacy under Temporal Correlations

    Differential Privacy (DP) has received increased attention as a rigorous privacy framework. Existing studies employ traditional DP mechanisms (e.g., the Laplace mechanism) as primitives, which assume that the data are independent, or that adversaries do not have knowledge of the data correlations. However, continuously generated data in the real world tend to be temporally correlated, and such correlations can be acquired by adversaries. In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations in the context of continuous data release. First, we model the temporal correlations using a Markov model and analyze the privacy leakage of a DP mechanism when adversaries have knowledge of such temporal correlations. Our analysis reveals that the privacy leakage of a DP mechanism may accumulate and increase over time. We call this temporal privacy leakage. Second, to measure such privacy leakage, we design an efficient algorithm for calculating it in polynomial time. Although the temporal privacy leakage may increase over time, we also show that its supremum may exist in some cases. Third, to bound the privacy loss, we propose mechanisms that convert any existing DP mechanism into one against temporal privacy leakage. Experiments with synthetic data confirm that our approach is efficient and effective.
    Comment: appears at ICDE 201

    Private Decayed Sum Estimation under Continual Observation

    In monitoring applications, recent data is more important than distant data. How does this affect the privacy of data analysis? We study a general class of data analyses, computing predicate sums, with privacy. Formally, we study the problem of estimating predicate sums privately, for sliding windows (and other well-known decay models of data, i.e., exponential and polynomial decay). We extend the recently proposed continual privacy model of Dwork et al. We present algorithms for decayed sums that are $\epsilon$-differentially private and accurate. For window and exponential decay sums, our algorithms are accurate up to additive $1/\epsilon$ and polylog terms in the range of the computed function; for polynomial decay sums, which are technically more challenging because partial solutions do not compose easily, our algorithms incur additional relative error. Further, we show lower bounds, tight within polylog factors and tight with respect to the dependence on the probability of error.
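
    The three decay models can be pinned down by computing the corresponding sums exactly, without privacy. In this sketch $x_i \in \{0,1\}$ records whether the predicate held at step $i$ (0-indexed); the window length $w$, decay factor $\gamma$, exponent $c$, the weight $(t-i+1)^{-c}$ for polynomial decay, and the function names are our own parametrization and may differ from the paper's.

```python
def window_sum(x, t, w):
    """Sliding-window sum: the last w values up to and including time t."""
    return sum(x[max(0, t - w + 1): t + 1])


def exp_decay_sum(x, t, gamma):
    """Exponentially decayed sum: weight gamma**(t - i) on x[i], with 0 < gamma < 1."""
    return sum(gamma ** (t - i) * x[i] for i in range(t + 1))


def poly_decay_sum(x, t, c):
    """Polynomially decayed sum (assumed form): weight (t - i + 1)**(-c) on x[i], c > 0."""
    return sum(x[i] / (t - i + 1) ** c for i in range(t + 1))


# x[i] = 1 if the monitored predicate held at step i, else 0 (hypothetical data).
x = [1, 0, 1, 1]
print(window_sum(x, 3, 2), exp_decay_sum(x, 3, 0.5), poly_decay_sum(x, 3, 1))
# 2 1.625 1.75
```

    Unlike the window and exponential sums, the polynomial weights of old items change non-uniformly as $t$ advances, which is consistent with the abstract's remark that partial solutions do not compose easily in that model.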

    Approximately Stable, School Optimal, and Student-Truthful Many-to-One Matchings (via Differential Privacy)

    We present a mechanism for computing asymptotically stable school-optimal matchings, while guaranteeing that it is an asymptotic dominant strategy for every student to report their true preferences to the mechanism. Our main tool in this endeavor is differential privacy: we give an algorithm that coordinates a stable matching using differentially private signals, which lead to our truthfulness guarantee. This is the first setting in which it is known how to achieve nontrivial truthfulness guarantees for students when computing school-optimal matchings, assuming worst-case preferences (for schools and students) in large markets.
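
    Without privacy, a school-optimal stable matching is classically computed by school-proposing deferred acceptance (Gale and Shapley). The sketch below implements that non-private baseline for intuition only; it is not the differentially private mechanism of the paper, and the function name and dictionary-based input format are hypothetical.

```python
from collections import deque


def school_proposing_da(school_prefs, student_prefs, capacity):
    """Non-private school-proposing deferred acceptance (many-to-one).

    Hypothetical helper.
    school_prefs[c]: students in school c's order of preference.
    student_prefs[s]: acceptable schools in student s's order of preference.
    capacity[c]: number of seats at school c.
    Returns a dict mapping each matched student to a school.
    """
    rank = {s: {c: i for i, c in enumerate(p)} for s, p in student_prefs.items()}
    next_idx = {c: 0 for c in school_prefs}  # next student each school proposes to
    held = {c: set() for c in school_prefs}  # students tentatively held by each school
    match = {}  # student -> school currently held
    active = deque(c for c in school_prefs if capacity[c] > 0)
    while active:
        c = active.popleft()
        # A school keeps proposing while it has free seats and untried students.
        while len(held[c]) < capacity[c] and next_idx[c] < len(school_prefs[c]):
            s = school_prefs[c][next_idx[c]]
            next_idx[c] += 1
            if c not in rank.get(s, {}):
                continue  # student s finds school c unacceptable
            current = match.get(s)
            if current is None or rank[s][c] < rank[s][current]:
                if current is not None:
                    held[current].discard(s)  # s rejects its previous school,
                    active.append(current)    # which now has a free seat again
                match[s] = c
                held[c].add(s)
    return match


schools = {"A": ["x", "y"], "B": ["y", "x"]}
students = {"x": ["B", "A"], "y": ["A", "B"]}
print(school_proposing_da(schools, students, {"A": 1, "B": 1}))
# {'x': 'A', 'y': 'B'}: each school gets its first choice
```

    On the same input, student-proposing deferred acceptance would give each student their first choice instead. School-proposing deferred acceptance is not strategy-proof for students, which is the gap the mechanism in this abstract addresses.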

    Optimal State Estimation with Measurements Corrupted by Laplace Noise

    Optimal state estimation for linear discrete-time systems is considered. Motivated by the literature on differential privacy, the measurements are assumed to be corrupted by Laplace noise. The optimal least mean square error estimate of the state is approximated using a randomized method. The method relies on the fact that Laplace noise can be rewritten as Gaussian noise scaled by a Rayleigh random variable. The probability of the event that the distance between the approximation and the best estimate is smaller than a constant is determined as a function of the number of parallel Kalman filters that are used in the randomized method. This estimator is then compared with the optimal linear estimator, the maximum a posteriori (MAP) estimate of the state, and the particle filter.
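
    The decomposition the method relies on can be checked numerically: if $R$ is Rayleigh with scale $b$ and $Z$ is standard normal, then $RZ$ has the Laplace$(0,b)$ distribution, because $R^2$ is exponential with mean $2b^2$ and a zero-mean Gaussian with exponentially distributed variance is Laplace. Conditioned on $R$ the noise is Gaussian, which is presumably why a bank of Kalman filters, one per sampled value of $R$, can be used. The sketch below (function name hypothetical) only compares sample moments with direct Laplace draws; it does not implement the paper's randomized estimator.

```python
import numpy as np


def laplace_via_rayleigh_gaussian(b, size, rng=None):
    """Hypothetical helper: sample Laplace(0, b) as R * Z, R ~ Rayleigh(b), Z ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.rayleigh(scale=b, size=size)  # random standard deviation
    z = rng.standard_normal(size)         # unit Gaussian
    return r * z                          # Gaussian conditional on r


rng = np.random.default_rng(0)
b = 2.0
mixture = laplace_via_rayleigh_gaussian(b, 200_000, rng)
direct = rng.laplace(0.0, b, 200_000)
# Both variances should be near 2 * b**2 = 8 and both mean absolute values near b = 2.
print(mixture.var(), direct.var(), np.abs(mixture).mean(), np.abs(direct).mean())
```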