Asymptotic Loss in Privacy due to Dependency in Gaussian Traces
The rapid growth of the Internet of Things (IoT) necessitates employing
privacy-preserving techniques to protect users' sensitive information. Even
when user traces are anonymized, statistical matching can be employed to infer
sensitive information. In our previous work, we have established the privacy
requirements for the case that the user traces are instantiations of discrete
random variables and the adversary knows only the structure of the dependency
graph, i.e., whether each pair of users is connected. In this paper, we
consider the case where data traces are instantiations of Gaussian random
variables and the adversary knows not only the structure of the graph but also
the pairwise correlation coefficients. We establish the requirements on anonymization to thwart such statistical matching; these requirements demonstrate the significant degree to which knowledge of the pairwise correlation coefficients further aids the adversary in breaking user anonymity.
Comment: IEEE Wireless Communications and Networking Conference
INFORMATION-THEORETIC LIMITS ON STATISTICAL MATCHING WITH APPLICATIONS TO PRIVACY
Modern applications significantly enhance the user experience by adapting to each user's individual condition and/or preferences. While this adaptation can greatly improve a user's experience or be essential for the application to work, the exposure of user data to the application presents a significant privacy threat to the users, even when the traces are anonymized, since the statistical matching of an anonymized trace to prior user behavior can identify a user and their habits. Because of the current and growing algorithmic and computational capabilities of adversaries, provable privacy guarantees as a function of the degree of anonymization and obfuscation of the traces are necessary. This dissertation focuses on deriving theoretical bounds on the privacy of users in such a scenario. We derive the fundamental limits of user privacy when both anonymization and obfuscation-based protection mechanisms are applied to users' time series of data, and investigate the impact of such mechanisms on the trade-off between privacy protection and user utility. In the first part, we obtain the requirements on anonymization and obfuscation in the case that data traces are independent between users. However, in many applications the data traces of different users are dependent, and an adversary can potentially exploit such dependency. In the next part, we therefore consider the impact of dependency between user traces on their privacy. We demonstrate that the adversary can readily identify the association graph of the obfuscated and anonymized version of the data, revealing which user data traces are dependent, and can then use this association graph to break user privacy with significantly shorter traces than in the case of independent users. As a result, we show that inter-user dependency degrades user privacy, and that obfuscating data traces independently across users is often insufficient to remedy such leakage.
Therefore, we discuss how users can improve privacy by employing joint obfuscation that removes the data dependency. Finally, we discuss how a remapping technique can improve user utility, and how much remapping leaks to the adversary when the adversary does not have full prior information.
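The association-graph attack described above can be sketched in a few lines (an illustrative toy, not the dissertation's construction; the user count, correlations, and noise level are invented): dependent user traces remain correlated after independent per-user obfuscation, so thresholding empirical correlations recovers which users are linked.

```python
import numpy as np

rng = np.random.default_rng(1)
trace_len = 20000

# Five hypothetical users: pairs (0,1) and (2,3) are dependent, user 4 is not.
z1, z2 = rng.standard_normal((2, trace_len))
traces = np.vstack([
    z1,
    0.9 * z1 + np.sqrt(1 - 0.9**2) * rng.standard_normal(trace_len),
    z2,
    0.9 * z2 + np.sqrt(1 - 0.9**2) * rng.standard_normal(trace_len),
    rng.standard_normal(trace_len),
])

# Independent per-user obfuscation: noise added to each trace separately.
obfuscated = traces + 0.5 * rng.standard_normal(traces.shape)

# Adversary: threshold empirical correlations to recover the association graph.
emp = np.corrcoef(obfuscated)
assoc = (np.abs(emp) > 0.3) & ~np.eye(5, dtype=bool)
# The dependent pairs survive independent obfuscation and remain visible,
# which is why joint obfuscation that removes the dependency is needed.
```

Here the obfuscation noise shrinks the pairwise correlation (from 0.9 to roughly 0.72) but cannot hide it, which matches the abstract's point that independent obfuscation is often insufficient.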
Private Graphon Estimation for Sparse Graphs
We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information of individual members. Given a sparse input graph G, our algorithms output a node-differentially-private nonparametric block model approximation. By node-differentially-private, we mean that our output hides the insertion or removal of a vertex and all its adjacent edges. If G is an instance of the network obtained from a generative nonparametric model defined in terms of a graphon W, our model guarantees consistency, in the sense that as the number of vertices tends to infinity, the output of our algorithm converges to W in an appropriate norm. In particular, this means we can estimate the sizes of all multi-way cuts in G.
Our results hold as long as W is bounded, the average degree of G grows at least like the log of the number of vertices, and the number of blocks goes to infinity at an appropriate rate. We give explicit error bounds in terms of the parameters of the model; in several settings, our bounds improve on or match known nonprivate results.
Comment: 36 pages
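The setting can be made concrete with a small simulation (a non-private illustration only; the graphon, sizes, and degree-based grouping are invented, and the paper's differentially private mechanism is omitted): sample a sparse graph from a bounded graphon with average degree on the order of log(n), then form a block-model approximation by averaging edge densities over groups of vertices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 400, 4  # vertices, blocks (illustrative sizes)

# A simple bounded graphon; the scale rho puts us in the sparse regime
# where the average degree grows like log(n).
W = lambda x, y: (x + y) / 2.0
rho = 4.0 * np.log(n) / n

# Sample a graph from the graphon: latent positions, then independent edges.
x = rng.uniform(size=n)
P = np.minimum(rho * W(x[:, None], x[None, :]), 1.0)
A = (rng.uniform(size=(n, n)) < P).astype(int)
A = np.triu(A, 1)
A = A + A.T  # simple undirected graph

# Block-model approximation: group vertices by degree, then average the
# edge indicators over each pair of blocks.
order = np.argsort(A.sum(axis=1))
blocks = np.array_split(order, k)
B = np.array([[A[np.ix_(bi, bj)].mean() for bj in blocks] for bi in blocks])
```

The resulting k-by-k matrix B is a crude nonparametric block approximation of the (scaled) graphon; the paper's contribution is producing such an approximation under node differential privacy with explicit error bounds.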
Distribution-Preserving Statistical Disclosure Limitation
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database.
Keywords: statistical disclosure limitation; confidentiality; privacy; multiple imputation; partially synthetic data
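The idea of preserving a distribution under an unknown monotone transformation can be sketched with a rank-based remap (a hypothetical illustration, not the paper's estimator; the lognormal column and Gaussian working scale are invented): synthetic draws made on any convenient working scale are pushed through their empirical CDF and then through the empirical quantile function of the confidential data, so the synthetic marginal matches the confidential one up to sampling error regardless of how badly the working model is specified.

```python
import numpy as np

rng = np.random.default_rng(3)

# Confidential column with an unknown, skewed distribution (illustrative).
confidential = rng.lognormal(mean=10.0, sigma=1.2, size=5000)

# Imputer's draws on a convenient working scale (here, plain Gaussian);
# the working model need not match the true data-generating process.
working = rng.standard_normal(5000)

# Distribution-preserving remap: each working draw -> its empirical CDF
# value -> the corresponding empirical quantile of the confidential data.
ranks = np.argsort(np.argsort(working))
u = (ranks + 0.5) / len(working)
synthetic = np.quantile(confidential, u)
```

Because only ranks of the working draws are used, the synthetic values inherit the confidential marginal exactly up to the empirical quantile grid, which is the sense in which a monotone transformation of the scale does not matter.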