Asymptotic Loss in Privacy due to Dependency in Gaussian Traces
The rapid growth of the Internet of Things (IoT) necessitates employing
privacy-preserving techniques to protect users' sensitive information. Even
when user traces are anonymized, statistical matching can be employed to infer
sensitive information. In our previous work, we have established the privacy
requirements for the case that the user traces are instantiations of discrete
random variables and the adversary knows only the structure of the dependency
graph, i.e., whether each pair of users is connected. In this paper, we
consider the case where data traces are instantiations of Gaussian random
variables and the adversary knows not only the structure of the graph but also
the pairwise correlation coefficients. We establish the requirements on anonymization to thwart such statistical matching; these requirements demonstrate the significant degree to which knowledge of the pairwise correlation coefficients further aids the adversary in breaking user anonymity.
Comment: IEEE Wireless Communications and Networking Conference
INFORMATION-THEORETIC LIMITS ON STATISTICAL MATCHING WITH APPLICATIONS TO PRIVACY
Modern applications significantly enhance the user experience by adapting to each user's individual condition and/or preferences. While this adaptation can greatly improve a user's experience or be essential for the application to work, the exposure of user data to the application presents a significant privacy threat to the users, even when the traces are anonymized, since the statistical matching of an anonymized trace to prior user behavior can identify a user and their habits. Because of the current and growing algorithmic and computational capabilities of adversaries, provable privacy guarantees as a function of the degree of anonymization and obfuscation of the traces are necessary. This dissertation focuses on deriving theoretical bounds on the privacy of users in such a scenario. We derive the fundamental limits of user privacy when both anonymization and obfuscation-based protection mechanisms are applied to users' time series of data, and investigate the impact of such mechanisms on the trade-off between privacy protection and user utility. In the first part, we obtain the requirements on anonymization and obfuscation in the case that data traces are independent between users. However, in many applications the data traces of different users are dependent, and an adversary can potentially exploit such dependency. In the next part, we therefore consider the impact of dependency between user traces on their privacy. We demonstrate that the adversary can readily identify the association graph of the obfuscated and anonymized version of the data, revealing which user data traces are dependent, and can then use this association graph to break user privacy with significantly shorter traces than in the case of independent users. As a result, we show that inter-user dependency degrades user privacy, and that obfuscating data traces independently across users is often insufficient to remedy such leakage.
Therefore, we discuss how users can improve privacy by employing joint obfuscation that removes the data dependency. Finally, we discuss how a remapping technique can improve user utility, and how much remapping leaks to the adversary when the adversary does not have full prior information.
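The association-graph attack described above can be sketched in a few lines (an illustrative toy, not the dissertation's construction; the user count, correlations, and noise level are invented): dependent user traces remain correlated after independent per-user obfuscation, so thresholding empirical correlations recovers which users are linked.

```python
import numpy as np

rng = np.random.default_rng(1)
trace_len = 20000

# Five hypothetical users: pairs (0,1) and (2,3) are dependent, user 4 is not.
z1, z2 = rng.standard_normal((2, trace_len))
traces = np.vstack([
    z1,
    0.9 * z1 + np.sqrt(1 - 0.9**2) * rng.standard_normal(trace_len),
    z2,
    0.9 * z2 + np.sqrt(1 - 0.9**2) * rng.standard_normal(trace_len),
    rng.standard_normal(trace_len),
])

# Independent per-user obfuscation: noise added to each trace separately.
obfuscated = traces + 0.5 * rng.standard_normal(traces.shape)

# Adversary: threshold empirical correlations to recover the association graph.
emp = np.corrcoef(obfuscated)
assoc = (np.abs(emp) > 0.3) & ~np.eye(5, dtype=bool)
# The dependent pairs survive independent obfuscation and remain visible,
# which is why joint obfuscation that removes the dependency is needed.
```

Here the obfuscation noise shrinks the pairwise correlation (from 0.9 to roughly 0.72) but cannot hide it, which matches the abstract's point that independent obfuscation is often insufficient.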
Private Graphon Estimation for Sparse Graphs
We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information of individual members. Given a sparse input graph G, our algorithms output a node-differentially-private nonparametric block model approximation. By node-differentially-private, we mean that our output hides the insertion or removal of a vertex and all its adjacent edges. If G is an instance of the network obtained from a generative nonparametric model defined in terms of a graphon W, our model guarantees consistency, in the sense that as the number of vertices tends to infinity, the output of our algorithm converges to W in an appropriate norm. In particular, this means we can estimate the sizes of all multi-way cuts in G.
Our results hold as long as W is bounded, the average degree of G grows at least like the log of the number of vertices, and the number of blocks goes to infinity at an appropriate rate. We give explicit error bounds in terms of the parameters of the model; in several settings, our bounds improve on or match known nonprivate results.
Comment: 36 pages
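The setting can be made concrete with a small simulation (a non-private illustration only; the graphon, sizes, and degree-based grouping are invented, and the paper's differentially private mechanism is omitted): sample a sparse graph from a bounded graphon with average degree on the order of log(n), then form a block-model approximation by averaging edge densities over groups of vertices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 400, 4  # vertices, blocks (illustrative sizes)

# A simple bounded graphon; the scale rho puts us in the sparse regime
# where the average degree grows like log(n).
W = lambda x, y: (x + y) / 2.0
rho = 4.0 * np.log(n) / n

# Sample a graph from the graphon: latent positions, then independent edges.
x = rng.uniform(size=n)
P = np.minimum(rho * W(x[:, None], x[None, :]), 1.0)
A = (rng.uniform(size=(n, n)) < P).astype(int)
A = np.triu(A, 1)
A = A + A.T  # simple undirected graph

# Block-model approximation: group vertices by degree, then average the
# edge indicators over each pair of blocks.
order = np.argsort(A.sum(axis=1))
blocks = np.array_split(order, k)
B = np.array([[A[np.ix_(bi, bj)].mean() for bj in blocks] for bi in blocks])
```

The resulting k-by-k matrix B is a crude nonparametric block approximation of the (scaled) graphon; the paper's contribution is producing such an approximation under node differential privacy with explicit error bounds.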
Distribution-Preserving Statistical Disclosure Limitation
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database.
Keywords: statistical disclosure limitation; confidentiality; privacy; multiple imputation; partially synthetic data
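The idea of preserving a distribution under an unknown monotone transformation can be sketched with a rank-based remap (a hypothetical illustration, not the paper's estimator; the lognormal column and Gaussian working scale are invented): synthetic draws made on any convenient working scale are pushed through their empirical CDF and then through the empirical quantile function of the confidential data, so the synthetic marginal matches the confidential one up to sampling error regardless of how badly the working model is specified.

```python
import numpy as np

rng = np.random.default_rng(3)

# Confidential column with an unknown, skewed distribution (illustrative).
confidential = rng.lognormal(mean=10.0, sigma=1.2, size=5000)

# Imputer's draws on a convenient working scale (here, plain Gaussian);
# the working model need not match the true data-generating process.
working = rng.standard_normal(5000)

# Distribution-preserving remap: each working draw -> its empirical CDF
# value -> the corresponding empirical quantile of the confidential data.
ranks = np.argsort(np.argsort(working))
u = (ranks + 0.5) / len(working)
synthetic = np.quantile(confidential, u)
```

Because only ranks of the working draws are used, the synthetic values inherit the confidential marginal exactly up to the empirical quantile grid, which is the sense in which a monotone transformation of the scale does not matter.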