1,918 research outputs found

    Asymptotic Loss in Privacy due to Dependency in Gaussian Traces

    Full text link
    The rapid growth of the Internet of Things (IoT) necessitates employing privacy-preserving techniques to protect users' sensitive information. Even when user traces are anonymized, statistical matching can be employed to infer sensitive information. In our previous work, we have established the privacy requirements for the case that the user traces are instantiations of discrete random variables and the adversary knows only the structure of the dependency graph, i.e., whether each pair of users is connected. In this paper, we consider the case where data traces are instantiations of Gaussian random variables and the adversary knows not only the structure of the graph but also the pairwise correlation coefficients. We establish the requirements on anonymization to thwart such statistical matching, which demonstrate the significant degree to which knowledge of the pairwise correlation coefficients further significantly aids the adversary in breaking user anonymity.Comment: IEEE Wireless Communications and Networking Conferenc

    Private Graphon Estimation for Sparse Graphs

    Get PDF
    We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information of individual members. Given a sparse input graph GG, our algorithms output a node-differentially-private nonparametric block model approximation. By node-differentially-private, we mean that our output hides the insertion or removal of a vertex and all its adjacent edges. If GG is an instance of the network obtained from a generative nonparametric model defined in terms of a graphon WW, our model guarantees consistency, in the sense that as the number of vertices tends to infinity, the output of our algorithm converges to WW in an appropriate version of the L2L_2 norm. In particular, this means we can estimate the sizes of all multi-way cuts in GG. Our results hold as long as WW is bounded, the average degree of GG grows at least like the log of the number of vertices, and the number of blocks goes to infinity at an appropriate rate. We give explicit error bounds in terms of the parameters of the model; in several settings, our bounds improve on or match known nonprivate results.Comment: 36 page

    Distribution-Preserving Statistical Disclosure Limitation

    Get PDF
    One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database.statistical disclosure limitation; confidentiality; privacy; multiple imputation; partially synthetic data
    • …
    corecore