7 research outputs found

    Empirical Risk Minimization in the Non-interactive Local Model of Differential Privacy

    In this paper, we study the Empirical Risk Minimization (ERM) problem in the non-interactive Local Differential Privacy (LDP) model. Previous research on this problem \citep{smith2017interaction} indicates that, for general loss functions, the sample complexity needed to achieve error α depends exponentially on the dimensionality p. In this paper, we make two attempts to resolve this issue by investigating conditions on the loss functions that allow us to remove such a limit. In our first attempt, we show that if the loss function is (∞, T)-smooth, we can avoid the exponential dependency in the α term by using Bernstein polynomial approximation. We then propose player-efficient algorithms with 1-bit communication complexity and O(1) computation cost for each player. The error bound of these algorithms is asymptotically the same as the original one. With some additional assumptions, we also give an algorithm that is more efficient for the server. In our second attempt, we show that for any 1-Lipschitz generalized linear convex loss function, there is an (ε, δ)-LDP algorithm whose sample complexity for achieving error α is only linear in the dimensionality p. Our results use a polynomial approximation of inner products. Finally, motivated by the idea of polynomial approximation and building on different types of polynomial approximations, we propose (efficient) non-interactive locally differentially private algorithms for learning the set of k-way marginal queries and the set of smooth queries.
    Comment: Appeared in the Journal of Machine Learning Research. The journal version of arXiv:1802.04085; fixed a bug in arXiv:1812.0682
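The Bernstein approximation step behind the first result can be illustrated in a few lines. This is a generic sketch of the approximation itself, under our own naming (`bernstein_approx` is a hypothetical helper); the private, per-player estimation of the grid values f(k/n) that the paper actually constructs is omitted:

```python
from math import comb

def bernstein_approx(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1]:
    B_n(f)(x) = sum_{k=0}^{n} f(k/n) * C(n, k) * x^k * (1 - x)^(n - k).
    For smooth f, B_n(f) converges to f uniformly as n grows, which is
    what lets a server rebuild a smooth loss from finitely many
    (privately estimated) grid values f(k/n)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

# For f(t) = t^2 the closed form is B_n(f)(x) = x^2 + x(1 - x) / n,
# so at x = 0.5 with n = 100 the approximation error is exactly 0.0025.
approx = bernstein_approx(lambda t: t * t, n=100, x=0.5)
```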

    OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization

    Vertical Federated Learning (FL) is a new paradigm that enables users holding non-overlapping attributes of the same data samples to jointly train a model without directly sharing the raw data. Nevertheless, recent works show that this is still not sufficient to prevent privacy leakage from the training process or the trained model. This paper focuses on privacy-preserving tree boosting algorithms under vertical FL. Existing solutions based on cryptography involve heavy computation and communication overhead and are vulnerable to inference attacks. Although solutions based on Local Differential Privacy (LDP) address these problems, they lead to low accuracy of the trained model. This paper explores improving the accuracy of the widely deployed tree boosting algorithms while satisfying differential privacy under vertical FL. Specifically, we introduce a framework called OpBoost. Three order-preserving desensitization algorithms satisfying a variant of LDP called distance-based LDP (dLDP) are designed to desensitize the training data. In particular, we optimize the dLDP definition and study efficient sampling distributions to further improve the accuracy and efficiency of the proposed algorithms. The proposed algorithms provide a trade-off between the privacy of pairs with large distance and the utility of desensitized values. Comprehensive evaluations show that OpBoost achieves better prediction accuracy than existing LDP approaches under reasonable settings. Our code is open source.
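The dLDP guarantee relaxes plain LDP so that two values at distance t are only (ε·t)-indistinguishable. The classical Laplace mechanism already satisfies this metric-style condition, which a minimal sketch can show. This illustrates the privacy notion only; it is not OpBoost's order-preserving desensitization algorithms, and the function names are ours:

```python
import random

def laplace_noise(scale):
    # Laplace(0, scale) sampled as the difference of two i.i.d.
    # exponential variates with mean `scale`.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dldp_perturb(x, eps):
    """Adding Laplace(1/eps) noise makes any two inputs x, x' only
    (eps * |x - x'|)-distinguishable: nearby values stay close, so
    approximate order survives, while distant pairs get the strong
    indistinguishability that plain LDP would force on every pair."""
    return x + laplace_noise(1.0 / eps)
```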

    Intertwining Order Preserving Encryption and Differential Privacy

    Ciphertexts of an order-preserving encryption (OPE) scheme preserve the order of their corresponding plaintexts. However, OPEs are vulnerable to inference attacks that exploit this preserved order. At the other end, differential privacy (DP) has become the de facto standard for achieving data privacy. One of the most attractive properties of DP is that any post-processing (inferential) computation performed on the noisy output of a DP algorithm does not degrade its privacy guarantee. In this paper, we intertwine the two approaches and propose a novel differentially private order-preserving encryption scheme, OPε. Under OPε, the leakage of order from the ciphertexts is differentially private. As a result, at the very least, OPε ensures a formal guarantee (specifically, a relaxed DP guarantee) even in the face of inference attacks. To the best of our knowledge, this is the first work to intertwine DP with a property-preserving encryption scheme. We demonstrate OPε's practical utility in answering range queries via extensive empirical evaluation on four real-world datasets. For instance, OPε misses only around 4 in every 10K correct records on average for a dataset of size ~732K with an attribute of domain size ~18K and ε = 1.
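The core idea of leaking only a differentially private view of order can be caricatured with a discrete (two-sided geometric) noise tag attached to each ciphertext. This toy is our own illustration under assumed names; it omits the encryption layer entirely and everything specific to the OPε construction:

```python
import math
import random

def two_sided_geometric(eps):
    # Discrete Laplace noise with P(Z = z) proportional to exp(-eps * |z|),
    # sampled as the difference of two geometric variates.
    alpha = math.exp(-eps)
    def geom():
        return int(math.log(1 - random.random()) / math.log(alpha))
    return geom() - geom()

def noisy_order_tag(plaintext, eps):
    """Order tag to be stored alongside a conventional, semantically
    secure ciphertext.  Comparing tags recovers order only approximately,
    and the order leakage is differentially private in the plaintext:
    nearby plaintexts have overlapping tag distributions."""
    return plaintext + two_sided_geometric(eps)
```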

    On the Risks of Collecting Multidimensional Data Under Local Differential Privacy

    The private collection of multiple statistics from a population is a fundamental statistical problem. One possible approach is to rely on the local model of differential privacy (LDP). Numerous LDP protocols have been developed for the task of frequency estimation of single and multiple attributes. These studies mainly focused on improving the utility of the algorithms so that the server performs the estimations accurately. In this paper, we investigate privacy threats (re-identification and attribute inference attacks) against LDP protocols for multidimensional data, following two state-of-the-art solutions for frequency estimation of multiple attributes. To broaden the scope of our study, we have also experimentally assessed five widely used LDP protocols, namely, generalized randomized response, optimal local hashing, subset selection, RAPPOR, and optimal unary encoding. Finally, we also propose a countermeasure that improves both utility and robustness against the identified threats. Our contributions can help practitioners aiming to collect users' statistics privately to decide which LDP mechanism best fits their needs.
    Comment: Accepted at VLDB 202
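Generalized randomized response, the first of the five protocols assessed, is simple enough to sketch end to end. This is a generic textbook version under our own naming, not the paper's exact implementation:

```python
import math
import random

def grr_perturb(value, domain, eps):
    """k-ary randomized response: report the true value with probability
    p = e^eps / (e^eps + k - 1), otherwise a uniform *other* value."""
    k = len(domain)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p:
        return value
    return random.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, eps):
    """Unbiased frequency estimates obtained by inverting the noise:
    f_hat(v) = (c(v)/n - q) / (p - q), where q = 1 / (e^eps + k - 1)
    is the probability that any fixed *wrong* value is reported."""
    k, n = len(domain), len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1.0 / (math.exp(eps) + k - 1)
    return {v: (reports.count(v) / n - q) / (p - q) for v in domain}
```

The re-identification risk studied in the paper stems from the fact that each noisy report still carries per-user signal: with a small domain or a large ε, p approaches 1 and a single report already reveals the true value with high probability.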