Empirical Risk Minimization in the Non-interactive Local Model of Differential Privacy
In this paper, we study the Empirical Risk Minimization (ERM) problem in the
non-interactive Local Differential Privacy (LDP) model. Previous research on
this problem \citep{smith2017interaction} indicates that, for general loss
functions, the sample complexity needed to achieve a given error must depend
exponentially on the dimensionality. In this paper, we make two
attempts to resolve this issue by investigating conditions on the loss
functions that allow us to remove such a limit. In our first attempt, we show
that if the loss function is sufficiently smooth, then by using Bernstein
polynomial approximation we can avoid the exponential dependency in the error
term. We then propose player-efficient algorithms with low communication
complexity and low computation cost for each player. The error bound of these
algorithms is asymptotically the same as the original one. With
some additional assumptions, we also give an algorithm which is more efficient
for the server. In our second attempt, we show that for any Lipschitz
generalized linear convex loss function, there is an LDP algorithm whose sample
complexity for achieving a given error is only linear in the dimensionality.
Our results use a polynomial approximation of the inner product as a key
technique. Finally, motivated by the idea of using polynomial
approximation and based on different types of polynomial approximations, we
propose (efficient) non-interactive locally differentially private algorithms
for learning the set of k-way marginal queries and the set of smooth queries.

Comment: Appeared in the Journal of Machine Learning Research; the journal
version of arXiv:1802.04085, fixing a bug in arXiv:1812.0682
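The Bernstein-approximation idea above can be illustrated with a minimal sketch (this is not the paper's private algorithm; `bernstein_approx` is a name chosen here for illustration): a degree-k Bernstein polynomial depends only on the k+1 point evaluations f(i/k), so a smooth loss can be reconstructed from a small number of coefficients, which is what removes the exponential dependency.

```python
import math

def bernstein_approx(f, k):
    """Degree-k Bernstein polynomial approximation of f on [0, 1]:
    B_k(f; x) = sum_i f(i/k) * C(k, i) * x^i * (1-x)^(k-i).
    Only the k+1 values f(i/k) are needed, so in an LDP setting players
    would only need to contribute noisy estimates of these few values."""
    def B(x):
        return sum(
            f(i / k) * math.comb(k, i) * x**i * (1 - x) ** (k - i)
            for i in range(k + 1)
        )
    return B

# For a smooth function the approximation error shrinks as the degree grows.
f = lambda x: math.log(1 + math.exp(x))   # softplus, a smooth surrogate loss
approx = bernstein_approx(f, 50)
max_err = max(abs(f(x / 100) - approx(x / 100)) for x in range(101))
```

For a twice-differentiable function the uniform error of the degree-k Bernstein polynomial decays on the order of 1/k, so a modest degree already gives a close fit here.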
OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization
Vertical Federated Learning (FL) is a new paradigm that enables users with
non-overlapping attributes of the same data samples to jointly train a model
without directly sharing the raw data. Nevertheless, recent works show that
this alone is not sufficient to prevent privacy leakage from the training
process or the trained model. This paper focuses on privacy-preserving tree
boosting algorithms under vertical FL. Existing solutions based on
cryptography involve heavy computation and communication overhead and are
vulnerable to inference attacks. Although the solution based on Local
Differential Privacy (LDP) addresses the above problems, it leads to low
accuracy of the trained model.
This paper explores how to improve the accuracy of widely deployed tree
boosting algorithms while satisfying differential privacy under vertical FL.
Specifically, we introduce a framework called OpBoost. Three order-preserving
desensitization algorithms satisfying a variant of LDP called distance-based
LDP (dLDP) are designed to desensitize the training data. In particular, we
optimize the dLDP definition and study efficient sampling distributions to
further improve the accuracy and efficiency of the proposed algorithms. The
proposed algorithms provide a trade-off between the privacy of pairs with large
distance and the utility of desensitized values. Comprehensive evaluations show
that OpBoost achieves better prediction accuracy than existing LDP approaches
under reasonable settings. Our code is open source.
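A minimal sketch of a distance-based mechanism in the spirit of dLDP (an illustration under assumed definitions, not OpBoost's actual desensitization algorithm): two-sided geometric noise makes inputs at distance d indistinguishable up to a factor exp(eps * d), while values far apart usually keep their relative order, which is exactly the privacy/utility trade-off described above.

```python
import math, random

def desensitize(value, eps):
    """Perturb an integer with two-sided geometric (discrete Laplace) noise:
    P[output = value + k] is proportional to exp(-eps * |k|).
    For inputs x, x' the output distributions differ by at most a factor
    exp(eps * |x - x'|) -- a distance-based (dLDP-style) guarantee: nearby
    values are well hidden, while distant pairs mostly keep their order."""
    q = math.exp(-eps)
    # The difference of two i.i.d. geometric variables is discrete Laplace.
    g = lambda: int(math.log(1 - random.random()) / math.log(q))
    return value + g() - g()
```

The output is unbiased (the noise has mean zero), and the chance of an order flip between two values decays exponentially in their distance, so rank-based learners such as boosted trees see little distortion for well-separated values.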
Intertwining Order Preserving Encryption and Differential Privacy
Ciphertexts of an order-preserving encryption (OPE) scheme preserve the order
of their corresponding plaintexts. However, OPEs are vulnerable to inference
attacks that exploit this preserved order. At the other end, differential
privacy (DP) has become the de facto standard for achieving data privacy. One
of the most attractive properties of DP is that any post-processing (inferential)
computation performed on the noisy output of a DP algorithm does not degrade
its privacy guarantee. In this paper, we intertwine the two approaches and
propose a novel differentially private order preserving encryption scheme,
OP. Under OP, the leakage of order from the ciphertexts is
differentially private. As a result, at the very least, OP ensures a
formal guarantee (specifically, a relaxed DP guarantee) even in the face of
inference attacks. To the best of our knowledge, this is the first work to
intertwine DP with a property-preserving encryption scheme. We demonstrate
OP's practical utility in answering range queries via extensive
empirical evaluation on four real-world datasets. For instance, OP misses only
a small fraction of the correct records on average, even on a large dataset
with a large attribute domain.
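The core intuition, that DP-noised order stays private under post-processing, can be sketched as follows (a hypothetical illustration, not the paper's actual construction; `noisy_rank_encode` and `laplace_noise` are names invented here): if ciphertext order is derived from noised values, then by DP's post-processing property the leaked order inherits the guarantee.

```python
import math, random

def laplace_noise(scale):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    lam = 1.0 / scale
    return random.expovariate(lam) - random.expovariate(lam)

def noisy_rank_encode(values, eps):
    """Replace each record's value by the rank of its Laplace-noised value
    (noise scale 1/eps, calibrated for unit-distance neighbors).  The
    leaked order is a function of the noisy values only, so it carries
    the same (relaxed) DP guarantee by post-processing."""
    noisy = sorted((v + laplace_noise(1.0 / eps), i) for i, v in enumerate(values))
    ranks = [0] * len(values)
    for r, (_, order_i) in enumerate(noisy):
        ranks[order_i] = r
    return ranks
```

Range queries over such ranks occasionally miss records whose noise pushed them across a boundary, which is the source of the small miss rate reported above.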
On the Risks of Collecting Multidimensional Data Under Local Differential Privacy
The private collection of multiple statistics from a population is a
fundamental statistical problem. One possible approach to realize this is to
rely on the local model of differential privacy (LDP). Numerous LDP protocols
have been developed for the task of frequency estimation of single and multiple
attributes. These studies mainly focused on improving the utility of the
algorithms to ensure the server performs the estimations accurately. In this
paper, we investigate privacy threats (re-identification and attribute
inference attacks) against LDP protocols for multidimensional data, focusing on
two state-of-the-art solutions for frequency estimation of multiple attributes.
To broaden the scope of our study, we have also experimentally assessed five
widely used LDP protocols, namely, generalized randomized response, optimal
local hashing, subset selection, RAPPOR, and optimal unary encoding. Finally,
we also propose a countermeasure that improves both utility and robustness
against the identified threats. Our contributions can help practitioners aiming
to collect users' statistics privately to decide which LDP mechanism best fits
their needs.

Comment: Accepted at VLDB 202
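As a concrete reference point, generalized randomized response, the first of the protocols evaluated above, can be sketched in its standard textbook form (assumed here rather than taken from the paper):

```python
import math, random

def grr_perturb(value, domain, eps):
    """Generalized randomized response: report the true value with
    probability p = e^eps / (e^eps + k - 1), otherwise a uniformly random
    *other* value from the domain.  Each report satisfies eps-LDP."""
    k = len(domain)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p:
        return value
    return random.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, eps):
    """Debiased frequency estimates from the perturbed reports."""
    k, n = len(domain), len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = (1 - p) / (k - 1)
    return {v: (reports.count(v) / n - q) / (p - q) for v in domain}
```

The server sees only perturbed reports, yet the debiasing step recovers population frequencies; the re-identification risk studied in the paper stems from the fact that each individual report still correlates with its true value.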