
    Private Distribution Testing with Heterogeneous Constraints: Your Epsilon Might Not Be Mine

    Private closeness testing asks one to decide whether the underlying probability distributions of two sensitive datasets are identical or differ significantly in statistical distance, while guaranteeing (differential) privacy of the data. As in most (if not all) distribution testing questions studied under privacy constraints, however, previous work assumes that the two datasets are equally sensitive, i.e., must be provided the same privacy guarantees. This is often an unrealistic assumption, as different sources of data come with different privacy requirements; as a result, known closeness testing algorithms might be unnecessarily conservative, "paying" too high a privacy budget for half of the data. In this work, we initiate the study of the closeness testing problem under heterogeneous privacy constraints, where the two datasets come with distinct privacy requirements. We formalize the question and provide algorithms under the three most widely used differential privacy settings, with a particular focus on the local and shuffle models of privacy, and show that one can indeed achieve better sample efficiency by taking into account the two different "epsilon" requirements.
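
    The abstract above only states the problem, so as a rough illustration (not the authors' algorithms) here is a minimal Python sketch of a plug-in closeness test in the central model of differential privacy, where each dataset's histogram is privatized with Laplace noise calibrated to its own epsilon. All function and parameter names are hypothetical, and this naive tester is far from sample-optimal.

    import numpy as np

    def private_histogram(samples, domain_size, epsilon, rng):
        """epsilon-DP count histogram via the Laplace mechanism (central model).

        Replacing one sample changes two bin counts by 1 each, so the l_1
        sensitivity is 2 and Laplace noise of scale 2/epsilon suffices.
        """
        counts = np.bincount(samples, minlength=domain_size).astype(float)
        return counts + rng.laplace(scale=2.0 / epsilon, size=domain_size)

    def heterogeneous_closeness_test(x, y, domain_size, eps_x, eps_y, gap, rng=None):
        """Plug-in closeness test where each dataset gets its own privacy budget.

        Declares the distributions 'close' when the l_1 distance between the
        two noisy empirical distributions is below gap/2; 'gap' plays the role
        of the statistical distance parameter in the problem statement.
        Illustrative sketch only, not the paper's algorithm.
        """
        rng = rng if rng is not None else np.random.default_rng()
        p_hat = private_histogram(np.asarray(x), domain_size, eps_x, rng) / len(x)
        q_hat = private_histogram(np.asarray(y), domain_size, eps_y, rng) / len(y)
        return float(np.abs(p_hat - q_hat).sum()) < gap / 2.0

    # Hypothetical usage: same uniform distribution, stricter budget on the second dataset.
    rng = np.random.default_rng(0)
    x = rng.integers(0, 20, size=20000)
    y = rng.integers(0, 20, size=20000)
    print(heterogeneous_closeness_test(x, y, 20, eps_x=1.0, eps_y=0.1, gap=0.5, rng=rng))

    Calibrating the noise scale separately to eps_x and eps_y is the simplest way to honor two distinct budgets; the paper's point is that one can do substantially better in sample efficiency than treating both datasets at the stricter of the two levels.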

    Two Party Distribution Testing: Communication and Security

    We study the problem of discrete distribution testing in the two-party setting. For example, in the standard closeness testing problem, Alice and Bob each have t samples from, respectively, distributions a and b over [n], and they need to test whether a=b or a,b are epsilon-far (in the l_1 distance). This is in contrast to the well-studied one-party case, where the tester has unrestricted access to samples of both distributions. Despite being a natural constraint in applications, the two-party setting has previously evaded attention. We address two fundamental aspects of the two-party setting: 1) what is the communication complexity, and 2) can it be accomplished securely, without Alice and Bob learning extra information about each other's input. Besides closeness testing, we also study the independence testing problem, where Alice and Bob have t samples from distributions a and b respectively, which may be correlated; the question is whether a,b are independent or epsilon-far from being independent. Our contribution is three-fold: 1) We show how to gain communication efficiency given more samples, beyond the information-theoretic bound on t. The gain is polynomially better than what one would obtain by adapting one-party algorithms. 2) We prove that our trade-off is tight for closeness testing, and that independence testing requires Omega(sqrt{m}) communication even for an unbounded number of samples, a tight bound. These lower bounds are of independent interest since, to the best of our knowledge, they are the first two-party communication lower bounds for testing problems in which the inputs are sets of i.i.d. samples. 3) We define the concept of secure distribution testing, and provide secure versions of the above protocols with an overhead that is only polynomial in the security parameter.
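
    To make the communication model concrete, here is a minimal Python sketch of the trivial baseline protocol: Alice ships her entire sample multiset and Bob runs a naive one-party plug-in closeness test. It only pins down the t*log(n) communication cost that the paper's sample-versus-communication trade-offs improve on; all names are hypothetical and nothing here reflects the paper's communication-efficient or secure protocols.

    import math
    import random
    from collections import Counter

    def alice_message(samples, domain_size):
        """Trivial baseline: Alice sends her whole sample multiset to Bob.

        Communication is t * ceil(log2(n)) bits for t samples over [n].
        """
        bits = len(samples) * math.ceil(math.log2(domain_size))
        return list(samples), bits

    def bob_decides_closeness(alice_samples, bob_samples, gap):
        """Bob runs a naive one-party plug-in closeness test on both sample sets.

        Accepts 'a = b' when the empirical l_1 distance is below gap/2; 'gap'
        corresponds to the epsilon-far parameter in the problem statement.
        """
        t = len(alice_samples)
        ca, cb = Counter(alice_samples), Counter(bob_samples)
        l1 = sum(abs(ca[s] - cb[s]) for s in set(ca) | set(cb)) / t
        return l1 < gap / 2.0

    # Hypothetical usage with t = 10000 samples over a domain of size n = 50.
    random.seed(0)
    a = [random.randrange(50) for _ in range(10000)]
    b = [random.randrange(50) for _ in range(10000)]
    msg, cost_bits = alice_message(a, 50)
    print(cost_bits, bob_decides_closeness(msg, b, gap=0.5))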