
    Testing Properties of Multiple Distributions with Few Samples

    We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from $s$ distributions, $p_1, p_2, \ldots, p_s$, we design testers for the following problems: (1) Uniformity Testing: testing whether all the $p_i$'s are uniform or $\epsilon$-far from uniform in $\ell_1$-distance; (2) Identity Testing: testing whether all the $p_i$'s are equal to an explicitly given distribution $q$ or $\epsilon$-far from $q$ in $\ell_1$-distance; and (3) Closeness Testing: testing whether all the $p_i$'s are equal to a distribution $q$ to which we have sample access, or $\epsilon$-far from $q$ in $\ell_1$-distance. Under an additional natural condition on the source distributions, we provide sample-optimal testers for all of these problems.
    Comment: ITCS 202
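
    To make the uniformity-testing problem concrete, here is a minimal Python sketch of the classical collision-based uniformity statistic, naively averaged over the $s$ sample sets. This is an illustrative baseline only, assuming at least two samples per distribution; it is not the paper's sample-optimal tester, and the function names and threshold constant are ours.

```python
import numpy as np

def collision_statistic(samples):
    """Unbiased estimate of ||p||_2^2 from i.i.d. samples via collision counting.

    ||p||_2^2 equals 1/n exactly when p is uniform over a domain of size n,
    and is at least (1 + eps^2)/n when p is eps-far from uniform in L1.
    Requires at least two samples.
    """
    m = len(samples)
    _, counts = np.unique(samples, return_counts=True)
    colliding_pairs = np.sum(counts * (counts - 1)) / 2
    return colliding_pairs / (m * (m - 1) / 2)

def pooled_uniformity_test(sample_sets, domain_size, eps):
    """Accept ("all uniform") iff the average estimated L2 norm stays below
    the midpoint threshold between the uniform and eps-far cases."""
    stats = [collision_statistic(s) for s in sample_sets]
    return float(np.mean(stats)) <= (1 + eps**2 / 2) / domain_size
```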

    Local Differential Privacy Is Equivalent to Contraction of $E_\gamma$-Divergence

    We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $E_\gamma$-divergence. We then use this equivalent formulation to express the LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergences. When combined with standard estimation-theoretic tools (such as Le Cam's and Fano's converse methods), this result allows us to study the trade-off between privacy and utility in several testing problems as well as minimax and Bayesian estimation problems.
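
    For reference, the standard definitions involved are sketched below; this is our paraphrase of the objects named in the abstract, and the precise statement of the equivalence should be taken from the paper itself.

```latex
% Hockey-stick (E_gamma) divergence between distributions P and Q, for gamma >= 1:
E_\gamma(P \,\|\, Q) = \sup_{A} \bigl( P(A) - \gamma\, Q(A) \bigr)

% Contraction coefficient of a mechanism (channel) K under E_gamma:
\eta_\gamma(K) = \sup_{P \neq Q} \frac{E_\gamma(PK \,\|\, QK)}{E_\gamma(P \,\|\, Q)}

% epsilon-LDP rewritten in divergence form: for all inputs x, x',
E_{e^\epsilon}\bigl( K(\cdot \mid x) \,\|\, K(\cdot \mid x') \bigr) = 0
```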

    Differentially Private Medians and Interior Points for Non-Pathological Data

    We construct differentially private estimators with low sample complexity that estimate the median of an arbitrary distribution over $\mathbb{R}$ satisfying very mild moment conditions. Our result stands in contrast to the surprising negative result of Bun et al. (FOCS 2015), which showed that there is no differentially private estimator with any finite sample complexity that returns any non-trivial approximation to the median of an arbitrary distribution.
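
    The paper's estimator is not reproduced in this abstract; for orientation, below is a hedged Python sketch of the classical exponential-mechanism median on a bounded, discretized range, a standard baseline in this line of work. The grid, bounds, and identifiers are our own illustrative choices, not the paper's construction.

```python
import numpy as np

def dp_median_exponential(data, lo, hi, epsilon, grid_size=1000, rng=None):
    """Epsilon-DP median via the exponential mechanism on a grid over [lo, hi].

    A candidate c is scored by minus the rank gap |#{x < c} - n/2|; this
    score has sensitivity 1, so sampling proportionally to
    exp(epsilon * score / 2) satisfies epsilon-differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.sort(np.asarray(data, dtype=float))
    candidates = np.linspace(lo, hi, grid_size)
    below = np.searchsorted(data, candidates)   # #{x < c} for each candidate
    scores = -np.abs(below - len(data) / 2)
    logits = epsilon * scores / 2.0
    probs = np.exp(logits - logits.max())       # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(candidates, p=probs)
```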

    Testing Tail Weight of a Distribution Via Hazard Rate

    Understanding the shape of a distribution of data is of interest to people in a great variety of fields, as it may affect the types of algorithms used for that data. Given samples from a distribution, we seek to understand how many elements appear infrequently, that is, to characterize the tail of the distribution. We develop an algorithm, based on a careful bucketing scheme, that distinguishes heavy-tailed distributions from non-heavy-tailed ones, where heavy-tailedness is defined via the hazard rate under some natural smoothness and ordering assumptions. We verify our theoretical results empirically.
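
    The abstract does not spell out its tail-weight definition; one standard convention for the hazard rate of a discrete distribution, under which a constant rate corresponds to geometric (exponential-type) tails and a vanishing rate to heavier tails, is sketched below as an assumption.

```latex
% Hazard rate of a distribution p over {1, 2, ...}, with survival
% function S(i) = \sum_{j \ge i} p(j):
h(i) = \frac{p(i)}{S(i)} = \frac{p(i)}{\sum_{j \ge i} p(j)}

% Heuristic dichotomy: h(i) bounded away from 0 (e.g. geometric tails)
% versus h(i) \to 0 (e.g. power-law tails p(i) \propto i^{-\alpha}).
```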

    Learning and testing junta distributions over hypercubes

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. Cataloged from PDF version of thesis. Includes bibliographical references (pages 77-80).
    Many tasks related to the analysis of high-dimensional datasets can be formalized as problems involving learning or testing properties of distributions over a high-dimensional domain. In this work, we initiate the study of the following general question: when many of the dimensions of the distribution correspond to "irrelevant" features in the associated dataset, can we learn the distribution efficiently? We formalize this question with the notion of a junta distribution. A distribution $D$ over $\{0,1\}^n$ is a $k$-junta distribution if the probability mass function $p$ of $D$ is a $k$-junta, i.e., if there is a set $J \subseteq [n]$ of at most $k$ coordinates such that for every $x \in \{0,1\}^n$, the value of $p(x)$ is completely determined by the values of $x$ on the coordinates in $J$. We show that it is possible to learn $k$-junta distributions with a number of samples that depends only logarithmically on the total number $n$ of dimensions. We give two proofs of this result: one using the cover method, and one by developing a Fourier-based learning algorithm inspired by the Low-Degree Algorithm of Linial, Mansour, and Nisan (1993). We also consider the problem of testing whether an unknown distribution is a $k$-junta distribution. We introduce an algorithm for this task with sample complexity $\tilde{O}(2^{n/2} k^4)$ and show that this bound is nearly optimal for constant values of $k$. As a byproduct of the analysis of the algorithm, we obtain an optimal bound on the number of samples required to test a weighted collection of distributions for uniformity. Finally, we establish the sample complexity of learning and testing other classes of distributions related to junta distributions. Notably, we show that the task of testing whether a distribution on $\{0,1\}^n$ contains a coordinate $i \in [n]$ such that $x_i$ is drawn independently of the remaining coordinates requires $\Theta(2^{2n/3})$ samples. This is in contrast to the task of testing whether all of the coordinates are drawn independently of each other, which was recently shown to have sample complexity $\Theta(2^{n/2})$ by Acharya, Daskalakis, and Kamath (2015).
    by Maryam Aliakbarpour. S.M.
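
    To make the junta definition above concrete, here is a brute-force (exponential-time) Python check of the defining property, included for illustration only; the identifiers are ours, and this is not an algorithm from the thesis.

```python
from itertools import product

def is_junta_pmf(p, n, J):
    """Return True iff the pmf p over {0,1}^n is a junta on coordinate set J,
    i.e. p(x) is completely determined by the restriction of x to J.

    p: dict mapping n-bit tuples to probabilities. Brute force: O(2^n) time.
    """
    J = sorted(J)
    determined = {}  # restriction of x to J -> the unique value p must take
    for x in product((0, 1), repeat=n):
        key = tuple(x[i] for i in J)
        if determined.setdefault(key, p[x]) != p[x]:
            return False
    return True

# Example: a pmf depending only on coordinate 0 is a 1-junta but not a 0-junta.
n = 3
p = {x: (0.6 if x[0] == 0 else 0.4) / 4 for x in product((0, 1), repeat=n)}
assert abs(sum(p.values()) - 1.0) < 1e-9
assert is_junta_pmf(p, n, {0}) and not is_junta_pmf(p, n, set())
```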