18 research outputs found
Testing Properties of Multiple Distributions with Few Samples
We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from $T$ distributions, $p_1, p_2, \ldots, p_T$, we design testers for the following problems: (1) Uniformity Testing: testing whether all the $p_i$'s are uniform or $\epsilon$-far from being uniform in $\ell_1$-distance; (2) Identity Testing: testing whether all the $p_i$'s are equal to an explicitly given distribution $q$ or $\epsilon$-far from $q$ in $\ell_1$-distance; and (3) Closeness Testing: testing whether all the $p_i$'s are equal to a distribution $q$ which we have sample access to, or $\epsilon$-far from $q$ in $\ell_1$-distance. By assuming an additional natural condition about the source distributions, we provide sample-optimal testers for all of these problems.
Comment: ITCS 202
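For reference, the abstract relies on the standard distribution-testing conventions (not restated above): for distributions $p$ and $q$ over a domain $[n]$,

\[
\|p - q\|_1 \;=\; \sum_{i=1}^{n} \bigl|\, p(i) - q(i) \,\bigr|,
\]

and $p$ is said to be $\epsilon$-far from $q$ in $\ell_1$-distance if $\|p - q\|_1 \ge \epsilon$; uniformity testing takes $q$ to be the uniform distribution over $[n]$.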
Local Differential Privacy Is Equivalent to Contraction of $E_\gamma$-Divergence
We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $E_\gamma$-divergence. We then use this equivalent formulation to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergences. When combined with standard estimation-theoretic tools (such as Le Cam's and Fano's converse methods), this result allows us to study the trade-off between privacy and utility in several testing problems and in minimax and Bayesian estimation problems.
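As context (standard definitions, not specific to this paper): the $E_\gamma$-divergence, also known as the hockey-stick divergence, is

\[
E_\gamma(P \,\|\, Q) \;=\; \sup_{A}\bigl( P(A) - \gamma\, Q(A) \bigr), \qquad \gamma \ge 1,
\]

and a randomized mechanism $K$ is $\varepsilon$-LDP if $K(S \mid x) \le e^{\varepsilon} K(S \mid x')$ for all output sets $S$ and all input pairs $x, x'$, which is the same as requiring $E_{e^{\varepsilon}}\bigl(K(\cdot \mid x)\,\|\,K(\cdot \mid x')\bigr) = 0$ for every pair $x, x'$.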
Differentially Private Medians and Interior Points for Non-Pathological Data
We construct differentially private estimators with low sample complexity
that estimate the median of an arbitrary distribution over
satisfying very mild moment conditions. Our result stands in contrast to the
surprising negative result of Bun et al. (FOCS 2015) that showed there is no
differentially private estimator with any finite sample complexity that returns
any non-trivial approximation to the median of an arbitrary distribution
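As background (the standard definition, not restated in the abstract): an estimator $M$ is $\varepsilon$-differentially private if, for every pair of datasets $X, X'$ differing in a single element and every set $S$ of outputs,

\[
\Pr[M(X) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(X') \in S].
\]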
Testing Tail Weight of a Distribution Via Hazard Rate
Understanding the shape of a distribution of data is of interest to people in
a great variety of fields, as it may affect the types of algorithms used for
that data. Given samples from a distribution, we seek to understand how many
elements appear infrequently, that is, to characterize the tail of the
distribution. We develop an algorithm based on a careful bucketing scheme that
distinguishes heavy-tailed distributions from non-heavy-tailed ones via a
definition based on the hazard rate under some natural smoothness and ordering
assumptions. We verify our theoretical results empirically
Learning and testing junta distributions over hypercubes
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. Includes bibliographical references (pages 77-80).

Many tasks related to the analysis of high-dimensional datasets can be formalized as problems involving learning or testing properties of distributions over a high-dimensional domain. In this work, we initiate the study of the following general question: when many of the dimensions of the distribution correspond to "irrelevant" features in the associated dataset, can we learn the distribution efficiently? We formalize this question with the notion of a junta distribution. The distribution $D$ over $\{0,1\}^n$ is a $k$-junta distribution if the probability mass function $p$ of $D$ is a $k$-junta, i.e., if there is a set $J \subseteq [n]$ of at most $k$ coordinates such that for every $x \in \{0,1\}^n$, the value of $p(x)$ is completely determined by the value of $x$ on the coordinates in $J$.

We show that it is possible to learn $k$-junta distributions with a number of samples that depends only logarithmically on the total number $n$ of dimensions. We give two proofs of this result: one using the cover method, and one by developing a Fourier-based learning algorithm inspired by the Low-Degree Algorithm of Linial, Mansour, and Nisan (1993). We also consider the problem of testing whether an unknown distribution is a $k$-junta distribution. We introduce an algorithm for this task with sample complexity $\tilde{O}(2^{n/2} k^4)$ and show that this bound is nearly optimal for constant values of $k$. As a byproduct of the analysis of the algorithm, we obtain an optimal bound on the number of samples required to test a weighted collection of distributions for uniformity.

Finally, we establish the sample complexity of learning and testing other classes of distributions related to junta distributions. Notably, we show that the task of testing whether a distribution on $\{0,1\}^n$ contains a coordinate $i \in [n]$ such that $x_i$ is drawn independently from the remaining coordinates requires $\Theta(2^{2n/3})$ samples. This is in contrast to the task of testing whether all of the coordinates are drawn independently from each other, which was recently shown to have sample complexity $\Theta(2^{n/2})$ by Acharya, Daskalakis, and Kamath (2015).

by Maryam Aliakbarpour
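To make the junta-distribution definition concrete, here is a minimal Python sketch (illustrative only, not taken from the thesis) that samples from a $k$-junta distribution over $\{0,1\}^n$. Since $p(x)$ depends only on the coordinates in $J$, the remaining coordinates are uniform and independent of $x_J$, so specifying the marginal of $x_J$ determines the whole distribution.

import random

def sample_junta(n, junta_coords, junta_marginal):
    """Draw one sample from a k-junta distribution over {0,1}^n.

    junta_coords: the k relevant coordinates J (indices in range(n)).
    junta_marginal: dict mapping each k-bit tuple to its probability,
        i.e. the marginal distribution of x restricted to J (sums to 1).
    """
    # Sample the relevant coordinates x_J from the given marginal.
    patterns = list(junta_marginal)
    weights = [junta_marginal[pat] for pat in patterns]
    x_J = random.choices(patterns, weights=weights, k=1)[0]

    # Coordinates outside J are uniform and independent of x_J.
    x = [random.randint(0, 1) for _ in range(n)]
    for pos, coord in enumerate(junta_coords):
        x[coord] = x_J[pos]
    return tuple(x)

# Example: a 2-junta distribution over {0,1}^5 whose pmf depends only on
# coordinates 1 and 3 (hypothetical marginal, chosen for illustration).
marginal = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print([sample_junta(5, [1, 3], marginal) for _ in range(3)])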