18 research outputs found
Testing Properties of Multiple Distributions with Few Samples
We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from $T$ distributions, $p_1, p_2, \ldots, p_T$, we design testers for the following problems: (1) Uniformity Testing: testing whether all the $p_i$'s are uniform or $\epsilon$-far from being uniform in $\ell_1$-distance; (2) Identity Testing: testing whether all the $p_i$'s are equal to an explicitly given distribution $q$ or $\epsilon$-far from $q$ in $\ell_1$-distance; and (3) Closeness Testing: testing whether all the $p_i$'s are equal to a distribution $q$ which we have sample access to, or $\epsilon$-far from $q$ in $\ell_1$-distance. By assuming an additional natural condition about the source distributions, we provide sample-optimal testers for all of these problems.
Comment: ITCS 202
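For reference, the abstract relies on the standard distribution-testing conventions (not restated above): for distributions $p$ and $q$ over a domain $[n]$,

\[
\|p - q\|_1 \;=\; \sum_{i=1}^{n} \bigl|\, p(i) - q(i) \,\bigr|,
\]

and $p$ is said to be $\epsilon$-far from $q$ in $\ell_1$-distance if $\|p - q\|_1 \ge \epsilon$; uniformity testing takes $q$ to be the uniform distribution over $[n]$.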
Local Differential Privacy Is Equivalent to Contraction of $E_\gamma$-Divergence
We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $E_\gamma$-divergence. We then use this equivalent formulation to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergences. When combined with standard estimation-theoretic tools (such as Le Cam's and Fano's converse methods), this result allows us to study the trade-off between privacy and utility in several testing problems and in minimax and Bayesian estimation problems.
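As context (standard definitions, not specific to this paper): the $E_\gamma$-divergence, also known as the hockey-stick divergence, is

\[
E_\gamma(P \,\|\, Q) \;=\; \sup_{A}\bigl( P(A) - \gamma\, Q(A) \bigr), \qquad \gamma \ge 1,
\]

and a randomized mechanism $K$ is $\varepsilon$-LDP if $K(S \mid x) \le e^{\varepsilon} K(S \mid x')$ for all output sets $S$ and all input pairs $x, x'$, which is the same as requiring $E_{e^{\varepsilon}}\bigl(K(\cdot \mid x)\,\|\,K(\cdot \mid x')\bigr) = 0$ for every pair $x, x'$.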
Differentially Private Medians and Interior Points for Non-Pathological Data
We construct differentially private estimators with low sample complexity
that estimate the median of an arbitrary distribution over
satisfying very mild moment conditions. Our result stands in contrast to the
surprising negative result of Bun et al. (FOCS 2015) that showed there is no
differentially private estimator with any finite sample complexity that returns
any non-trivial approximation to the median of an arbitrary distribution
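As background (the standard definition, not restated in the abstract): an estimator $M$ is $\varepsilon$-differentially private if, for every pair of datasets $X, X'$ differing in a single element and every set $S$ of outputs,

\[
\Pr[M(X) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(X') \in S].
\]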
Testing Tail Weight of a Distribution Via Hazard Rate
Understanding the shape of a distribution of data is of interest to people in
a great variety of fields, as it may affect the types of algorithms used for
that data. Given samples from a distribution, we seek to understand how many
elements appear infrequently, that is, to characterize the tail of the
distribution. We develop an algorithm based on a careful bucketing scheme that
distinguishes heavy-tailed distributions from non-heavy-tailed ones via a
definition based on the hazard rate under some natural smoothness and ordering
assumptions. We verify our theoretical results empirically
Learning and testing junta distributions over hypercubes
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. Includes bibliographical references (pages 77-80).

Many tasks related to the analysis of high-dimensional datasets can be formalized as problems involving learning or testing properties of distributions over a high-dimensional domain. In this work, we initiate the study of the following general question: when many of the dimensions of the distribution correspond to "irrelevant" features in the associated dataset, can we learn the distribution efficiently? We formalize this question with the notion of a junta distribution. The distribution $D$ over $\{0,1\}^n$ is a $k$-junta distribution if the probability mass function $p$ of $D$ is a $k$-junta, i.e., if there is a set $J \subseteq [n]$ of at most $k$ coordinates such that for every $x \in \{0,1\}^n$, the value of $p(x)$ is completely determined by the value of $x$ on the coordinates in $J$.

We show that it is possible to learn $k$-junta distributions with a number of samples that depends only logarithmically on the total number $n$ of dimensions. We give two proofs of this result: one using the cover method, and one by developing a Fourier-based learning algorithm inspired by the Low-Degree Algorithm of Linial, Mansour, and Nisan (1993). We also consider the problem of testing whether an unknown distribution is a $k$-junta distribution. We introduce an algorithm for this task with sample complexity $\tilde{O}(2^{n/2} k^4)$ and show that this bound is nearly optimal for constant values of $k$. As a byproduct of the analysis of the algorithm, we obtain an optimal bound on the number of samples required to test a weighted collection of distributions for uniformity.

Finally, we establish the sample complexity of learning and testing other classes of distributions related to junta distributions. Notably, we show that the task of testing whether a distribution on $\{0,1\}^n$ contains a coordinate $i \in [n]$ such that $x_i$ is drawn independently from the remaining coordinates requires $\Theta(2^{2n/3})$ samples. This is in contrast to the task of testing whether all of the coordinates are drawn independently from each other, which was recently shown to have sample complexity $\Theta(2^{n/2})$ by Acharya, Daskalakis, and Kamath (2015).

by Maryam Aliakbarpour
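To make the junta-distribution definition concrete, here is a minimal Python sketch (illustrative only, not taken from the thesis) that samples from a $k$-junta distribution over $\{0,1\}^n$. Since $p(x)$ depends only on the coordinates in $J$, the remaining coordinates are uniform and independent of $x_J$, so specifying the marginal of $x_J$ determines the whole distribution.

import random

def sample_junta(n, junta_coords, junta_marginal):
    """Draw one sample from a k-junta distribution over {0,1}^n.

    junta_coords: the k relevant coordinates J (indices in range(n)).
    junta_marginal: dict mapping each k-bit tuple to its probability,
        i.e. the marginal distribution of x restricted to J (sums to 1).
    """
    # Sample the relevant coordinates x_J from the given marginal.
    patterns = list(junta_marginal)
    weights = [junta_marginal[pat] for pat in patterns]
    x_J = random.choices(patterns, weights=weights, k=1)[0]

    # Coordinates outside J are uniform and independent of x_J.
    x = [random.randint(0, 1) for _ in range(n)]
    for pos, coord in enumerate(junta_coords):
        x[coord] = x_J[pos]
    return tuple(x)

# Example: a 2-junta distribution over {0,1}^5 whose pmf depends only on
# coordinates 1 and 3 (hypothetical marginal, chosen for illustration).
marginal = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print([sample_junta(5, [1, 3], marginal) for _ in range(3)])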