28 research outputs found

Testing Shape Restrictions of Discrete Distributions

Author: Canonne Clément L.
Diakonikolas Ilias
Gouleakis Themistoklis
Rubinfeld Ronitt
Publication venue: Dagstuhl Publishing
Publication date: 01/01/2016

We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution $D$ over $[n]$ and a property $\mathcal{P}$, the goal is to distinguish between $D \in \mathcal{P}$ and $\ell_1(D, \mathcal{P}) > \epsilon$. We develop a general algorithm for this question, which applies to a large range of "shape-constrained" properties, including monotone, log-concave, $t$-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases considered, our algorithm has near-optimal sample complexity with respect to the domain size and is computationally efficient. For most of these classes, we provide the first non-trivial tester in the literature. We also describe a generic method to prove lower bounds for this problem, and use it to show that our upper bounds are nearly tight. Finally, we extend some of our techniques to tolerant testing, deriving nearly-tight upper and lower bounds for the corresponding questions.
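To make the setting concrete for one of the listed classes, monotone distributions, the sketch below implements a classical tool commonly used for them: Birgé's oblivious decomposition, which splits $[n]$ into consecutive intervals of geometrically growing length and "flattens" a distribution by averaging it within each interval. For a non-increasing distribution, the flattened version is known to be $O(\epsilon)$-close in $\ell_1$ while using only $O(\log(n)/\epsilon)$ intervals. This is an illustrative sketch under stated assumptions, not the tester from this paper; the function names `birge_partition`, `flatten`, and `l1_distance` and the exact interval lengths $\lfloor (1+\epsilon)^k \rfloor$ are choices made for the example.

```python
import math


def birge_partition(n, eps):
    """Split {1, ..., n} into consecutive intervals whose lengths grow like
    floor((1 + eps)^k), k = 1, 2, ...; the last interval is truncated at n.
    (Hypothetical helper; interval lengths are an illustrative choice.)"""
    intervals = []
    start, k = 1, 1
    while start <= n:
        length = max(1, math.floor((1 + eps) ** k))
        end = min(n, start + length - 1)
        intervals.append((start, end))
        start, k = end + 1, k + 1
    return intervals


def flatten(p, intervals):
    """Spread the mass of p evenly within each interval.
    p[i - 1] is the probability of element i."""
    q = [0.0] * len(p)
    for a, b in intervals:
        avg = sum(p[a - 1:b]) / (b - a + 1)
        for i in range(a - 1, b):
            q[i] = avg
    return q


def l1_distance(p, q):
    """l1 distance between two distributions given as equal-length lists."""
    return sum(abs(x - y) for x, y in zip(p, q))


if __name__ == "__main__":
    n, eps = 1000, 0.1
    parts = birge_partition(n, eps)
    # Non-increasing example distribution: p(i) proportional to 1/i.
    z = sum(1 / i for i in range(1, n + 1))
    p = [1 / (i * z) for i in range(1, n + 1)]
    q = flatten(p, parts)
    print(len(parts), "intervals; l1(p, flattened p) =", round(l1_distance(p, q), 4))
```

Because each interval's average lies between the smallest and largest values inside it, flattening a non-increasing distribution keeps it non-increasing. This is one reason the partition can be fixed in advance, without looking at any samples.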

arXiv.org e-Print Archive

DSpace@MIT

Edinburgh Research Explorer

Dagstuhl Research Online Publication Server

Testing Conditional Independence of Discrete Distributions

Author: Acharya J.
Batu T.
Canonne C.
Canonne C. L.
Fisher R. A.
Hardt M.
Mantel N.
Publication date: 20/06/2018

We study the problem of testing conditional independence for discrete distributions. Specifically, given samples from a discrete random variable $(X, Y, Z)$ on domain $[\ell_1] \times [\ell_2] \times [n]$, we want to distinguish, with probability at least $2/3$, between the case that $X$ and $Y$ are conditionally independent given $Z$ and the case that $(X, Y, Z)$ is $\epsilon$-far, in $\ell_1$-distance, from every distribution that has this property. Conditional independence is a concept of central importance in probability and statistics, with a range of applications in various scientific domains. As such, the statistical task of testing conditional independence has been extensively studied in various forms within the statistics and econometrics communities for nearly a century. Perhaps surprisingly, this problem has not previously been considered in the framework of distribution property testing, and in particular no tester with sublinear sample complexity is known, even for the important special case where the domains of $X$ and $Y$ are binary. The main algorithmic result of this work is the first conditional independence tester with sublinear sample complexity for discrete distributions over $[\ell_1] \times [\ell_2] \times [n]$. To complement our upper bounds, we prove information-theoretic lower bounds establishing that the sample complexity of our algorithm is optimal, up to constant factors, for a number of settings. Specifically, for the prototypical setting where $\ell_1, \ell_2 = O(1)$, we show that the sample complexity of testing conditional independence (upper bound and matching lower bound) is
\[ \Theta\left(\max\left(n^{1/2}/\epsilon^2, \min\left(n^{7/8}/\epsilon, n^{6/7}/\epsilon^{8/7}\right)\right)\right). \]
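The $\Theta(\cdot)$ bound for $\ell_1, \ell_2 = O(1)$ can be evaluated numerically to see which regime applies for a given $n$ and $\epsilon$. The sketch below drops all constant factors, so its output indicates only the order of growth, not an actual number of samples to draw; `ci_sample_complexity` is a hypothetical helper name, not part of any published code.

```python
def ci_sample_complexity(n, eps):
    """Evaluate max(n^{1/2}/eps^2, min(n^{7/8}/eps, n^{6/7}/eps^{8/7})),
    i.e. the Theta(...) bound with constant factors dropped (hypothetical helper).
    Returns (value, label of the term that attains the maximum)."""
    terms = {
        "n^{1/2}/eps^2": n ** 0.5 / eps ** 2,
        "n^{7/8}/eps": n ** (7 / 8) / eps,
        "n^{6/7}/eps^{8/7}": n ** (6 / 7) / eps ** (8 / 7),
    }
    # Inner minimum between the two "large-n" terms.
    inner = min(("n^{7/8}/eps", "n^{6/7}/eps^{8/7}"), key=terms.get)
    # Outer maximum against the n^{1/2}/eps^2 term.
    outer = max(("n^{1/2}/eps^2", inner), key=terms.get)
    return terms[outer], outer


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.001):
        value, term = ci_sample_complexity(10 ** 6, eps)
        print(f"n=10^6, eps={eps}: ~{value:.3g} (dominant term {term})")
```

For $n = 10^6$, the values $\epsilon = 0.5$, $0.1$, and $0.001$ select the $n^{6/7}/\epsilon^{8/7}$, $n^{7/8}/\epsilon$, and $n^{1/2}/\epsilon^2$ terms, respectively. Comparing the terms directly, the inner minimum switches at $\epsilon = n^{-1/8}$, and the $n^{1/2}/\epsilon^2$ term takes over once $\epsilon \le n^{-3/8}$.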

arXiv.org e-Print Archive

Crossref

eScholarship - University of California