We study a new framework for property testing of probability distributions,
by considering distribution testing algorithms that have access to a
conditional sampling oracle.* This is an oracle that takes as input a subset S⊆[N] of the domain [N] of the unknown probability distribution D
and returns a draw from the conditional probability distribution D restricted
to S. This new model allows considerable flexibility in the design of
distribution testing algorithms; in particular, testing algorithms in this
model can be adaptive.
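As an illustration only, the oracle described above can be simulated for a known toy distribution. The sketch below is not taken from the paper: the class name `CondOracle`, the method `sample`, and the choice to raise an error when D(S) = 0 are assumptions made for this example.

```python
import random
from typing import Dict, Iterable, Optional


class CondOracle:
    """Toy conditional sampling oracle for a distribution D over [N].

    Hypothetical helper for illustration; `probs` maps each domain
    element i to its probability D(i).
    """

    def __init__(self, probs: Dict[int, float], seed: Optional[int] = None) -> None:
        self.probs = dict(probs)
        self.rng = random.Random(seed)

    def sample(self, subset: Iterable[int]) -> int:
        """Return one draw from D restricted to `subset`.

        Element i of S is returned with probability D(i) / D(S).
        """
        elems = sorted(set(subset))
        weights = [self.probs.get(i, 0.0) for i in elems]
        if not elems or sum(weights) == 0.0:
            # Assumption of this sketch: a query set of zero mass is an error.
            raise ValueError("query set has zero probability under D")
        return self.rng.choices(elems, weights=weights, k=1)[0]


if __name__ == "__main__":
    oracle = CondOracle({1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}, seed=0)
    # An adaptive algorithm may choose each query set based on earlier draws.
    first = oracle.sample({1, 2, 3, 4})
    second = oracle.sample({first, 4})
    print(first, second)
```

Here the second query set depends on the outcome of the first draw, which is the kind of adaptivity the model permits.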
We study a wide range of natural distribution testing problems in this new
framework and some of its variants, giving both upper and lower bounds on query
complexity. These problems include testing whether D is the uniform
distribution U; testing whether D=D∗ for an explicitly
provided D∗; testing whether two unknown distributions D1 and D2
are equivalent; and estimating the variation distance between D and the
uniform distribution. At a high level our main finding is that the new
"conditional sampling" framework we consider is a powerful one: while all the
problems mentioned above have Ω(√N) sample complexity in the
standard model (and in some cases the complexity must be almost linear in N),
we give poly(log N, 1/ε)-query algorithms (and in some
cases poly(1/ε)-query algorithms independent of N) for
all these problems in our conditional sampling setting.
*Independently from our work, Chakraborty et al. also considered this
framework. We discuss their work in Subsection 1.4.
Comment: Significant changes to Section 9 (detailing and expanding the proof
of Theorem 16). Several clarifications and typos fixed in various places.