    Distributional Property Testing in a Quantum World

    A fundamental problem in statistics and learning theory is to test properties of distributions. We show that quantum computers can solve such problems with significant speed-ups. We also introduce a novel access model for quantum distributions, enabling the coherent preparation of quantum samples, and propose a general framework that can naturally handle both classical and quantum distributions in a unified manner. Our framework generalizes and improves previous quantum algorithms for testing closeness between unknown distributions, testing independence between two distributions, and estimating the Shannon / von Neumann entropy of distributions. For classical distributions, our algorithms significantly improve the precision dependence of some earlier results. We also show that in our framework, procedures for classical distributions can be directly lifted to the more general case of quantum distributions, and we thus obtain the first speed-ups for testing properties of density operators that can be accessed coherently rather than only via sampling.

    Likelihood-free hypothesis testing

    Consider the problem of testing $Z \sim \mathbb{P}^{\otimes m}$ vs $Z \sim \mathbb{Q}^{\otimes m}$ from $m$ samples. Generally, to achieve a small error rate it is necessary and sufficient to have $m \asymp 1/\epsilon^2$, where $\epsilon$ measures the separation between $\mathbb{P}$ and $\mathbb{Q}$ in total variation ($\mathsf{TV}$). Achieving this, however, requires complete knowledge of the distributions $\mathbb{P}$ and $\mathbb{Q}$ and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem, which we call likelihood-free (or simulation-based) hypothesis testing, where access to $\mathbb{P}$ and $\mathbb{Q}$ (which are a priori only known to belong to a large non-parametric family $\mathcal{P}$) is given through $n$ iid samples from each. We demonstrate the existence of a fundamental trade-off between $n$ and $m$ given by $nm \asymp n^2_{\mathsf{GoF}}(\epsilon, \mathcal{P})$, where $n_{\mathsf{GoF}}$ is the minimax sample complexity of testing between the hypotheses $H_0: \mathbb{P} = \mathbb{Q}$ vs $H_1: \mathsf{TV}(\mathbb{P}, \mathbb{Q}) \ge \epsilon$. We show this for three non-parametric families $\mathcal{P}$: $\beta$-smooth densities over $[0,1]^d$, the Gaussian sequence model over a Sobolev ellipsoid, and the collection of distributions $\mathcal{P}$ on a large alphabet $[k]$ with pmfs bounded by $c/k$ for fixed $c$. The test that we propose (based on the $L^2$-distance statistic of Ingster) simultaneously achieves all points on the trade-off curve for these families. In particular, when $m \gg 1/\epsilon^2$, our test requires the number of simulation samples $n$ to be orders of magnitude smaller than what is needed for density estimation with accuracy $\asymp \epsilon$ (under $\mathsf{TV}$). This demonstrates the possibility of testing without fully estimating the distributions.

    Comment: 48 pages, 1 figure