    Optimal Algorithms for Testing Closeness of Discrete Distributions

    We study the question of closeness testing for two discrete distributions. More precisely, given samples from two distributions $p$ and $q$ over an $n$-element set, we wish to distinguish whether $p = q$ versus $p$ is at least $\epsilon$-far from $q$, in either $\ell_1$ or $\ell_2$ distance. Batu et al. gave the first sub-linear time algorithms for these problems, which matched the lower bounds of Valiant up to a logarithmic factor in $n$ and a polynomial factor in $\epsilon$. In this work, we present simple (and new) testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity that is information-theoretically optimal, to constant factors, both in the dependence on $n$ and the dependence on $\epsilon$; for the $\ell_1$ testing problem we establish that the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3}, n^{1/2}/\epsilon^2\})$.
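
    As a concrete illustration, here is a minimal Python sketch of the unbiased, collision-style statistic that underlies $\ell_2$ closeness testers of this kind; the Poissonized sampling and the zero threshold are simplifying assumptions for exposition, not the paper's exact algorithm or constants.

        import numpy as np

        rng = np.random.default_rng(0)

        def closeness_statistic(x, y):
            """Collision-style statistic on per-element sample counts x, y.

            For Poissonized counts, E[Z] = 0 when p = q, while E[Z] grows
            with ||p - q||_2^2, which is what separates the two cases.
            """
            x = np.asarray(x, dtype=float)
            y = np.asarray(y, dtype=float)
            return float(np.sum((x - y) ** 2 - x - y))

        def test_closeness(p, q, m):
            """Toy end-to-end run: Poisson(m * p_i) count for each element i."""
            x = rng.poisson(m * np.asarray(p))
            y = rng.poisson(m * np.asarray(q))
            # A zero threshold is a placeholder; a real tester calibrates
            # the threshold to m and eps.
            return closeness_statistic(x, y) <= 0

    Under this Poissonization, $E[Z] = m^2 \|p - q\|_2^2$, so thresholding $Z$ distinguishes $p = q$ from far pairs once the sampling rate $m$ is large enough.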

    Optimal Testing of Discrete Distributions with High Probability

    We study the problem of testing discrete distributions with a focus on the high-probability regime. Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and parameters $0 < \epsilon, \delta < 1$, we want to distinguish with probability at least $1 - \delta$ whether these distributions satisfy $\mathcal{P}$ or are $\epsilon$-far from $\mathcal{P}$ in total variation distance. Most prior work in distribution testing studied the constant-confidence case (corresponding to $\delta = \Omega(1)$), and provided sample-optimal testers for a range of properties. While one can always boost the confidence probability of any such tester by black-box amplification, this generic boosting method typically leads to sub-optimal sample bounds. Here we study the following broad question: for a given property $\mathcal{P}$, can we characterize the sample complexity of testing $\mathcal{P}$ as a function of all relevant problem parameters, including the error probability $\delta$? Prior to this work, uniformity testing was the only statistical task whose sample complexity had been characterized in this setting. As our main results, we provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters. We also show matching information-theoretic lower bounds on the sample complexity of these problems. Our techniques naturally extend to give optimal testers for related problems. To illustrate the generality of our methods, we give optimal algorithms for testing collections of distributions and testing closeness with unequal-sized samples.
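
    To make the contrast concrete, here is a minimal Python sketch (with illustrative names) of the generic black-box amplification baseline mentioned above; the abstract's point is that its multiplicative $\log(1/\delta)$ sample overhead can typically be improved upon by testers designed directly for the high-probability regime.

        import math

        def amplified_tester(base_tester, draw_batch, delta):
            """Boost a constant-confidence tester (error <= 1/3) to error
            <= delta by majority vote over Theta(log(1/delta)) fresh runs.
            """
            k = 2 * math.ceil(9 * math.log(1.0 / delta)) + 1  # odd; loose constant
            votes = sum(1 for _ in range(k) if base_tester(draw_batch()))
            return votes > k // 2

    Each of the $k$ runs consumes a fresh batch of samples, so the total cost is the base tester's sample complexity times $\Theta(\log(1/\delta))$, which is exactly the generic overhead the paper's direct testers avoid when it is not necessary.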

    Distributional Property Testing in a Quantum World

    A fundamental problem in statistics and learning theory is to test properties of distributions. We show that quantum computers can solve such problems with significant speed-ups. We also introduce a novel access model for quantum distributions, enabling the coherent preparation of quantum samples, and propose a general framework that can naturally handle both classical and quantum distributions in a unified manner. Our framework generalizes and improves previous quantum algorithms for testing closeness between unknown distributions, testing independence between two distributions, and estimating the Shannon / von Neumann entropy of distributions. For classical distributions, our algorithms significantly improve the precision dependence of some earlier results. We also show that in our framework, procedures for classical distributions can be directly lifted to the more general case of quantum distributions, and thus obtain the first speed-ups for testing properties of density operators that can be accessed coherently rather than only via sampling.

    Testing probability distributions underlying aggregated data

    In this paper, we analyze and study a hybrid model for testing and learning probability distributions. Here, in addition to samples, the testing algorithm is provided with one of two different types of oracles to the unknown distribution $D$ over $[n]$. More precisely, we define both the dual and cumulative dual access models, in which the algorithm $A$ can both sample from $D$ and, respectively, for any $i \in [n]$:
    - query the probability mass $D(i)$ (query access); or
    - get the total mass of $\{1, \dots, i\}$, i.e. $\sum_{j=1}^i D(j)$ (cumulative access).
    These two models, by generalizing the previously studied sampling and query oracle models, allow us to bypass the strong lower bounds established for a number of problems in these settings, while capturing several interesting aspects of these problems -- and providing new insight on the limitations of the models. Finally, we show that while the testing algorithms can be in most cases strictly more efficient, some tasks remain hard even with this additional power.
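
    The two oracle types are easy to mock up. Below is a minimal Python sketch of a dual / cumulative-dual access oracle for a fixed distribution $D$ over $[n]$; the class and method names are illustrative, not from the paper.

        import bisect
        import random

        class DualAccessOracle:
            """Sampling plus query and cumulative-query access to D over [n]."""

            def __init__(self, probs):
                self.probs = list(probs)        # probs[i-1] = D(i)
                self.cum, total = [], 0.0
                for p in self.probs:
                    total += p
                    self.cum.append(total)      # cum[i-1] = D(1) + ... + D(i)

            def sample(self):
                """Sampling access: one draw from D, by CDF inversion."""
                return bisect.bisect_left(self.cum, random.random()) + 1

            def query(self, i):
                """Query (dual) access: the probability mass D(i)."""
                return self.probs[i - 1]

            def cumulative(self, i):
                """Cumulative access: the total mass of {1, ..., i}."""
                return self.cum[i - 1]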

    Sharp Bounds for Generalized Uniformity Testing

    We study the problem of generalized uniformity testing [BC17] of a discrete probability distribution: given samples from a probability distribution $p$ over an unknown discrete domain $\Omega$, we want to distinguish, with probability at least $2/3$, between the case that $p$ is uniform on some subset of $\Omega$ versus $\epsilon$-far, in total variation distance, from any such uniform distribution. We establish tight bounds on the sample complexity of generalized uniformity testing. In more detail, we present a computationally efficient tester whose sample complexity is optimal, up to constant factors, and a matching information-theoretic lower bound. Specifically, we show that the sample complexity of generalized uniformity testing is $\Theta\left(1/(\epsilon^{4/3}\|p\|_3) + 1/(\epsilon^{2}\|p\|_2)\right)$.
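
    The norms $\|p\|_2$ and $\|p\|_3$ appear because a distribution is uniform on its support exactly when $\|p\|_2^4 = \|p\|_3^3$ (the equality case of Cauchy-Schwarz), so estimating these norms is the core primitive. Below is a minimal Python sketch of the standard unbiased power-sum estimators from multinomial counts; it illustrates the primitive, not the paper's full algorithm.

        import numpy as np

        def power_sum_estimates(samples, n):
            """Unbiased estimates of ||p||_2^2 and ||p||_3^3 from multinomial
            draws over {0, ..., n-1}. For p uniform on a subset of size k,
            ||p||_2^2 = 1/k and ||p||_3^3 = 1/k^2, so a generalized
            uniformity tester can check whether ||p||_3^3 ~ (||p||_2^2)^2.
            """
            x = np.bincount(samples, minlength=n).astype(float)
            m = x.sum()
            # E[X_i (X_i - 1)] = m (m - 1) p_i^2 for multinomial counts
            p2 = float(np.sum(x * (x - 1)) / (m * (m - 1)))
            # E[X_i (X_i - 1) (X_i - 2)] = m (m - 1) (m - 2) p_i^3
            p3 = float(np.sum(x * (x - 1) * (x - 2)) / (m * (m - 1) * (m - 2)))
            return p2, p3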