    Near-Optimal Closeness Testing of Discrete Histogram Distributions

    We investigate the problem of testing the equivalence between two discrete histograms. A {\em $k$-histogram} over $[n]$ is a probability distribution that is piecewise constant over some set of $k$ intervals over $[n]$. Histograms have been extensively studied in computer science and statistics. Given a set of samples from two $k$-histogram distributions $p, q$ over $[n]$, we want to distinguish (with high probability) between the cases that $p = q$ and $\|p - q\|_1 \geq \epsilon$. The main contribution of this paper is a new algorithm for this testing problem and a nearly matching information-theoretic lower bound. Specifically, the sample complexity of our algorithm matches our lower bound up to a logarithmic factor, improving on previous work by polynomial factors in the relevant parameters. Our algorithmic approach applies in a more general setting and yields improved sample upper bounds for testing closeness of other structured distributions as well.
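    To make the object concrete, the following Python sketch (not from the paper) builds a $k$-histogram over $[n]$ and draws samples from it; the breakpoints and interval masses are made-up example values.

        # Build and sample a k-histogram over [n]: a pmf that is constant on
        # each of k intervals. Endpoints and masses here are illustrative.
        import numpy as np

        def k_histogram_pmf(n, breakpoints, interval_masses):
            """breakpoints: sorted endpoints, e.g. [0, 3, 7, n] for k = 3 intervals;
            interval_masses: total probability of each interval (sums to 1)."""
            p = np.zeros(n)
            intervals = zip(breakpoints, breakpoints[1:])
            for (lo, hi), mass in zip(intervals, interval_masses):
                p[lo:hi] = mass / (hi - lo)  # spread the mass uniformly on [lo, hi)
            return p

        n = 10
        p = k_histogram_pmf(n, [0, 3, 7, n], [0.2, 0.5, 0.3])  # a 3-histogram
        samples = np.random.choice(n, size=1000, p=p)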

    Optimal Algorithms for Testing Closeness of Discrete Distributions

    We study the question of closeness testing for two discrete distributions. More precisely, given samples from two distributions $p$ and $q$ over an $n$-element set, we wish to distinguish whether $p = q$ versus $p$ is at least $\epsilon$-far from $q$, in either $\ell_1$ or $\ell_2$ distance. Batu et al. gave the first sub-linear time algorithms for these problems, which matched the lower bounds of Valiant up to a logarithmic factor in $n$, and a polynomial factor of $\epsilon$. In this work, we present simple (and new) testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity that is information-theoretically optimal, to constant factors, both in the dependence on $n$ and the dependence on $\epsilon$; for the $\ell_1$ testing problem we establish that the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3}, n^{1/2}/\epsilon^2\})$.
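    The abstract does not spell out the testers, but a statistic of the flavor used in this line of work is the chi-square-type quantity $\sum_i ((X_i - Y_i)^2 - X_i - Y_i)/(X_i + Y_i)$ over per-element sample counts $X_i, Y_i$, which has expectation zero when $p = q$. A minimal Python sketch follows; it assumes Poissonized sampling, and the acceptance threshold is a placeholder rather than the explicit constants a full analysis would set.

        # Hedged sketch of a chi-square-style closeness tester.
        import numpy as np

        def closeness_statistic(x_counts, y_counts):
            """x_counts[i], y_counts[i]: occurrences of element i among the
            samples from p and q respectively."""
            x = np.asarray(x_counts, dtype=float)
            y = np.asarray(y_counts, dtype=float)
            seen = (x + y) > 0                    # ignore elements never sampled
            x, y = x[seen], y[seen]
            # Each term has mean 0 under p = q and grows with |p_i - q_i|.
            return np.sum(((x - y) ** 2 - x - y) / (x + y))

        def test_closeness(p_samples, q_samples, n, threshold):
            x = np.bincount(p_samples, minlength=n)
            y = np.bincount(q_samples, minlength=n)
            return closeness_statistic(x, y) <= threshold  # True: accept p = q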

    Optimal Testing of Discrete Distributions with High Probability

    We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and parameters $0 < \epsilon, \delta < 1$, we want to distinguish {\em with probability at least $1 - \delta$} whether these distributions satisfy $\mathcal{P}$ or are $\epsilon$-far from $\mathcal{P}$ in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to $\delta = \Omega(1)$), and provided sample-optimal testers for a range of properties. While one can always boost the confidence probability of any such tester by black-box amplification, this generic boosting method typically leads to sub-optimal sample bounds. Here we study the following broad question: For a given property $\mathcal{P}$, can we {\em characterize} the sample complexity of testing $\mathcal{P}$ as a function of all relevant problem parameters, including the error probability $\delta$? Prior to this work, uniformity testing was the only statistical task whose sample complexity had been characterized in this setting. As our main results, we provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters. We also show matching information-theoretic lower bounds on the sample complexity of these problems. Our techniques naturally extend to give optimal testers for related problems. To illustrate the generality of our methods, we give optimal algorithms for testing collections of distributions and testing closeness with unequal-sized samples.
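    For contrast with the paper's direct approach, the generic black-box amplification it refers to is the standard majority-vote trick: repeat a constant-confidence tester on fresh samples $O(\log(1/\delta))$ times and output the majority answer. A minimal sketch of that baseline (the repetition constant is a loose Chernoff-style placeholder); its multiplicative $\log(1/\delta)$ sample overhead is exactly what the paper improves on.

        # Generic confidence amplification: majority vote over independent runs.
        # Assumes base_tester errs with probability at most 1/3 on each run and
        # draw_samples() returns a fresh batch of samples per call.
        import math

        def amplify(base_tester, draw_samples, delta):
            reps = math.ceil(18 * math.log(1 / delta))  # loose Chernoff constant
            votes = sum(bool(base_tester(draw_samples())) for _ in range(reps))
            return votes > reps / 2  # majority decision, correct w.p. >= 1 - delta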

    Two Party Distribution Testing: Communication and Security

    We study the problem of discrete distribution testing in the two-party setting. For example, in the standard closeness testing problem, Alice and Bob each have $t$ samples from, respectively, distributions $a$ and $b$ over $[n]$, and they need to test whether $a = b$ or $a, b$ are $\epsilon$-far (in $\ell_1$ distance). This is in contrast to the well-studied one-party case, where the tester has unrestricted access to samples of both distributions. Despite being a natural constraint in applications, the two-party setting has previously evaded attention. We address two fundamental aspects of the two-party setting: 1) what is the communication complexity, and 2) can it be accomplished securely, without Alice and Bob learning extra information about each other's input. Besides closeness testing, we also study the independence testing problem, where Alice and Bob have $t$ samples from distributions $a$ and $b$ respectively, which may be correlated; the question is whether $a, b$ are independent or $\epsilon$-far from being independent. Our contribution is three-fold: 1) We show how to gain communication efficiency given more samples, beyond the information-theoretic bound on $t$. The gain is polynomially better than what one would obtain via adapting one-party algorithms. 2) We prove tightness of our trade-off for closeness testing, and show that independence testing requires $\Omega(\sqrt{m})$ communication, which is tight, even for an unbounded number of samples. These lower bounds are of independent interest as, to the best of our knowledge, they are the first two-party communication lower bounds for testing problems where the inputs are sets of i.i.d. samples. 3) We define the concept of secure distribution testing, and provide secure versions of the above protocols with an overhead that is only polynomial in the security parameter.
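    For scale, the naive baseline (not the paper's protocol) is for Alice to ship all $t$ samples to Bob, who then runs any one-party tester locally, costing roughly $t \lceil \log_2 n \rceil$ bits of communication; the paper's protocols improve on this trade-off when more samples are available. A hedged Python sketch of that baseline, where closeness_tester is a placeholder for an arbitrary one-party tester:

        # Naive two-party baseline: send everything, test locally.
        import math

        def alice_message(samples, n):
            """Encode t samples from [n] as ~t*ceil(log2(n)) bits."""
            bits = math.ceil(math.log2(n))
            return "".join(format(s, f"0{bits}b") for s in samples)

        def bob_decide(message, own_samples, n, closeness_tester):
            bits = math.ceil(math.log2(n))
            alice_samples = [int(message[i:i + bits], 2)
                             for i in range(0, len(message), bits)]
            return closeness_tester(alice_samples, own_samples, n)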

    Distributional Property Testing in a Quantum World

    A fundamental problem in statistics and learning theory is to test properties of distributions. We show that quantum computers can solve such problems with significant speed-ups. We also introduce a novel access model for quantum distributions, enabling the coherent preparation of quantum samples, and propose a general framework that can naturally handle both classical and quantum distributions in a unified manner. Our framework generalizes and improves previous quantum algorithms for testing closeness between unknown distributions, testing independence between two distributions, and estimating the Shannon/von Neumann entropy of distributions. For classical distributions, our algorithms significantly improve the precision dependence of some earlier results. We also show that, in our framework, procedures for classical distributions can be directly lifted to the more general case of quantum distributions, and we thus obtain the first speed-ups for testing properties of density operators that can be accessed coherently rather than only via sampling.