Testing closeness of discrete distributions

Abstract

Given samples from two distributions over an $n$-element set, we wish to test whether these distributions are statistically close. We present an algorithm which uses a number of independent samples from each distribution that is sublinear in $n$, specifically $O(n^{2/3}\epsilon^{-8/3}\log n)$, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than $\max\{\epsilon^{4/3}n^{-1/3}/32, \epsilon n^{-1/2}/4\}$) or large (more than $\epsilon$) in $\ell_1$ distance. This result can be compared to the lower bound of $\Omega(n^{2/3}\epsilon^{-2/3})$ for this problem given by Valiant. Our algorithm has applications to the problem of testing whether a given Markov process is rapidly mixing. We present sublinear algorithms for several variants of this problem as well. A preliminary version of this paper appeared in the 41st Symposium on Foundations of Computer Science, 2000, Redondo Beach, C
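
To get a feel for the quantities in the abstract, the bounds can be evaluated numerically. The following is only an illustrative sketch, not the paper's algorithm: the function names are hypothetical, the constant `c` hidden by the $O(\cdot)$ and $\Omega(\cdot)$ notation is unknown and assumed to be 1, and the logarithm is assumed to be natural.

```python
import math


def upper_sample_bound(n, eps, c=1.0):
    """c * n^(2/3) * eps^(-8/3) * log n; c is an assumed placeholder constant."""
    return c * n ** (2 / 3) * eps ** (-8 / 3) * math.log(n)


def lower_sample_bound(n, eps, c=1.0):
    """c * n^(2/3) * eps^(-2/3); c is an assumed placeholder constant."""
    return c * n ** (2 / 3) * eps ** (-2 / 3)


def small_distance_threshold(n, eps):
    """max{eps^(4/3) n^(-1/3) / 32, eps n^(-1/2) / 4}, as stated in the abstract."""
    return max(eps ** (4 / 3) * n ** (-1 / 3) / 32, eps * n ** (-1 / 2) / 4)


def l1_distance(p, q):
    """l1 distance between two explicitly given probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    n, eps = 10**6, 0.5
    print("upper bound (c = 1):", upper_sample_bound(n, eps))
    print("lower bound (c = 1):", lower_sample_bound(n, eps))
    print("'small' threshold:  ", small_distance_threshold(n, eps))
    print("'large' threshold:  ", eps)
```

With both constants set to 1, the two sample bounds differ by a factor of $\epsilon^{-2}\log n$, and the "small" threshold is far below the "large" threshold $\epsilon$, so the tester only has to separate two well-separated cases.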

Last time updated on 10/02/2012

This paper was published in LSE Research Online.
