Estimating the Effective Support Size in Constant Query Complexity

Abstract

Estimating the support size of a distribution is a well-studied problem in statistics. Motivated by the fact that this problem is highly non-robust (small perturbations of the distribution can drastically change the support size) and thus hard to estimate, Goldreich [ECCC 2019] studied the query complexity of estimating the $\epsilon$-effective support size $\text{Ess}_\epsilon$ of a distribution $P$, which is the smallest support size of a distribution that is within total variation distance $\epsilon$ of $P$. In his paper, he shows an algorithm in the dual access setting (where we may both receive random samples and query the sampling probability $p(x)$ for any $x$) for a bicriteria approximation, giving an answer in $[\text{Ess}_{(1+\beta)\epsilon}, (1+\gamma)\text{Ess}_\epsilon]$ for some values $\beta, \gamma > 0$. However, his algorithm has either query complexity that is super-constant in the support size or a super-constant approximation ratio $1+\gamma = \omega(1)$. He then asked whether this is necessary, or whether it is possible to get a constant-factor approximation with a number of queries independent of the support size. We answer his question by showing that complexity independent of the support size $n$ is possible not only for $\gamma > 0$ but also for $\gamma = 0$; that is, the bicriteria relaxation is not necessary. Specifically, we show an algorithm with query complexity $O\left(\frac{1}{\beta^3 \epsilon^3}\right)$. That is, for any $0 < \epsilon, \beta < 1$, we output within this complexity a number $\tilde{n} \in [\text{Ess}_{(1+\beta)\epsilon}, \text{Ess}_\epsilon]$. We also show that it is possible to solve the approximate version with approximation ratio $1+\gamma$ in complexity $O\left(\frac{1}{\beta^2 \epsilon} + \frac{1}{\beta \epsilon \gamma^2}\right)$. Our algorithm is very simple, and has $4$ short lines of pseudocode.
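
To make the definition of $\text{Ess}_\epsilon$ concrete, here is a minimal Python sketch that computes it exactly for a distribution given explicitly as a list of probabilities. It is not the algorithm from the paper, which works with samples and probability queries and never reads the whole distribution. It relies on the standard observation that the distribution with support of size $k$ closest to $P$ in total variation distance keeps the $k$ heaviest elements, at distance one minus their total mass. The function name `effective_support_size` and the input format are assumptions made for illustration.

```python
from fractions import Fraction


def effective_support_size(pmf, eps):
    """Exact Ess_eps of an explicitly given distribution (illustrative only).

    pmf: probabilities as Fractions, ints, or decimal strings such as "0.05"
         (exact values are assumed, so the sum check below is meaningful).
    eps: total variation budget, in the same exact formats.
    """
    masses = sorted((Fraction(p) for p in pmf), reverse=True)
    if sum(masses) != 1:
        raise ValueError("probabilities must sum to 1")
    target = 1 - Fraction(eps)  # mass the k heaviest elements must cover
    covered = Fraction(0)
    for k, mass in enumerate(masses, start=1):
        covered += mass
        if covered >= target:
            return k
    return len(masses)


# A distribution with support size 3 whose mass is concentrated on one element.
pmf = ["0.9", "0.05", "0.05"]
print(effective_support_size(pmf, "0.1"))   # 1
print(effective_support_size(pmf, "0.05"))  # 2
print(effective_support_size(pmf, "0.01"))  # 3
```

The example also illustrates the non-robustness mentioned above: the support size is 3, yet a perturbation of total variation distance 0.1 brings it down to 1.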
