Estimating the support size of a distribution is a well-studied problem in
statistics. Motivated by the fact that this problem is highly non-robust (as
small perturbations in the distribution can drastically affect the support
size) and thus hard to estimate, Goldreich [ECCC 2019] studied the query
complexity of estimating the $\epsilon$-\emph{effective support size}
$\mathrm{Ess}_\epsilon$ of a distribution $P$, which is equal to the smallest
support size of a distribution that is $\epsilon$-close to $P$, that is, at
total variation distance at most $\epsilon$ from $P$.
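When the full probability vector of $P$ is known, $\mathrm{Ess}_\epsilon$ has a simple closed form. A distribution supported on a set $S$ is at total variation distance at least $P(\text{outside } S)$ from $P$, with equality when the removed mass is moved onto $S$, so the best $S$ of size $k$ keeps the $k$ most likely elements. Hence $\mathrm{Ess}_\epsilon$ is the least $k$ whose $k$ largest probabilities sum to at least $1-\epsilon$. The Python sketch below computes this directly; the function name `effective_support_size` is a hypothetical illustration, and it reads the whole probability vector, which is exactly what the query-based algorithms discussed here avoid.

```python
from fractions import Fraction
from numbers import Real
from typing import Sequence


def effective_support_size(pmf: Sequence[Real], eps: Real) -> int:
    """Ess_eps(P): least support size of a distribution within TV distance eps of P.

    A distribution supported on S is at TV distance >= P(outside S) from P,
    with equality when the removed mass is moved onto S, so the best S of
    size k consists of the k most likely elements.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    target = 1 - eps
    covered = 0
    # Keep the most likely elements first until at least 1 - eps of the mass is covered.
    for k, p in enumerate(sorted(pmf, reverse=True), start=1):
        covered += p
        if covered >= target:
            return k
    # Reached only when floating-point round-off keeps the running sum just
    # below 1 - eps (e.g. eps == 0): fall back to the true support size.
    return sum(1 for p in pmf if p > 0)
```

For example, with probabilities 0.4, 0.3, 0.2, 0.1 it returns 3 for `eps=0.15` and 2 for `eps=0.35`, illustrating that $\mathrm{Ess}_\epsilon$ is non-increasing in $\epsilon$. Exact inputs such as `fractions.Fraction` avoid round-off at the boundary $1-\epsilon$.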
In his paper, he gives an algorithm in the dual access setting (where we may
both receive random samples and query the sampling probability $p(x)$ of any
element $x$) for a bicriteria approximation, which outputs an answer in
$[\mathrm{Ess}_{(1+\beta)\epsilon}, (1+\gamma)\,\mathrm{Ess}_\epsilon]$ for some
$\beta,\gamma>0$. However, his algorithm has either query complexity that is
super-constant in the support size or a super-constant approximation ratio
$1+\gamma=\omega(1)$. He then asked whether this is necessary, or whether it is
possible to get a constant-factor approximation with a number of queries
independent of the support size.
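To make the dual access model concrete, the sketch below wraps an explicit distribution in a hypothetical `DualAccessOracle` class with two operations: `sample()` draws $x \sim P$ and `prob(x)` returns $p(x)$. It then applies a standard importance-weighting identity, $\mathbb{E}_{x\sim P}[\mathbf{1}\{p(x)\ge\tau\}/p(x)] = |\{x : p(x)\ge\tau\}|$, to estimate how many elements have probability at least a threshold $\tau$ from samples and probability queries alone. This only illustrates the access model; it is neither Goldreich's algorithm nor the one in this paper, and all class, method, and parameter names are assumptions.

```python
import itertools
import random
from typing import Dict, Hashable, List


class DualAccessOracle:
    """Hypothetical dual-access wrapper around an explicit pmf, for illustration only."""

    def __init__(self, pmf: Dict[Hashable, float], seed: int = 0) -> None:
        self._pmf = dict(pmf)
        self._items: List[Hashable] = list(self._pmf)
        # Precompute cumulative weights so each sample is a binary search.
        self._cum_weights = list(itertools.accumulate(self._pmf[x] for x in self._items))
        self._rng = random.Random(seed)
        self.queries = 0  # samples and probability queries, counted together

    def sample(self) -> Hashable:
        """Draw one element x ~ P."""
        self.queries += 1
        return self._rng.choices(self._items, cum_weights=self._cum_weights, k=1)[0]

    def prob(self, x: Hashable) -> float:
        """Return p(x); elements outside the support have probability 0."""
        self.queries += 1
        return self._pmf.get(x, 0.0)


def count_heavy_elements(oracle: DualAccessOracle, tau: float, num_samples: int) -> float:
    """Unbiased estimate of |{x : p(x) >= tau}| via E_{x~P}[1{p(x) >= tau} / p(x)]."""
    total = 0.0
    for _ in range(num_samples):
        x = oracle.sample()
        p = oracle.prob(x)
        if p >= tau:
            total += 1.0 / p  # each term lies in [0, 1/tau]
    return total / num_samples


if __name__ == "__main__":
    # 10 heavy elements (total mass 0.9) and 1000 light ones (total mass 0.1).
    pmf = {i: 0.09 for i in range(10)}
    pmf.update({i: 0.0001 for i in range(10, 1010)})
    oracle = DualAccessOracle(pmf, seed=1)
    print(count_heavy_elements(oracle, tau=0.01, num_samples=20_000), oracle.queries)
```

The run above should print an estimate close to 10 together with the 40000 queries used. The query count depends only on `num_samples`, not on the support size, while the accuracy depends on $\tau$ because each term is at most $1/\tau$.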
We answer his question by showing that query complexity independent of the
support size $n$ is possible not only for $\gamma>0$ but also for $\gamma=0$;
that is, the multiplicative slack $1+\gamma$ in the bicriteria relaxation is not
necessary. Specifically, we give an algorithm that, for any
$0<\epsilon,\beta<1$, outputs a number
$\tilde{n}\in[\mathrm{Ess}_{(1+\beta)\epsilon}, \mathrm{Ess}_\epsilon]$ using
$O\big(\frac{1}{\beta^3\epsilon^3}\big)$ queries. We also show that the
approximate version, with approximation ratio $1+\gamma$, can be solved with
$O\big(\frac{1}{\beta^2\epsilon}+\frac{1}{\beta\epsilon\gamma^2}\big)$ queries.
Our algorithm is very simple: its pseudocode is just 4 short lines.
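On small examples where the full probability vector is known, these guarantees can be checked directly. The sketch below uses hypothetical helpers `ess` and `satisfies_guarantee` to test whether a candidate $\tilde{n}$ lies in $[\mathrm{Ess}_{(1+\beta)\epsilon}, (1+\gamma)\,\mathrm{Ess}_\epsilon]$, where $\gamma = 0$ gives the exact-size guarantee. It restates the top-$k$ characterization of $\mathrm{Ess}_\epsilon$ (the least $k$ whose $k$ largest probabilities sum to at least $1-\epsilon$) so that it runs on its own, and it uses exact `Fraction` arithmetic. It says nothing about how the algorithm itself computes $\tilde{n}$.

```python
from fractions import Fraction
from typing import Sequence


def ess(pmf: Sequence[Fraction], eps: Fraction) -> int:
    """Ess_eps(P): least k such that the k largest probabilities sum to >= 1 - eps."""
    covered = Fraction(0)
    for k, p in enumerate(sorted(pmf, reverse=True), start=1):
        covered += p
        if covered >= 1 - eps:
            return k
    return len(pmf)  # unreachable when pmf sums to 1 and eps >= 0


def satisfies_guarantee(
    pmf: Sequence[Fraction],
    n_tilde: int,
    eps: Fraction,
    beta: Fraction,
    gamma: Fraction = Fraction(0),
) -> bool:
    """Check n_tilde in [Ess_{(1+beta)eps}, (1+gamma) * Ess_eps]."""
    lower = ess(pmf, (1 + beta) * eps)  # larger eps, so a smaller (or equal) support size
    upper = (1 + gamma) * ess(pmf, eps)
    return lower <= n_tilde <= upper
```

With probabilities $0.4, 0.3, 0.2, 0.1$, $\epsilon = 1/5$ and $\beta = 1/2$, the admissible outputs are 2 and 3; allowing $\gamma = 1/2$ also admits 4.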