Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

Abstract

Let $p$ be an unknown and arbitrary probability distribution over $[0,1)$. We consider the problem of {\em density estimation}, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a hypothesis distribution that is close to $p$. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any $k$ and $\epsilon$, we give an algorithm that makes $\tilde{O}(k/\epsilon^2)$ draws from $p$, runs in $\tilde{O}(k/\epsilon^2)$ time, and outputs a hypothesis distribution $h$ that is piecewise constant with $O(k \log^2(1/\epsilon))$ pieces. With high probability the hypothesis $h$ satisfies $d_{\mathrm{TV}}(p,h) \leq C \cdot \mathrm{opt}_k(p) + \epsilon$, where $d_{\mathrm{TV}}$ denotes the total variation distance (statistical distance), $C$ is a universal constant, and $\mathrm{opt}_k(p)$ is the smallest total variation distance between $p$ and any $k$-piecewise constant distribution. The sample size and running time of our algorithm are optimal up to logarithmic factors. The "approximation factor" $C$ in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of $k$ and $\epsilon$ can achieve $C < 2$, regardless of what kind of hypothesis distribution it uses.

Comment: conference version appears in NIPS 2014.
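For concreteness, the quantities appearing in the guarantee above can be spelled out as follows; these are the standard definitions, restated here for convenience, and the notation $\mathcal{C}_k$ for the class of $k$-piecewise constant distributions is introduced only for this restatement:
\[
d_{\mathrm{TV}}(p,q) \;=\; \sup_{A \subseteq [0,1)} \bigl| p(A) - q(A) \bigr|,
\qquad
\mathrm{opt}_k(p) \;=\; \inf_{q \in \mathcal{C}_k} d_{\mathrm{TV}}(p,q),
\]
where the supremum ranges over measurable subsets of $[0,1)$ and $\mathcal{C}_k$ is the class of distributions over $[0,1)$ whose probability density function is piecewise constant with at most $k$ pieces (when $p$ and $q$ both have densities, $d_{\mathrm{TV}}(p,q)$ equals half the $L_1$ distance between those densities). In this notation, the algorithm outputs an $h \in \mathcal{C}_{O(k \log^2(1/\epsilon))}$ satisfying $d_{\mathrm{TV}}(p,h) \leq C \cdot \mathrm{opt}_k(p) + \epsilon$.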
