
On the Structure, Covering, and Learning of Poisson Multinomial Distributions

Abstract

An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the sum of $n$ independent random vectors supported on the set ${\cal B}_k = \{e_1, \ldots, e_k\}$ of standard basis vectors in $\mathbb{R}^k$. We prove a structural characterization of these distributions, showing that, for all $\varepsilon > 0$, any $(n,k)$-Poisson multinomial random vector is $\varepsilon$-close, in total variation distance, to the sum of a discretized multidimensional Gaussian and an independent $(\mathrm{poly}(k/\varepsilon), k)$-Poisson multinomial random vector. Our structural characterization extends the multi-dimensional CLT of Valiant and Valiant by applying simultaneously to all approximation requirements $\varepsilon$. In particular, it removes from the distance to a multidimensional Gaussian random variable factors depending on $\log n$ and, importantly, on the minimum eigenvalue of the PMD's covariance matrix. We use our structural characterization to obtain an $\varepsilon$-cover, in total variation distance, of the set of all $(n,k)$-PMDs, significantly improving the cover size of Daskalakis and Papadimitriou and obtaining the same qualitative dependence of the cover size on $n$ and $\varepsilon$ as their $k=2$ cover. We further exploit this structure to show that $(n,k)$-PMDs can be learned to within $\varepsilon$ in total variation distance from $\tilde{O}_k(1/\varepsilon^2)$ samples, which is near-optimal in its dependence on $\varepsilon$ and independent of $n$. In particular, our result generalizes the single-dimensional result of Daskalakis, Diakonikolas, and Servedio for Poisson Binomials to arbitrary dimension.

Comment: 49 pages; an extended abstract appeared in FOCS 2015.
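To make the definition concrete, here is a minimal sampling sketch (not from the paper; the function name sample_pmd, the parameter matrix P, and the use of NumPy are illustrative assumptions): each of the $n$ independent summands takes the value $e_j$ with the probability given in its row of P, and a PMD sample is their coordinate-wise sum.

    # Illustrative sketch only; names and the NumPy dependency are
    # assumptions of this example, not artifacts of the paper.
    import numpy as np

    def sample_pmd(P, rng=None):
        """Draw one (n, k)-PMD sample: the sum of n independent vectors,
        where summand i equals basis vector e_j with probability P[i, j]."""
        rng = np.random.default_rng() if rng is None else rng
        n, k = P.shape
        out = np.zeros(k, dtype=int)
        for i in range(n):
            j = rng.choice(k, p=P[i])  # which basis vector summand i takes
            out[j] += 1                # add e_j to the running sum
        return out  # a lattice point in Z^k whose coordinates sum to n

    # Example: n = 100 summands, each uniform over the k = 3 basis vectors.
    P = np.full((100, 3), 1.0 / 3.0)
    print(sample_pmd(P))  # e.g. [31 37 32]; coordinates always sum to 100

For $k=2$ this construction reduces to a Poisson Binomial Distribution, the single-dimensional case studied by Daskalakis, Diakonikolas, and Servedio that the paper's learning result generalizes.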
