The Collective Graphical Model (CGM) models a population of independent and
identically distributed individuals when only collective statistics (i.e.,
counts of individuals) are observed. Exact inference in CGMs is intractable,
and previous work has explored Markov chain Monte Carlo (MCMC) and maximum a posteriori (MAP)
approximations for learning and inference. This paper studies Gaussian
approximations to the CGM. As the population grows large, we show that the CGM
distribution converges to a multivariate Gaussian distribution (GCGM) that
maintains the conditional independence properties of the original CGM. If the
observations are exact marginals of the CGM or marginals that are corrupted by
Gaussian noise, inference in the GCGM approximation can be computed efficiently
in closed form. If the observations follow a different noise model (e.g.,
Poisson), then expectation propagation provides efficient and accurate
approximate inference. The accuracy and speed of GCGM inference are compared to
those of the MCMC and MAP methods on a simulated bird migration problem. The GCGM
matches or exceeds the accuracy of the MAP method while being significantly
faster.

Comment: Accepted by ICML 2014. 10 page version with appendi
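
The closed-form inference described for Gaussian-noise observations rests on standard Gaussian conditioning. The sketch below is not the paper's GCGM implementation; it is a minimal generic illustration, assuming a latent vector z ~ N(mu, Sigma) and a single observed count y = a.z + eps with eps ~ N(0, noise_var). The function name `condition_on_noisy_sum` and all parameter names are hypothetical, chosen only for this example.

```python
def condition_on_noisy_sum(mu, Sigma, a, y, noise_var):
    """Posterior of z ~ N(mu, Sigma) after observing y = a.z + eps.

    Hypothetical helper for illustration: eps ~ N(0, noise_var), and
    noise_var = 0 corresponds to observing the linear combination exactly.
    """
    n = len(mu)
    # Sigma a: covariance between z and the observed combination a.z
    Sa = [sum(Sigma[i][j] * a[j] for j in range(n)) for i in range(n)]
    # Scalar variance of the observation: a^T Sigma a + noise_var
    s = sum(a[i] * Sa[i] for i in range(n)) + noise_var
    # Difference between the observed value and its prior mean
    residual = y - sum(a[i] * mu[i] for i in range(n))
    # Standard Gaussian conditioning formulas for a scalar observation
    post_mu = [mu[i] + Sa[i] * residual / s for i in range(n)]
    post_Sigma = [[Sigma[i][j] - Sa[i] * Sa[j] / s for j in range(n)]
                  for i in range(n)]
    return post_mu, post_Sigma


# Usage: two independent unit-variance components, noisy observation of the first.
post_mu, post_Sigma = condition_on_noisy_sum(
    mu=[0.0, 0.0],
    Sigma=[[1.0, 0.0], [0.0, 1.0]],
    a=[1.0, 0.0],
    y=2.0,
    noise_var=1.0,
)
print(post_mu)     # [1.0, 0.0]
print(post_Sigma)  # [[0.5, 0.0], [0.0, 1.0]]
```

Because the observation is a single scalar, the update needs only a division rather than a matrix inverse. Several observations would require solving a linear system in the observation covariance instead.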