We introduce marginalization models (MaMs), a new family of generative models
for high-dimensional discrete data. They offer scalable and flexible generative
modeling with tractable likelihoods by explicitly modeling all induced marginal
distributions. Marginalization models enable fast evaluation of arbitrary
marginal probabilities with a single forward pass of the neural network, which
overcomes a major limitation of methods with exact marginal inference, such as
autoregressive models (ARMs). We propose scalable methods for learning the
marginals, grounded in the concept of "marginalization self-consistency".
Unlike previous methods, MaMs support scalable training of any-order generative
models for high-dimensional problems in the energy-based training setting,
where the goal is to match the learned distribution to a given target
distribution specified by an unnormalized (log) probability function, such as
an energy or reward function. We demonstrate the effectiveness
of the proposed model on a variety of discrete data distributions, including
binary images, language, physical systems, and molecules, under both maximum
likelihood and energy-based training settings. MaMs achieve an
orders-of-magnitude speedup in evaluating marginal probabilities in both settings. For
energy-based training tasks, MaMs enable any-order generative modeling of
high-dimensional problems beyond the capability of previous methods. Code is
available at https://github.com/PrincetonLIPS/MaM.
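
For intuition, the following is a minimal, hypothetical sketch (not the released implementation at the repository above) of the two ideas named in the abstract: a network that returns an estimated log-marginal for a partial assignment of binary variables in a single forward pass, and a squared penalty enforcing marginalization self-consistency, i.e. that summing a finer marginal over one unassigned variable recovers the coarser one. All class and function names are illustrative.

```python
import torch
import torch.nn as nn

D = 16  # number of binary variables (illustrative)


class MarginalNet(nn.Module):
    """Hypothetical network: maps a partial assignment in {-1, 0, 1}^D
    (-1 marks an unassigned position) to an estimated log-marginal
    log p_theta(observed part) in one forward pass."""

    def __init__(self, dim: int = D, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x_partial: torch.Tensor) -> torch.Tensor:
        return self.net(x_partial).squeeze(-1)


def self_consistency_loss(f: MarginalNet, x_partial: torch.Tensor,
                          idx: torch.Tensor) -> torch.Tensor:
    """Squared violation of the marginalization constraint
    log p(x_S) = logsumexp_v log p(x_S, x_idx = v)
    for one unassigned coordinate idx per row of the batch."""
    lhs = f(x_partial)                       # coarser marginal (idx unassigned)
    rows = torch.arange(x_partial.shape[0])
    finer = []
    for v in (0.0, 1.0):                     # the two binary values
        x_v = x_partial.clone()
        x_v[rows, idx] = v
        finer.append(f(x_v))                 # finer marginal with x_idx = v
    rhs = torch.logsumexp(torch.stack(finer), dim=0)
    return ((lhs - rhs) ** 2).mean()


# Usage sketch: mask out some coordinates, pick one masked coordinate per row,
# and minimize the self-consistency residual alongside the main
# (maximum-likelihood or energy-matching) objective.
f = MarginalNet()
x = torch.randint(0, 2, (8, D)).float()
x_partial = torch.where(torch.rand(8, D) < 0.3, torch.full_like(x, -1.0), x)
idx = torch.randint(0, D, (8,))
x_partial[torch.arange(8), idx] = -1.0       # ensure the chosen coordinate is unassigned
loss = self_consistency_loss(f, x_partial, idx)
loss.backward()
```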