Mixture decompositions of exponential families using a decomposition of their sample spaces
We study the problem of finding the smallest $m$ such that every element of
an exponential family can be written as a mixture of $m$ elements of another
exponential family. We propose an approach based on coverings and packings of
the face lattice of the corresponding convex support polytopes and results from
coding theory. We show that $m = q^{N-1}$ is the smallest number for which any
distribution of $N$ $q$-ary variables can be written as a mixture of $m$
independent $q$-ary variables. Furthermore, we show that any distribution of
$N$ binary variables is a mixture of $m = 2^{N-(k+1)}(1 + 1/(2^k - 1))$ elements
of the $k$-interaction exponential family. Comment: 17 pages, 2 figures
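The upper-bound direction of the mixture result can be illustrated for binary variables with a short sketch. It assumes a simple pairing construction (not necessarily the one used in the paper): the 2^N binary strings are split into 2^(N-1) pairs that differ only in the last bit, and the distribution restricted to each pair is a product distribution (point masses on the first N-1 variables, a Bernoulli on the last). The names `edge_mixture` and `mixture_prob` are hypothetical helpers introduced for this illustration. The sketch shows only that 2^(N-1) components suffice, not that fewer cannot work.

```python
import itertools
import random


def edge_mixture(p, n):
    """Decompose p (dict: binary tuple of length n -> probability) into
    2**(n-1) product distributions, one per pair of strings that differ
    only in the last bit. Each component is (weight, prefix, theta)."""
    components = []
    for prefix in itertools.product((0, 1), repeat=n - 1):
        mass0 = p[prefix + (0,)]
        mass1 = p[prefix + (1,)]
        weight = mass0 + mass1
        # Bernoulli parameter of the last variable; arbitrary if the pair has no mass.
        theta = mass1 / weight if weight > 0 else 0.5
        components.append((weight, prefix, theta))
    return components


def mixture_prob(components, x):
    """Evaluate the mixture of product distributions at the binary tuple x."""
    total = 0.0
    for weight, prefix, theta in components:
        if x[:-1] == prefix:  # point masses on the first n-1 variables
            total += weight * (theta if x[-1] == 1 else 1.0 - theta)
    return total


# Random target distribution on n binary variables (illustrative data only).
n = 4
states = list(itertools.product((0, 1), repeat=n))
rng = random.Random(0)
raw = [rng.random() for _ in states]
z = sum(raw)
p = {x: r / z for x, r in zip(states, raw)}

components = edge_mixture(p, n)
max_error = max(abs(mixture_prob(components, x) - p[x]) for x in states)
print(len(components), max_error)
```

The number of components is 2^(n-1), which agrees with q^(N-1) for q = 2; the reconstruction error should be at the level of floating-point rounding.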
Mixtures and products in two graphical models
We compare two statistical models of three binary random variables. One is a
mixture model and the other is a product of mixtures model called a restricted
Boltzmann machine. Although the two models we study look different from their
parametrizations, we show that they represent the same set of distributions on
the interior of the probability simplex, and are equal up to closure. We give a
semi-algebraic description of the model in terms of six binomial inequalities
and obtain closed form expressions for the maximum likelihood estimates. We
briefly discuss extensions to larger models. Comment: 18 pages, 7 figures
Hierarchical Models as Marginals of Hierarchical Models
We investigate the representation of hierarchical models in terms of
marginals of other hierarchical models with smaller interactions. We focus on
binary variables and marginals of pairwise interaction models whose hidden
variables are conditionally independent given the visible variables. In this
case the problem is equivalent to the representation of linear subspaces of
polynomials by feedforward neural networks with soft-plus computational units.
We show that every hidden variable can freely model multiple interactions among
the visible variables, which allows us to generalize and improve previous
results. In particular, we show that a restricted Boltzmann machine with fewer
than $[2(\log(v)+1)/(v+1)]\, 2^v - 1$ hidden binary variables can approximate
every distribution of $v$ visible binary variables arbitrarily well, compared
to $2^{v-1} - 1$ from the best previously known result. Comment: 18 pages, 4 figures, 2 tables, WUPES'1
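The link to soft-plus units mentioned in this abstract can be checked numerically. When the hidden binary variables are conditionally independent given the visible ones, as in a restricted Boltzmann machine, summing them out leaves a log-probability (up to normalization) equal to a linear term plus a sum of soft-plus functions of affine forms in the visible state. The sketch below is a minimal check under assumed random parameters `W`, `b`, `c`; the function names are hypothetical and the energy convention (b.v + c.h + h^T W v) is the standard one, assumed here rather than taken from the paper.

```python
import itertools
import math
import random


def softplus(t):
    """Soft-plus function log(1 + exp(t))."""
    return math.log1p(math.exp(t))


def log_unnorm_by_sum(v, W, b, c):
    """log of sum over hidden states h of exp(b.v + c.h + h^T W v), by brute force."""
    m = len(c)
    terms = []
    for h in itertools.product((0, 1), repeat=m):
        e = sum(bi * vi for bi, vi in zip(b, v))
        e += sum(cj * hj for cj, hj in zip(c, h))
        e += sum(h[j] * W[j][i] * v[i] for j in range(m) for i in range(len(v)))
        terms.append(e)
    mx = max(terms)  # log-sum-exp with the maximum factored out
    return mx + math.log(sum(math.exp(t - mx) for t in terms))


def log_unnorm_by_softplus(v, W, b, c):
    """Same quantity written as a linear term plus one soft-plus unit per hidden variable."""
    linear = sum(bi * vi for bi, vi in zip(b, v))
    units = sum(
        softplus(c[j] + sum(W[j][i] * v[i] for i in range(len(v))))
        for j in range(len(c))
    )
    return linear + units


# Small random model: 3 visible and 2 hidden binary variables (illustrative only).
rng = random.Random(1)
n_visible, n_hidden = 3, 2
W = [[rng.gauss(0, 1) for _ in range(n_visible)] for _ in range(n_hidden)]
b = [rng.gauss(0, 1) for _ in range(n_visible)]
c = [rng.gauss(0, 1) for _ in range(n_hidden)]

max_gap = max(
    abs(log_unnorm_by_sum(v, W, b, c) - log_unnorm_by_softplus(v, W, b, c))
    for v in itertools.product((0, 1), repeat=n_visible)
)
print(max_gap)
```

The two expressions agree because the sum over hidden states factorizes into a product of terms 1 + exp(c_j + W_j.v), one per hidden variable.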
When Does a Mixture of Products Contain a Product of Mixtures?
We derive relations between theoretical properties of restricted Boltzmann
machines (RBMs), popular machine learning models which form the building blocks
of deep learning models, and several natural notions from discrete mathematics
and convex geometry. We give implications and equivalences relating
RBM-representable probability distributions, perfectly reconstructible inputs,
Hamming modes, zonotopes and zonosets, point configurations in hyperplane
arrangements, linear threshold codes, and multi-covering numbers of hypercubes.
As a motivating application, we prove results on the relative representational
power of mixtures of product distributions and products of mixtures of pairs of
product distributions (RBMs) that formally justify widely held intuitions about
distributed representations. In particular, we show that a mixture of products
requiring an exponentially larger number of parameters is needed to represent
the probability distributions which can be obtained as products of mixtures. Comment: 32 pages, 6 figures, 2 tables