Natural evolution strategies and variational Monte Carlo
A notion of quantum natural evolution strategies is introduced, which
provides a geometric synthesis of a number of known quantum/classical
algorithms for performing classical black-box optimization. Recent work of
Gomes et al. [2019] on heuristic combinatorial optimization using neural
quantum states is pedagogically reviewed in this context, emphasizing the
connection with natural evolution strategies. The algorithmic framework is
illustrated for approximate combinatorial optimization problems, and a
systematic strategy is found for improving the approximation ratios. In
particular, it is found that natural evolution strategies can achieve approximation ratios competitive with widely used heuristic algorithms for Max-Cut, at the expense of increased computation time.
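To make the black-box optimization loop concrete, here is a minimal sketch of natural evolution strategies for Max-Cut. It assumes an independent-Bernoulli (mean-field) search distribution over cut assignments rather than the neural-quantum-state parametrization reviewed in the paper, and the graph, population size, and learning rate are illustrative choices.

# Minimal NES sketch for Max-Cut with a mean-field search distribution.
import numpy as np

def cut_value(adj, x):
    # Number of edges of the (symmetric) adjacency matrix crossing the cut x.
    return 0.5 * np.sum(adj * (x[:, None] != x[None, :]))

def nes_maxcut(adj, iters=300, pop=64, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    theta = np.zeros(n)                          # logits of P(x_i = 1)
    best_x, best_val = None, -np.inf
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-theta))         # Bernoulli means
        xs = (rng.random((pop, n)) < p).astype(float)
        vals = np.array([cut_value(adj, x) for x in xs])
        if vals.max() > best_val:
            best_val, best_x = vals.max(), xs[vals.argmax()]
        adv = (vals - vals.mean()) / (vals.std() + 1e-8)   # variance reduction
        # Score-function estimate of the gradient of E[f(x)] w.r.t. the logits;
        # grad log pi_theta(x) = x - p for independent Bernoullis.
        grad = (adv[:, None] * (xs - p)).mean(axis=0)
        # Natural-gradient step: precondition by the inverse of the Fisher
        # matrix, which is diag(p(1-p)) in this parametrization.
        theta += lr * grad / (p * (1 - p) + 1e-8)
    return best_x, best_val

# Example: Max-Cut on a 6-node ring (the optimal cut value is 6).
A = np.zeros((6, 6))
for i in range(6):
    A[i, (i + 1) % 6] = A[(i + 1) % 6, i] = 1.0
print(nes_maxcut(A))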
Mixtures and products in two graphical models
We compare two statistical models of three binary random variables. One is a
mixture model and the other is a product of mixtures model called a restricted
Boltzmann machine. Although the parametrizations of the two models look different, we show that they represent the same set of distributions on the interior of the probability simplex, and are equal up to closure. We give a
semi-algebraic description of the model in terms of six binomial inequalities
and obtain closed form expressions for the maximum likelihood estimates. We
briefly discuss extensions to larger models.
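For orientation, these are the general forms of the two kinds of parametrization being compared, written here for $n$ binary visible variables in our notation (the specific component and unit counts studied in the paper are not restated). A mixture of $m$ product distributions is
$p_{\mathrm{mix}}(v) = \sum_{k=1}^{m} \lambda_k \prod_{i=1}^{n} p_{k,i}^{v_i} (1-p_{k,i})^{1-v_i}$ with $\lambda_k \ge 0$ and $\sum_k \lambda_k = 1$,
while a restricted Boltzmann machine with $m$ hidden units has visible marginal
$p_{\mathrm{RBM}}(v) \propto \sum_{h \in \{0,1\}^m} \exp(v^\top W h + b^\top v + c^\top h) = e^{b^\top v} \prod_{j=1}^{m} \bigl(1 + e^{c_j + (W^\top v)_j}\bigr)$,
i.e., a renormalized product of $m$ mixtures, one factor per hidden unit.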
Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units
We generalize recent theoretical work on the minimal number of layers of
narrow deep belief networks that can approximate any probability distribution
on the states of their visible units arbitrarily well. We relax the setting of
binary units (Sutskever and Hinton, 2008; Le Roux and Bengio, 2008, 2010; Montúfar and Ay, 2011) to units with arbitrary finite state spaces, and the vanishing approximation error to an arbitrary approximation error tolerance. For example, we show that a $q$-ary deep belief network with a sufficient number of layers of small width can approximate any probability distribution on $\{0,\dots,q-1\}^n$ without exceeding a Kullback-Leibler divergence of $\delta$, with explicit bounds on the required depth and width in terms of $q$, $n$, and $\delta$. Our analysis covers discrete restricted Boltzmann machines and naïve Bayes models as special cases.
Scaling of Model Approximation Errors and Expected Entropy Distances
We compute the expected value of the Kullback-Leibler divergence to various
fundamental statistical models with respect to canonical priors on the
probability simplex. We obtain closed formulas for the expected model
approximation errors, depending on the dimension of the models and the
cardinalities of their sample spaces. For the uniform prior, the expected
divergence from any model containing the uniform distribution is bounded by a
constant, and for the models that we consider, this bound is
approached if the state space is very large and the models' dimension does not
grow too fast. For Dirichlet priors the expected divergence is bounded in a
similar way, if the concentration parameters take reasonable values. These
results serve as reference values for more complicated statistical models.
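As a rough illustration of the quantity involved, here is a minimal Monte Carlo sketch (not from the paper) that estimates the expected Kullback-Leibler divergence from a Dirichlet-distributed random distribution to the uniform distribution; since the uniform distribution lies in all of the models discussed above, this upper-bounds the expected divergence to any such model. The state-space size, concentration parameter, and sample size are illustrative choices.

# Monte Carlo estimate of E[D(p || uniform)] under a Dirichlet prior.
import numpy as np

def expected_kl_to_uniform(n_states=100, alpha=1.0, n_samples=20000, seed=0):
    rng = np.random.default_rng(seed)
    u = np.full(n_states, 1.0 / n_states)
    ps = rng.dirichlet(np.full(n_states, alpha), size=n_samples)
    kl = np.sum(ps * np.log(ps / u), axis=1)   # Dirichlet samples are a.s. > 0
    return kl.mean()

# alpha = 1 is the uniform prior on the simplex; the estimate stays bounded
# by a constant as n_states grows.
print(expected_kl_to_uniform())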
Mixture decompositions of exponential families using a decomposition of their sample spaces
We study the problem of finding the smallest number $m$ such that every element of an exponential family can be written as a mixture of $m$ elements of another
exponential family. We propose an approach based on coverings and packings of
the face lattice of the corresponding convex support polytopes and results from
coding theory. We show that $m = q^{N-1}$ is the smallest number for which any distribution of $N$ $q$-ary variables can be written as a mixture of $m$ independent distributions of $q$-ary variables. Furthermore, we give a corresponding bound on the number of elements of the $k$-interaction exponential family needed to represent any distribution of $N$ binary variables.
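To illustrate the upper-bound direction of such statements, here is a standard construction in our notation (not necessarily the paper's argument): any joint distribution of $N$ $q$-ary variables can be sliced along its first $N-1$ coordinates,
$p(x_1,\dots,x_N) = \sum_{y \in \{0,\dots,q-1\}^{N-1}} \Pr(x_{1:N-1} = y)\, \bigl(\prod_{i=1}^{N-1} \delta_{y_i}(x_i)\bigr)\, p(x_N \mid y)$,
and each of the $q^{N-1}$ summands is a product distribution (point masses on the first $N-1$ variables times an arbitrary factor on the last), so $q^{N-1}$ mixture components always suffice. The matching lower bound and the refinements for general exponential families rely on the coverings and packings mentioned above.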
When Does a Mixture of Products Contain a Product of Mixtures?
We derive relations between theoretical properties of restricted Boltzmann
machines (RBMs), popular machine learning models which form the building blocks
of deep learning models, and several natural notions from discrete mathematics
and convex geometry. We give implications and equivalences relating
RBM-representable probability distributions, perfectly reconstructible inputs,
Hamming modes, zonotopes and zonosets, point configurations in hyperplane
arrangements, linear threshold codes, and multi-covering numbers of hypercubes.
As a motivating application, we prove results on the relative representational
power of mixtures of product distributions and products of mixtures of pairs of
product distributions (RBMs) that formally justify widely held intuitions about
distributed representations. In particular, we show that a mixture of products needs an exponentially larger number of parameters in order to represent the probability distributions that can be obtained as products of mixtures.
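For context (our notation; the precise separation proved in the paper is not restated here): a mixture of $m$ product distributions of $n$ binary variables has $m(n+1) - 1$ free parameters, while an RBM with $n$ visible and $m$ hidden units, i.e., a product of $m$ mixtures of pairs of products, has $nm + n + m$ parameters. Both counts grow only linearly in $m$, so an exponential gap in the number of mixture components required translates into an exponential gap in the number of parameters.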