Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units
We generalize recent theoretical work on the minimal number of layers of
narrow deep belief networks that can approximate any probability distribution
on the states of their visible units arbitrarily well. We relax the setting of
binary units (Sutskever and Hinton, 2008; Le Roux and Bengio, 2008, 2010;
Montúfar and Ay, 2011) to units with arbitrary finite state spaces, and the
vanishing approximation error to an arbitrary approximation error tolerance.
For example, we show that a q-ary deep belief network with
L ≥ 2 + (q^⌈m−δ⌉ − 1)/(q − 1) layers of width n ≤ m + log_q(m) + 1, for some
m ∈ ℕ, can approximate any probability distribution on {0, 1, ..., q−1}^n
without exceeding a Kullback-Leibler divergence of δ. Our analysis covers
discrete restricted Boltzmann machines and naïve Bayes models as special cases.
Comment: 19 pages, 5 figures, 1 table
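
To get a feel for the depth bound, one can evaluate it for concrete parameters. Below is a minimal Python sketch that simply codes the inequalities as stated in the abstract; the helper names are illustrative, not from the paper.

import math

def dbn_depth_bound(q: int, m: int, delta: float) -> int:
    # Layers sufficient for a q-ary deep belief network to stay within
    # Kullback-Leibler divergence delta: L >= 2 + (q^ceil(m - delta) - 1)/(q - 1).
    # The numerator is a geometric series, so integer division is exact.
    return 2 + (q ** math.ceil(m - delta) - 1) // (q - 1)

def dbn_width_bound(q: int, m: int) -> int:
    # Width bound n <= m + log_q(m) + 1 from the same statement.
    return math.floor(m + math.log(m, q) + 1)

# Example: binary units (q = 2), m = 4, tolerance delta = 0.1:
# depth bound 2 + (2^4 - 1)/(2 - 1) = 17 layers, of width at most 7.
print(dbn_depth_bound(2, 4, 0.1), dbn_width_bound(2, 4))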
Scaling of Model Approximation Errors and Expected Entropy Distances
We compute the expected value of the Kullback-Leibler divergence to various
fundamental statistical models with respect to canonical priors on the
probability simplex. We obtain closed formulas for the expected model
approximation errors, depending on the dimension of the models and the
cardinalities of their sample spaces. For the uniform prior, the expected
divergence from any model containing the uniform distribution is bounded by
the constant 1 − γ (with γ the Euler-Mascheroni constant), and for the models
that we consider, this bound is approached if the state space is very large
and the models' dimension does not grow too fast. For Dirichlet priors the
expected divergence is bounded in a similar way, if the concentration
parameters take reasonable values. These results serve as reference values
for more complicated statistical models.
Comment: 13 pages, 3 figures, WUPES'12
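
The uniform-prior bound lends itself to a quick numerical check: for p drawn from the uniform prior on the simplex over n states (i.e., Dirichlet(1, ..., 1)), the divergence from the one-point model {uniform distribution} is log(n) − H(p), with expectation log(n) − H_n + 1, which increases to 1 − γ ≈ 0.4228 as n grows. A Monte Carlo sketch (sample sizes and state-space sizes are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
gamma = 0.5772156649015329  # Euler-Mascheroni constant

for n in (4, 64, 1024):  # cardinality of the sample space
    # The uniform prior on the simplex is Dirichlet(1, ..., 1).
    p = rng.dirichlet(np.ones(n), size=20000)
    # Divergence from the uniform distribution u: D(p||u) = log(n) - H(p).
    entropy = -np.sum(p * np.log(p), axis=1)  # samples are a.s. positive
    print(n, float((np.log(n) - entropy).mean()), "limit 1 - gamma:", 1 - gamma)

With these settings the means come out near the exact values log(n) − H_n + 1, i.e., about 0.303, 0.415, and 0.422, creeping up toward 1 − γ.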
Maximum information divergence from linear and toric models
We study the problem of maximizing information divergence from a new
perspective using logarithmic Voronoi polytopes. We show that for linear
models, the maximum is always achieved at the boundary of the probability
simplex. For toric models, we present an algorithm that combines the
combinatorics of the chamber complex with numerical algebraic geometry. We pay
special attention to reducible models and models of maximum likelihood degree
one.
Comment: 33 pages, 6 figures
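
The boundary phenomenon for linear models can be observed by brute force on a tiny instance: take a segment between two distributions on three states as the model, approximate D(p||M) by minimizing over a grid on the model, and maximize over a grid on the simplex. A rough Python sketch; the model, grid sizes, and function names are all illustrative choices, not from the paper.

import numpy as np

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# A one-dimensional linear model M: the segment between two distributions.
a = np.array([0.7, 0.2, 0.1])
b = np.array([0.1, 0.2, 0.7])
model = [(1 - t) * a + t * b for t in np.linspace(0.0, 1.0, 101)]

def divergence_from_model(p):
    # D(p||M) = min over q in M of D(p||q), here approximated on a grid.
    return min(kl(p, q) for q in model)

# Coarse grid over the probability simplex on three states.
grid = [np.array([x, y, 1.0 - x - y])
        for x in np.linspace(0.0, 1.0, 51)
        for y in np.linspace(0.0, 1.0, 51) if x + y <= 1.0 + 1e-12]
best = max(grid, key=divergence_from_model)
print(np.round(best, 3), divergence_from_model(best))
# The maximizer is the vertex (0, 1, 0), a boundary point, with value log(5):
# every q in M has q[1] = 0.2, so D(p||q) <= log(1/0.2) for all p, with
# equality exactly at this vertex.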
Finding the Maximizers of the Information Divergence from an Exponential Family
The subject of this thesis is the maximization of the information divergence from an exponential family on a finite set, a problem first formulated by Nihat Ay. A special case is the maximization of the mutual information or the multiinformation between different parts of a composite system.
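
For the mutual-information special case, the divergence from the independence model has a closed form: the rI-projection of a joint distribution p onto the independence model is the product of its marginals, so D(p||E) equals the mutual information I(X;Y). A small Python sketch for two binary variables (the function name is mine):

import numpy as np

def divergence_from_independence(p):
    # rI-projection of p onto the independence model: product of marginals.
    q = np.outer(p.sum(axis=1), p.sum(axis=0))
    mask = p > 0
    # D(p||E) = D(p||q) = mutual information I(X;Y).
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Perfectly correlated bits, p(0,0) = p(1,1) = 1/2: the divergence is log(2),
# the maximum over all joint distributions of two binary variables.
p = np.array([[0.5, 0.0], [0.0, 0.5]])
print(divergence_from_independence(p), np.log(2))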
My thesis contributes mainly to the mathematical aspects of the optimization problem. A reformulation is found that relates the maximization of the information divergence to the maximization of an entropic quantity defined on the normal space of the exponential family. This reformulation simplifies calculations in concrete cases and gives theoretical insight into the general problem.
A second emphasis of the thesis is on examples that demonstrate how the theoretical results can be applied in particular cases. Third, my thesis contains first results on the characterization of exponential families with a small maximum value of the information divergence.
Contents:
1. Introduction
2. Exponential families
2.1. Exponential families, the convex support and the moment map
2.2. The closure of an exponential family
2.3. Algebraic exponential families
2.4. Hierarchical models
3. Maximizing the information divergence from an exponential family
3.1. The directional derivatives of D(·||E)
3.2. Projection points and kernel distributions
3.3. The function D_E
3.4. The first order optimality conditions of D_E
3.5. The relation between D(·||E) and D_E
3.6. Computing the critical points
3.7. Computing the projection points
4. Examples
4.1. Low-dimensional exponential families
4.1.1. Zero-dimensional exponential families
4.1.2. One-dimensional exponential families
4.1.3. One-dimensional exponential families on four states
4.1.4. Other low-dimensional exponential families
4.2. Partition models
4.3. Exponential families with max D(·||E) = log(2)
4.4. Binary i.i.d. models and binomial models
5. Applications and Outlook
5.1. Principles of learning, complexity measures and constraints
5.2. Optimally approximating exponential families
5.3. Asymptotic behaviour of the empirical information divergence
A. Polytopes and oriented matroids
A.1. Polytopes
A.2. Oriented matroids
Bibliography
Index
Glossary of notation