6 research outputs found

    Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units

    Full text link
    We generalize recent theoretical work on the minimal number of layers of narrow deep belief networks that can approximate any probability distribution on the states of their visible units arbitrarily well. We relax the setting of binary units (Sutskever and Hinton, 2008; Le Roux and Bengio, 2008, 2010; Montúfar and Ay, 2011) to units with arbitrary finite state spaces, and the vanishing approximation error to an arbitrary approximation error tolerance. For example, we show that a $q$-ary deep belief network with $L \geq 2 + \frac{q^{\lceil m-\delta \rceil}-1}{q-1}$ layers of width $n \leq m + \log_q(m) + 1$ for some $m \in \mathbb{N}$ can approximate any probability distribution on $\{0,1,\ldots,q-1\}^n$ without exceeding a Kullback-Leibler divergence of $\delta$. Our analysis covers discrete restricted Boltzmann machines and naïve Bayes models as special cases. Comment: 19 pages, 5 figures, 1 table
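    To make the quoted bound concrete, the following minimal Python sketch (my own; the function names and the sample values q = 2, m = 4, delta = 0.5 are illustrative, not taken from the paper) evaluates the layer count 2 + (q^ceil(m - delta) - 1)/(q - 1) and the width expression m + log_q(m) + 1.

        import math

        def dbn_layer_bound(q: int, m: int, delta: float) -> int:
            """Layers sufficient according to the stated bound:
            L >= 2 + (q^ceil(m - delta) - 1) / (q - 1)."""
            layers = 2 + (q ** math.ceil(m - delta) - 1) / (q - 1)
            return math.ceil(layers)

        def dbn_width_bound(q: int, m: int) -> float:
            """Widths covered by the bound: n <= m + log_q(m) + 1."""
            return m + math.log(m, q) + 1

        # Illustrative values: binary units (q = 2), m = 4, KL tolerance delta = 0.5.
        q, m, delta = 2, 4, 0.5
        print("layers sufficient:", dbn_layer_bound(q, m, delta))   # 2 + (2**4 - 1)/1 = 17
        print("widths covered, n <=", dbn_width_bound(q, m))        # 4 + log2(4) + 1 = 7.0

    As the tolerance delta shrinks below 1, ceil(m - delta) reaches m, so the sufficient depth scales roughly like q^m / (q - 1).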

    Scaling of Model Approximation Errors and Expected Entropy Distances

    Get PDF
    We compute the expected value of the Kullback-Leibler divergence to various fundamental statistical models with respect to canonical priors on the probability simplex. We obtain closed formulas for the expected model approximation errors, depending on the dimension of the models and the cardinalities of their sample spaces. For the uniform prior, the expected divergence from any model containing the uniform distribution is bounded by a constant $1-\gamma$, and for the models that we consider, this bound is approached if the state space is very large and the models' dimension does not grow too fast. For Dirichlet priors the expected divergence is bounded in a similar way, if the concentration parameters take reasonable values. These results serve as reference values for more complicated statistical models. Comment: 13 pages, 3 figures, WUPES'1
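    As an illustrative cross-check of the 1 - gamma bound (gamma denoting the Euler-Mascheroni constant, so 1 - gamma is roughly 0.4228), the following sketch, my own and not code from the paper, estimates E[D(p || u)] for p drawn uniformly from the simplex, i.e. the divergence from the model consisting of the uniform distribution u alone, and compares it with the closed form log N - (H_N - 1), which follows from the standard expected-entropy formula E[H(p)] = psi(N + 1) - psi(2) under the uniform Dirichlet prior. The estimates approach 1 - gamma from below as the number of states N grows.

        import numpy as np
        from scipy.special import digamma, xlogy

        rng = np.random.default_rng(0)

        def mc_expected_divergence_to_uniform(N: int, samples: int = 10_000) -> float:
            """Monte Carlo estimate of E[D(p || uniform)] in nats,
            with p ~ Dirichlet(1, ..., 1), i.e. uniform on the simplex."""
            p = rng.dirichlet(np.ones(N), size=samples)
            entropy = -np.sum(xlogy(p, p), axis=1)           # H(p), with 0 log 0 = 0
            return float(np.mean(np.log(N) - entropy))       # D(p || u) = log N - H(p)

        euler_gamma = -digamma(1.0)                          # Euler-Mascheroni constant
        for N in (2, 10, 100, 1000):
            closed_form = np.log(N) - (digamma(N + 1) - digamma(2))   # log N - (H_N - 1)
            print(N, mc_expected_divergence_to_uniform(N), closed_form, 1 - euler_gamma)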

    Maximum information divergence from linear and toric models

    Full text link
    We study the problem of maximizing information divergence from a new perspective using logarithmic Voronoi polytopes. We show that for linear models, the maximum is always achieved at the boundary of the probability simplex. For toric models, we present an algorithm that combines the combinatorics of the chamber complex with numerical algebraic geometry. We pay special attention to reducible models and models of maximum likelihood degree one. Comment: 33 pages, 6 figures
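    The boundary statement for linear models can be illustrated numerically. The sketch below is my own toy experiment, not the paper's algorithm (which combines logarithmic Voronoi polytopes with the chamber complex and numerical algebraic geometry): it takes a one-dimensional mixture family M on three states with arbitrarily chosen full-support endpoints a and b, then grid-searches the simplex for the maximizer of D(p || M); the grid maximizer should come out on the boundary of the simplex, consistent with the stated result.

        import numpy as np
        from scipy.optimize import minimize_scalar

        # One-dimensional linear (mixture) model on three states:
        # M = { (1 - t) * a + t * b : t in [0, 1] }, with full-support endpoints a, b.
        a = np.array([0.7, 0.2, 0.1])
        b = np.array([0.1, 0.3, 0.6])

        def kl(p, q):
            mask = p > 0
            return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

        def divergence_from_model(p):
            # D(p || M) = min_{q in M} D(p || q), minimized over the mixture weight t.
            res = minimize_scalar(lambda t: kl(p, (1 - t) * a + t * b),
                                  bounds=(0.0, 1.0), method="bounded")
            return res.fun

        # Grid search over the probability simplex on three states.
        n = 60
        best_p, best_val = None, -np.inf
        for i in range(n + 1):
            for j in range(n + 1 - i):
                p = np.array([i, j, n - i - j]) / n
                val = divergence_from_model(p)
                if val > best_val:
                    best_p, best_val = p, val

        print("grid maximizer:", best_p, " max divergence:", best_val)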

    Finding the Maximizers of the Information Divergence from an Exponential Family

    No full text
    The subject of this thesis is the maximization of the information divergence from an exponential family on a finite set, a problem first formulated by Nihat Ay. A special case is the maximization of the mutual information or the multi-information between different parts of a composite system (see the sketch following the contents below). My thesis contributes mainly to the mathematical aspects of the optimization problem. A reformulation is found that relates the maximization of the information divergence to the maximization of an entropic quantity defined on the normal space of the exponential family. This reformulation simplifies calculations in concrete cases and gives theoretical insight into the general problem. A second emphasis of the thesis is on examples that demonstrate how the theoretical results can be applied in particular cases. Third, my thesis contains first results on the characterization of exponential families with a small maximum value of the information divergence.
    Contents:
    1. Introduction
    2. Exponential families
       2.1. Exponential families, the convex support and the moment map
       2.2. The closure of an exponential family
       2.3. Algebraic exponential families
       2.4. Hierarchical models
    3. Maximizing the information divergence from an exponential family
       3.1. The directional derivatives of D(·|E)
       3.2. Projection points and kernel distributions
       3.3. The function D_E
       3.4. The first order optimality conditions of D_E
       3.5. The relation between D(·|E) and D_E
       3.6. Computing the critical points
       3.7. Computing the projection points
    4. Examples
       4.1. Low-dimensional exponential families
            4.1.1. Zero-dimensional exponential families
            4.1.2. One-dimensional exponential families
            4.1.3. One-dimensional exponential families on four states
            4.1.4. Other low-dimensional exponential families
       4.2. Partition models
       4.3. Exponential families with max D(·|E) = log(2)
       4.4. Binary i.i.d. models and binomial models
    5. Applications and Outlook
       5.1. Principles of learning, complexity measures and constraints
       5.2. Optimally approximating exponential families
       5.3. Asymptotic behaviour of the empirical information divergence
    A. Polytopes and oriented matroids
       A.1. Polytopes
       A.2. Oriented matroids
    Bibliography
    Index
    Glossary of notation
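    As a concrete instance of the mutual-information special case mentioned in the abstract, here is a small sketch (my own illustration, not code from the thesis). For two binary variables, the rI-projection onto the independence model is the product of the marginals, so the divergence from that family equals the mutual information I(X; Y); its maximum value log 2 is attained at a perfectly correlated joint distribution, matching the families with maximum divergence log(2) discussed in Section 4.3. A random search over joint distributions approaches, but does not exceed, this value.

        import numpy as np

        def mutual_information(p):
            """Mutual information (nats) of a 2x2 joint distribution;
            equals D(p || E) for the independence model E."""
            px = p.sum(axis=1, keepdims=True)
            py = p.sum(axis=0, keepdims=True)
            mask = p > 0
            return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))

        rng = np.random.default_rng(0)
        best = max(mutual_information(rng.dirichlet(np.ones(4)).reshape(2, 2))
                   for _ in range(50_000))

        p_corr = np.array([[0.5, 0.0], [0.0, 0.5]])   # perfectly correlated distribution
        print("random-search maximum:", best)
        print("perfectly correlated:  ", mutual_information(p_corr))
        print("log 2:                 ", np.log(2))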