686 research outputs found

    Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines

    We improve recently published results about the resources of Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN) required to make them universal approximators. We show that any distribution $p$ on the set of binary vectors of length $n$ can be arbitrarily well approximated by an RBM with $k-1$ hidden units, where $k$ is the minimal number of pairs of binary vectors differing in only one entry such that their union contains the support set of $p$. In important cases this number is half of the cardinality of the support set of $p$. We construct a DBN with $\frac{2^n}{2(n-b)}$, $b \sim \log n$, hidden layers of width $n$ that is capable of approximating any distribution on $\{0,1\}^n$ arbitrarily well. This confirms a conjecture presented by Le Roux and Bengio (2010).
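    As a quick numeric illustration of these bounds (ours, not from the paper), the sketch below evaluates the RBM hidden-unit count in the favourable case $k = |\mathrm{supp}(p)|/2$ and the DBN layer count $\frac{2^n}{2(n-b)}$ with $b = \log_2 n$; the function names are placeholders.

    ```python
    import math

    def rbm_hidden_units(support_size):
        """Best-case hidden-unit count from the abstract: if supp(p) can be covered
        by k pairs of binary vectors differing in one entry, an RBM with k - 1
        hidden units suffices; here we assume the favourable case k = |supp(p)| / 2."""
        k = math.ceil(support_size / 2)   # assumes such a pairing exists
        return k - 1

    def dbn_layers(n):
        """Layer count of the universal-approximator DBN from the abstract:
        2^n / (2 (n - b)) layers of width n, with b ~ log2(n)."""
        b = math.log2(n)
        return (2 ** n) / (2 * (n - b))

    print(rbm_hidden_units(support_size=8))   # -> 3 hidden units in the favourable case
    print(dbn_layers(n=10))                   # ~ 76.7 layers of width 10 (round up in practice)
    ```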

    Universal Approximation of Markov Kernels by Shallow Stochastic Feedforward Networks

    We establish upper bounds for the minimal number of hidden units for which a binary stochastic feedforward network with sigmoid activation probabilities and a single hidden layer is a universal approximator of Markov kernels. We show that each possible probabilistic assignment of the states of $n$ output units, given the states of $k\geq 1$ input units, can be approximated arbitrarily well by a network with $2^{k-1}(2^{n-1}-1)$ hidden units. Comment: 13 pages, 3 figures.
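    A one-line check of the stated bound (our illustration, not code from the paper):

    ```python
    def sfnn_hidden_units(k, n):
        """Upper bound from the abstract: a single-hidden-layer stochastic
        feedforward network with 2^(k-1) * (2^(n-1) - 1) sigmoid hidden units can
        approximate any Markov kernel from k binary inputs to n binary outputs."""
        assert k >= 1 and n >= 1
        return 2 ** (k - 1) * (2 ** (n - 1) - 1)

    # e.g. 3 input units and 4 output units:
    print(sfnn_hidden_units(k=3, n=4))   # -> 28 hidden units suffice
    ```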

    Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units

    We generalize recent theoretical work on the minimal number of layers of narrow deep belief networks that can approximate any probability distribution on the states of their visible units arbitrarily well. We relax the setting of binary units (Sutskever and Hinton, 2008; Le Roux and Bengio, 2008, 2010; Montúfar and Ay, 2011) to units with arbitrary finite state spaces, and the vanishing approximation error to an arbitrary approximation error tolerance. For example, we show that a $q$-ary deep belief network with $L\geq 2+\frac{q^{\lceil m-\delta \rceil}-1}{q-1}$ layers of width $n \leq m + \log_q(m) + 1$ for some $m\in \mathbb{N}$ can approximate any probability distribution on $\{0,1,\ldots,q-1\}^n$ without exceeding a Kullback-Leibler divergence of $\delta$. Our analysis covers discrete restricted Boltzmann machines and naïve Bayes models as special cases. Comment: 19 pages, 5 figures, 1 table.
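    To make the depth and width bounds concrete, here is a small sketch (ours) that evaluates them for example values of $q$, $m$, and the tolerance $\delta$; the function name is a placeholder.

    ```python
    import math

    def qary_dbn_bounds(q, m, delta):
        """Bounds from the abstract: a q-ary deep belief network with
        L >= 2 + (q^ceil(m - delta) - 1) / (q - 1) layers of width
        n <= m + log_q(m) + 1 approximates any distribution on {0, ..., q-1}^n
        within Kullback-Leibler divergence delta."""
        L = 2 + (q ** math.ceil(m - delta) - 1) / (q - 1)
        width = m + math.log(m, q) + 1
        return L, width

    # e.g. binary units (q = 2), m = 8, tolerance delta = 0.5:
    L, width = qary_dbn_bounds(q=2, m=8, delta=0.5)
    print(L, width)   # -> 257.0 layers, width bound ≈ 12
    ```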

    Universal Approximation with Deep Narrow Networks

    The classical Universal Approximation Theorem holds for neural networks of arbitrary width and bounded depth. Here we consider the natural `dual' scenario for networks of bounded width and arbitrary depth. Precisely, let $n$ be the number of input neurons, $m$ be the number of output neurons, and let $\rho$ be any nonaffine continuous function with a continuous nonzero derivative at some point. Then we show that the class of neural networks of arbitrary depth, width $n + m + 2$, and activation function $\rho$ is dense in $C(K; \mathbb{R}^m)$ for $K \subseteq \mathbb{R}^n$ with $K$ compact. This covers every activation function possible to use in practice, and also includes polynomial activation functions, which is unlike the classical version of the theorem, and provides a qualitative difference between deep narrow networks and shallow wide networks. We then consider several extensions of this result. In particular, we consider nowhere-differentiable activation functions, density in noncompact domains with respect to the $L^p$-norm, and how the width may be reduced to just $n + m + 1$ for `most' activation functions. Comment: Accepted at COLT 2020.
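    To make the "bounded width, arbitrary depth" regime concrete, here is a minimal NumPy sketch of a forward pass through such a network (our illustration; tanh stands in for the generic nonaffine activation $\rho$, and all names are placeholders).

    ```python
    import numpy as np

    def deep_narrow_forward(x, weights, biases, rho=np.tanh):
        """Forward pass of a deep, narrow fully connected network: every hidden
        layer has the same fixed width (e.g. n + m + 2), and rho is a nonaffine
        continuous activation with a nonzero derivative somewhere (tanh here as
        a stand-in). weights/biases are lists of per-layer parameters."""
        h = x
        for W, b in zip(weights[:-1], biases[:-1]):
            h = rho(W @ h + b)
        W_out, b_out = weights[-1], biases[-1]
        return W_out @ h + b_out   # affine readout to the m outputs

    # Example: n = 3 inputs, m = 2 outputs, hidden width n + m + 2 = 7, depth 50.
    n, m, width, depth = 3, 2, 7, 50
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(width, n))] \
            + [rng.normal(size=(width, width)) for _ in range(depth - 2)] \
            + [rng.normal(size=(m, width))]
    biases = [rng.normal(size=width) for _ in range(depth - 1)] + [rng.normal(size=m)]
    print(deep_narrow_forward(rng.normal(size=n), weights, biases).shape)   # (2,)
    ```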

    Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation

    The study of universal approximation properties (UAP) for neural networks (NN) has a long history. When the network width is unlimited, only a single hidden layer is sufficient for UAP. In contrast, when the depth is unlimited, the width for UAP needs to be not less than the critical width $w^*_{\min}=\max(d_x,d_y)$, where $d_x$ and $d_y$ are the dimensions of the input and output, respectively. Recently, \cite{cai2022achieve} showed that a leaky-ReLU NN with this critical width can achieve UAP for $L^p$ functions on a compact domain $K$, \emph{i.e.}, the UAP for $L^p(K,\mathbb{R}^{d_y})$. This paper examines the uniform UAP for the function class $C(K,\mathbb{R}^{d_y})$ and gives the exact minimum width of the leaky-ReLU NN as $w_{\min}=\max(d_x+1,d_y)+1_{d_y=d_x+1}$, which involves the effects of the output dimensions. To obtain this result, we propose a novel lift-flow-discretization approach that shows that the uniform UAP has a deep connection with topological theory. Comment: ICML 2023 camera ready.
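    The exact minimum width is easy to evaluate; the snippet below (ours, not from the paper) contrasts it with the $L^p$ critical width for a couple of input/output dimensions.

    ```python
    def min_width_leaky_relu(d_x, d_y):
        """Exact minimum width for uniform universal approximation of
        C(K, R^{d_y}) by deep leaky-ReLU networks, as stated in the abstract:
        w_min = max(d_x + 1, d_y) + 1_{d_y = d_x + 1}."""
        return max(d_x + 1, d_y) + (1 if d_y == d_x + 1 else 0)

    # Compare with the L^p critical width max(d_x, d_y):
    print(min_width_leaky_relu(d_x=2, d_y=2))   # -> 3 (vs. critical width 2)
    print(min_width_leaky_relu(d_x=2, d_y=3))   # -> 4 (the indicator term is active)
    ```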

    Geometry and Expressive Power of Conditional Restricted Boltzmann Machines

    Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results on their ability to represent conditional Markov random fields and conditional distributions with restricted supports, on the minimal size of universal approximators, on the maximal model approximation errors, and on the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models. Comment: 30 pages, 5 figures, 1 algorithm.
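    For concreteness, here is a brute-force sketch (ours, under a standard bipartite CRBM energy with interaction matrices $W$, $V$ and biases $b$, $c$, which are our notation rather than the paper's) that computes the conditional distribution of a tiny conditional RBM by summing out the hidden units analytically and enumerating the output states.

    ```python
    import itertools
    import numpy as np

    def crbm_conditional(x, W, V, b, c):
        """Conditional distribution p(y | x) of a small conditional RBM, where the
        input layer x and output layer y connect bipartitely to the hidden layer:
            p(y | x) ∝ exp(b·y) * prod_j (1 + exp(c_j + (W^T x)_j + (V^T y)_j)),
        with W (inputs x hidden), V (outputs x hidden), b (output biases),
        c (hidden biases). The hidden units are summed out in closed form."""
        n_out = V.shape[0]
        ys = np.array(list(itertools.product([0, 1], repeat=n_out)))
        # unnormalized log-probabilities for every output configuration
        logits = ys @ b + np.sum(np.log1p(np.exp(c + x @ W + ys @ V)), axis=1)
        p = np.exp(logits - logits.max())
        return ys, p / p.sum()

    # Tiny example: 2 input, 3 output, 4 hidden units with random parameters.
    rng = np.random.default_rng(0)
    W, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4))
    b, c = rng.normal(size=3), rng.normal(size=4)
    ys, p = crbm_conditional(np.array([1.0, 0.0]), W, V, b, c)
    print(p.sum())   # ≈ 1.0: a proper distribution over the 2^3 output states
    ```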