1,042 research outputs found

Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units

Author: Montúfar Guido F.
Publication date: 01/01/2014

We generalize recent theoretical work on the minimal number of layers of narrow deep belief networks that can approximate any probability distribution on the states of their visible units arbitrarily well. We relax the setting of binary units (Sutskever and Hinton, 2008; Le Roux and Bengio, 2008, 2010; Montúfar and Ay, 2011) to units with arbitrary finite state spaces, and the vanishing approximation error to an arbitrary approximation error tolerance. For example, we show that a $q$-ary deep belief network with $L \geq 2+\frac{q^{\lceil m-\delta \rceil}-1}{q-1}$ layers of width $n \leq m + \log_q(m) + 1$ for some $m\in \mathbb{N}$ can approximate any probability distribution on $\{0,1,\ldots,q-1\}^n$ without exceeding a Kullback-Leibler divergence of $\delta$. Our analysis covers discrete restricted Boltzmann machines and naïve Bayes models as special cases.

Comment: 19 pages, 5 figures, 1 table
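As a rough illustration of the bound quoted in the abstract above, the following Python sketch evaluates the smallest admissible number of layers L and the largest admissible width n for given q, m and δ. The function names `min_layers` and `max_width` are hypothetical, and the code assumes q ≥ 2, m ≥ 1 and 0 ≤ δ ≤ m; it only evaluates the two stated inequalities and is not taken from the paper.

```python
import math


def min_layers(q: int, m: int, delta: float) -> int:
    """Smallest integer L with L >= 2 + (q**ceil(m - delta) - 1) / (q - 1)."""
    if q < 2 or m < 1 or not (0 <= delta <= m):
        raise ValueError("assumes q >= 2, m >= 1 and 0 <= delta <= m")
    exponent = math.ceil(m - delta)
    # (q**e - 1) / (q - 1) equals 1 + q + ... + q**(e - 1), so it is an integer.
    return 2 + (q ** exponent - 1) // (q - 1)


def max_width(q: int, m: int) -> int:
    """Largest integer n with n <= m + log_q(m) + 1."""
    if q < 2 or m < 1:
        raise ValueError("assumes q >= 2 and m >= 1")
    # Compute floor(log_q(m)) with integers to avoid floating-point rounding.
    floor_log = 0
    power = q
    while power <= m:
        floor_log += 1
        power *= q
    return m + floor_log + 1


if __name__ == "__main__":
    # Illustrative values only: binary units (q = 2), m = 3, zero tolerance.
    print(min_layers(2, 3, 0.0), max_width(2, 3))
```

Increasing the tolerance δ lowers the exponent in the layer bound, so the required depth shrinks geometrically in q as more approximation error is allowed.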

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines

Authors: Ay Nihat; Montufar Guido
Publication date: 26/07/2010

We improve recently published results about resources of Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN) required to make them Universal Approximators. We show that any distribution $p$ on the set of binary vectors of length $n$ can be arbitrarily well approximated by an RBM with $k-1$ hidden units, where $k$ is the minimal number of pairs of binary vectors differing in only one entry such that their union contains the support set of $p$. In important cases this number is half of the cardinality of the support set of $p$. We construct a DBN with $\frac{2^n}{2(n-b)}$, $b \sim \log(n)$, hidden layers of width $n$ that is capable of approximating any distribution on $\{0,1\}^n$ arbitrarily well. This confirms a conjecture presented by Le Roux and Bengio (2010).
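The quantity k in the abstract above can be made concrete with a small sketch. Pairs of binary vectors differing in one entry are edges of the hypercube, so under our reading k is the size of a minimal edge cover of the support set, which equals the support size minus a maximum matching among support points that are hypercube neighbours. The Python code below computes this with a standard augmenting-path bipartite matching (even-parity against odd-parity vectors). The function names are hypothetical and this is an illustration of the counting argument, not the authors' construction.

```python
from typing import Dict, Iterator, Set, Tuple

Vec = Tuple[int, ...]


def min_covering_pairs(support: Set[Vec]) -> int:
    """Minimal number k of one-bit-apart pairs whose union contains `support`."""
    matched_to: Dict[Vec, Vec] = {}  # odd-parity vector -> matched even-parity vector

    def neighbours(v: Vec) -> Iterator[Vec]:
        # Support points that differ from v in exactly one entry.
        for i in range(len(v)):
            w = v[:i] + (1 - v[i],) + v[i + 1:]
            if w in support:
                yield w

    def try_augment(v: Vec, seen: Set[Vec]) -> bool:
        # Augmenting-path search starting from the even-parity vector v.
        for w in neighbours(v):
            if w in seen:
                continue
            seen.add(w)
            if w not in matched_to or try_augment(matched_to[w], seen):
                matched_to[w] = v
                return True
        return False

    even = [v for v in support if sum(v) % 2 == 0]
    matching = sum(1 for v in even if try_augment(v, set()))
    # Each matched pair covers two support points; every other point needs its own pair.
    return len(support) - matching


def rbm_hidden_units(support: Set[Vec]) -> int:
    """Hidden-unit count k - 1 from the abstract, for a non-empty support set."""
    return min_covering_pairs(support) - 1


if __name__ == "__main__":
    full_square = {(0, 0), (0, 1), (1, 0), (1, 1)}
    print(min_covering_pairs(full_square), rbm_hidden_units(full_square))
```

When the support is the full set of binary vectors of length 2, the four points split into two neighbouring pairs, which is an instance of the "half of the cardinality of the support set" case mentioned in the abstract.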

arXiv.org e-Print Archive

From features to speaker vectors by means of restricted Boltzmann machine adaptation

Authors: Ghahabi Esfahani Omid; Hernando Pericás Francisco Javier; Safari Pooyan
Publication venue: International Speech Communication Association
Publication date: 01/01/2016

Restricted Boltzmann Machines (RBMs) have shown success in different stages of speaker recognition systems. In this paper, we propose a novel framework to produce a vector-based representation for each speaker, which will be referred to as the RBM-vector. This new approach maps the speaker spectral features to a single fixed-dimensional vector carrying speaker-specific information. In this work, a global model, referred to as the Universal RBM (URBM), is trained taking advantage of RBM unsupervised learning capabilities. Then, this URBM is adapted to the data of each speaker in the development, enrolment and evaluation datasets. The network connection weights of the adapted RBMs are further concatenated and subjected to a whitening with dimension reduction stage to build the speaker vectors. The evaluation is performed on the core test condition of the NIST SRE 2006 database, and it is shown that RBM-vectors achieve a 15% relative improvement in terms of EER compared to i-vectors using cosine scoring. The score fusion with i-vectors attains more than 24% relative improvement. This fusion result is of interest because both vectors are produced in an unsupervised fashion and can be used instead of the i-vector/PLDA approach when no data labels are available. Results obtained for the RBM-vector/PLDA framework are comparable with those from i-vector/PLDA. Their score fusion achieves a 14% relative improvement compared to i-vector/PLDA.

Peer Reviewed. Postprint (published version)
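For readers unfamiliar with the two quantities the abstract above relies on, here is a minimal Python sketch of cosine scoring between two speaker vectors and of a relative EER improvement. The function names and the numbers in the usage line are hypothetical and purely illustrative; they are not results from the paper, and the relative improvement is assumed to be the usual (baseline - new) / baseline ratio.

```python
import math
from typing import Sequence


def cosine_score(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Fraction by which new_eer improves on baseline_eer (lower EER is better)."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Hypothetical vectors and EER values, chosen only to exercise the functions.
    print(cosine_score([1.0, 2.0], [2.0, 4.0]))
    print(relative_improvement(10.0, 8.5))
```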

Crossref

UPCommons. Portal del coneixement obert de la UPC

Geometry and Expressive Power of Conditional Restricted Boltzmann Machines

Authors: Ay Nihat; Ghazi-Zahedi Keyan; Montufar Guido
Publication date: 01/01/2015

Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results on their ability to represent conditional Markov random fields and conditional distributions with restricted supports, the minimal size of universal approximators, the maximal model approximation errors, and the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models.

Comment: 30 pages, 5 figures, 1 algorithm
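To make the model class in the abstract above concrete, the following Python sketch computes p(y | x) for a tiny conditional RBM by enumerating all output states. It assumes binary units and one common parametrization (input-to-hidden weights `W`, output-to-hidden weights `V`, hidden biases `b`, output biases `c`); these names and the exact form are assumptions for illustration, not the paper's notation. The hidden units are summed out analytically, which gives p(y | x) proportional to exp(c·y) ∏_j (1 + exp(b_j + W_j·x + V_j·y)).

```python
import itertools
import math
from typing import Dict, Sequence, Tuple


def crbm_conditional(
    x: Sequence[int],
    W: Sequence[Sequence[float]],  # W[j][i]: weight between hidden j and input i
    V: Sequence[Sequence[float]],  # V[j][k]: weight between hidden j and output k
    b: Sequence[float],            # hidden biases
    c: Sequence[float],            # output biases
) -> Dict[Tuple[int, ...], float]:
    """Return {y: p(y | x)} over all binary output vectors y (brute force)."""
    unnormalized: Dict[Tuple[int, ...], float] = {}
    for y in itertools.product((0, 1), repeat=len(c)):
        log_weight = sum(ck * yk for ck, yk in zip(c, y))
        for j in range(len(b)):
            activation = (
                b[j]
                + sum(wji * xi for wji, xi in zip(W[j], x))
                + sum(vjk * yk for vjk, yk in zip(V[j], y))
            )
            # Summing a binary hidden unit over {0, 1} contributes 1 + exp(activation).
            log_weight += math.log1p(math.exp(activation))
        unnormalized[y] = math.exp(log_weight)
    total = sum(unnormalized.values())
    return {y: w / total for y, w in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical parameters: 2 inputs, 1 hidden unit, 2 outputs.
    dist = crbm_conditional([1, 0], W=[[0.5, -0.5]], V=[[1.0, -1.0]], b=[0.0], c=[0.1, 0.2])
    print(dist)
```

The enumeration is exponential in the number of output units, so this is only suitable for checking small examples by hand.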

arXiv.org e-Print Archive

eScholarship - University of California

Deep learning systems as complex networks

Authors: Piccolini Michele; Suweis Samir; Testolin Alberto
Publication date: 28/09/2018

Thanks to the availability of large scale digital datasets and massive amounts of computational power, deep learning algorithms can learn representations of data by exploiting multiple levels of abstraction. These machine learning methods have greatly improved the state-of-the-art in many challenging cognitive tasks, such as visual object recognition, speech processing, natural language understanding and automatic translation. In particular, one class of deep learning models, known as deep belief networks, can discover intricate statistical structure in large data sets in a completely unsupervised fashion, by learning a generative model of the data using Hebbian-like learning mechanisms. Although these self-organizing systems can be conveniently formalized within the framework of statistical mechanics, their internal functioning remains opaque, because their emergent dynamics cannot be solved analytically. In this article we propose to study deep belief networks using techniques commonly employed in the study of complex networks, in order to gain some insights into the structural and functional properties of the computational graph resulting from the learning process.

Comment: 20 pages, 9 figures

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova