
    On the Equivalence Between Deep NADE and Generative Stochastic Networks

    Neural Autoregressive Distribution Estimators (NADEs) have recently been shown to be successful alternatives for modeling high-dimensional multimodal distributions. One issue with NADEs is that they rely on a particular order of factorization for P(\mathbf{x}). This issue has recently been addressed by a variant of NADE called Orderless NADE and its deeper version, Deep Orderless NADE. Orderless NADEs are trained on a criterion that stochastically maximizes P(\mathbf{x}) under all possible orders of factorization. Unfortunately, ancestral sampling from a deep NADE is very expensive, corresponding to running through a neural net separately to predict each of the visible variables given some of the others. This work makes a connection between this criterion and the training criterion for Generative Stochastic Networks (GSNs). It shows that training a NADE in this way also trains a GSN, which defines a Markov chain associated with the NADE model. Based on this connection, we show an alternative way to sample from a trained Orderless NADE that allows one to trade off computation time against sample quality: a 3- to 10-fold speedup (taking into account the waste due to correlations between consecutive samples of the chain) can be obtained without noticeably reducing the quality of the samples. This is achieved using a novel sampling procedure for GSNs, called annealed GSN sampling, that, like tempering methods, combines fast mixing (obtained thanks to steps at high noise levels) with accurate samples (obtained thanks to steps at low noise levels). Comment: ECML/PKDD 2014
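    The annealing idea lends itself to a compact sketch: run the GSN Markov chain while shrinking the fraction of variables that are corrupted and resampled at each step. In the sketch below, the conditional sampler is a stub standing in for a trained Orderless NADE, and the function names and the linear schedule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_conditional(x, mask, rng):
    """Placeholder for the NADE conditional P(x_masked | x_observed).

    A trained Orderless NADE would predict each masked variable here;
    this stub resamples masked bits uniformly so the sketch runs.
    """
    x = x.copy()
    x[mask] = rng.integers(0, 2, size=mask.sum())
    return x

def annealed_gsn_sampling(x0, n_steps, rng, hi=0.9, lo=0.1):
    """One annealed GSN chain: start with large corruption fractions
    (fast mixing), end with small ones (accurate samples)."""
    x = x0.copy()
    # Linear annealing schedule over the fraction of variables resampled.
    schedule = np.linspace(hi, lo, n_steps)
    for p in schedule:
        mask = rng.random(x.shape) < p        # corrupt: pick variables to resample
        x = sample_conditional(x, mask, rng)  # reconstruct from the conditional
    return x

x = annealed_gsn_sampling(rng.integers(0, 2, size=784), n_steps=30, rng=rng)
```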

    Contractive De-noising Auto-encoder

    An auto-encoder is a special kind of neural network based on reconstruction. The de-noising auto-encoder (DAE) is an improved auto-encoder that is made robust to the input by first corrupting the original data and then reconstructing the original input by minimizing a reconstruction error function. The contractive auto-encoder (CAE) is another improved auto-encoder that learns robust features by penalizing the Frobenius norm of the Jacobian matrix of the learned features with respect to the original input. In this paper, we combine the de-noising auto-encoder and the contractive auto-encoder and propose another improved auto-encoder, the contractive de-noising auto-encoder (CDAE), which is robust to both the original input and the learned feature. We stack CDAEs to extract more abstract features and apply an SVM for classification. Experimental results on the benchmark MNIST dataset show that the proposed CDAE performs better than both the DAE and the CAE, proving the effectiveness of our method. Comment: Figures edited
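    The combined objective is easy to state: a DAE reconstruction term computed from a corrupted input plus the CAE's Jacobian penalty. The NumPy sketch below, with tied weights and a sigmoid encoder, is one plausible reading of that combination (the abstract does not pin down whether the Jacobian is evaluated at the clean or the corrupted input; here it is the corrupted one), and all layer sizes and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cdae_loss(x, W, b, c, noise=0.3, lam=0.1, rng=rng):
    """Loss for a contractive de-noising auto-encoder (sketch).

    Combines the two ingredients named in the abstract:
    - DAE: reconstruct the clean x from a corrupted copy x_tilde;
    - CAE: penalize the Frobenius norm of the encoder Jacobian.
    """
    # De-noising corruption: randomly zero out a fraction of the inputs.
    x_tilde = x * (rng.random(x.shape) >= noise)
    h = sigmoid(W @ x_tilde + b)   # encoder feature
    x_hat = sigmoid(W.T @ h + c)   # decoder (tied weights)
    recon = np.sum((x - x_hat) ** 2)
    # For a sigmoid encoder, J_ij = h_i (1 - h_i) W_ij, so ||J||_F^2
    # has the closed form below (as in the original CAE paper).
    jac_fro2 = np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))
    return recon + lam * jac_fro2

x = rng.random(784)
W = 0.01 * rng.standard_normal((256, 784))
loss = cdae_loss(x, W, np.zeros(256), np.zeros(784))
```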

    Energy-based temporal neural networks for imputing missing values

    Imputing missing values in high-dimensional time series is a difficult problem. There have been approaches to the problem [11,8] in which neural architectures were trained as probabilistic models of the data. However, we argue that this approach is not optimal. We propose to view temporal neural networks with latent variables as energy-based models and to train them for missing value recovery directly. In this paper we introduce two energy-based models: the first is based on a one-dimensional convolution and the second utilizes a recurrent neural network. We demonstrate how ideas from the energy-based learning framework can be used to train these models to recover missing values. The models are evaluated on a motion capture dataset.
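    The core pattern, treating the missing entries as free variables and descending the energy over them while observed entries stay fixed, can be shown in a few lines. The sketch below substitutes a simple smoothness energy for the learned temporal network energy, purely so it runs standalone; the optimization-over-missing-values structure is the point, not the energy itself.

```python
import numpy as np

def impute_by_energy_minimization(x, missing, n_steps=200, lr=0.1):
    """Recover missing entries of a 1-D series by minimizing an energy
    over them directly, as in the energy-based view.

    Stand-in energy: E(x) = sum_t (x[t+1] - x[t])^2 (smoothness);
    a trained convolutional or recurrent energy would replace it.
    """
    x = x.copy()
    for _ in range(n_steps):
        # Gradient of the smoothness energy w.r.t. each entry.
        grad = np.zeros_like(x)
        grad[:-1] += 2 * (x[:-1] - x[1:])
        grad[1:] += 2 * (x[1:] - x[:-1])
        # Only missing entries are free variables; observed ones stay fixed.
        x[missing] -= lr * grad[missing]
    return x

t = np.linspace(0, 2 * np.pi, 50)
x = np.sin(t)
missing = np.zeros_like(x, dtype=bool)
missing[20:30] = True
x_obs = x.copy()
x_obs[missing] = 0.0
x_hat = impute_by_energy_minimization(x_obs, missing)
```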

    Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities

    Recently, we proposed to transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero output and zero slope on average, and to use separate shortcut connections to model the linear dependencies instead. We continue that work, first by introducing a third transformation that normalizes the scale of the outputs of each hidden neuron, and second by analyzing the connections to second-order optimization methods. We show that the transformations make simple stochastic gradient descent behave more like second-order optimization methods, and thus speed up learning. This is shown both in theory and in experiments. The experiments on the third transformation show that while it further increases the speed of learning, it can also hurt performance by converging to a worse local optimum, where both the inputs and outputs of many hidden neurons are close to zero. Comment: 10 pages, 5 figures, ICLR 2013
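    The three transformations have simple closed forms for a tanh unit: shift the slope and the output so both are zero on average over the data, then rescale. The sketch below is a minimal reading of that recipe, assuming a per-unit nonlinearity of the form g * (tanh(z) + a*z + b); the estimation-from-data helper and its name are illustrative, not the paper's code.

```python
import numpy as np

def transformed_tanh(z, a, b, g):
    """Per-neuron transformed nonlinearity (sketch):
    g * (tanh(z) + a*z + b). Linear dependencies are then
    carried by separate shortcut connections."""
    return g * (np.tanh(z) + a * z + b)

def fit_transformations(Z):
    """Estimate a, b, g for each hidden unit from its pre-activations Z
    (rows = samples), from the zero-slope / zero-output / unit-scale
    conditions in turn."""
    slope = 1.0 - np.tanh(Z) ** 2       # d tanh / dz
    a = -slope.mean(axis=0)             # zero average slope
    out = np.tanh(Z) + a * Z
    b = -out.mean(axis=0)               # zero average output
    g = 1.0 / (out + b).std(axis=0)     # normalize output scale
    return a, b, g

Z = np.random.default_rng(0).standard_normal((1000, 32))
a, b, g = fit_transformations(Z)
h = transformed_tanh(Z, a, b, g)
```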

    Inducing Language Networks from Continuous Space Word Representations

    Recent advances in unsupervised feature learning have produced powerful latent representations of words. However, it is still not clear what makes one representation better than another, or how we can learn the ideal representation. Understanding the structure of the latent spaces attained is key to any future advance in unsupervised learning. In this work, we introduce a new view of continuous-space word representations as language networks. We explore two techniques for creating language networks from learned features, inducing networks for two popular word representation methods and examining the properties of the resulting networks. We find that the induced networks differ from those built by other methods of creating language networks, and that they contain meaningful community structure. Comment: 14 pages
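    One natural way to induce such a network, and a plausible reading of one of the two techniques, is to connect each word to its k nearest neighbors by cosine similarity in the embedding space. The dense-matrix sketch below assumes a small vocabulary; the function name and parameters are illustrative.

```python
import numpy as np

def knn_language_network(E, k=10):
    """Induce a language network from word embeddings (sketch):
    connect each word to its k nearest neighbors by cosine similarity.

    E is a (vocab, dim) embedding matrix; returns a boolean
    (vocab, vocab) adjacency matrix.
    """
    # Normalize rows so the dot product equals cosine similarity.
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = En @ En.T
    np.fill_diagonal(sim, -np.inf)   # no self-edges
    # Indices of the k most similar words for each row.
    nbrs = np.argpartition(-sim, k, axis=1)[:, :k]
    adj = np.zeros(sim.shape, dtype=bool)
    rows = np.arange(E.shape[0])[:, None]
    adj[rows, nbrs] = True
    return adj | adj.T               # symmetrize into an undirected graph

E = np.random.default_rng(0).standard_normal((500, 50))
A = knn_language_network(E, k=10)
```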

    Comparing Probabilistic Models for Melodic Sequences

    Modelling the real-world complexity of music is a challenge for machine learning. We address the task of modelling melodic sequences from the same music genre. We perform a comparative analysis of two probabilistic models: a Dirichlet Variable Length Markov Model (Dirichlet-VMM) and a Time Convolutional Restricted Boltzmann Machine (TC-RBM). We show that the TC-RBM learns descriptive music features, such as underlying chords and typical melody transitions and dynamics. We assess the models on future prediction and compare their performance to that of a VMM, the current state of the art in melody generation. We show that both models perform significantly better than the VMM, with the Dirichlet-VMM marginally outperforming the TC-RBM. Finally, we evaluate the short-order statistics of the models, using the Kullback-Leibler divergence between test sequences and model samples, and show that our proposed methods match the statistics of the music genre significantly better than the VMM. Comment: in Proceedings of ECML-PKDD 2011. Lecture Notes in Computer Science, vol. 6913, pp. 289-304. Springer (2011)
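    The final evaluation, comparing short-order statistics via KL divergence, can be sketched as follows: estimate smoothed n-gram distributions from held-out melodies and from model samples, then compute KL between them. The add-alpha smoothing and all names below are assumptions for the sake of a runnable example, not the paper's exact procedure.

```python
import numpy as np
from collections import Counter

def ngram_counts(sequences, n=2):
    """Count n-grams over a list of note sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return counts

def kl_ngrams(test_seqs, model_seqs, n=2, alpha=1.0):
    """KL(test || model) over smoothed n-gram distributions (sketch).
    Add-alpha smoothing keeps the divergence finite off-support."""
    p_counts = ngram_counts(test_seqs, n)
    q_counts = ngram_counts(model_seqs, n)
    support = sorted(set(p_counts) | set(q_counts))
    p = np.array([p_counts[g] + alpha for g in support], dtype=float)
    q = np.array([q_counts[g] + alpha for g in support], dtype=float)
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
test = [rng.integers(60, 72, size=50).tolist() for _ in range(20)]      # MIDI pitches
samples = [rng.integers(60, 72, size=50).tolist() for _ in range(20)]   # model samples
print(kl_ngrams(test, samples, n=2))
```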

    A Neural-Astrocytic Network Architecture: Astrocytic calcium waves modulate synchronous neuronal activity

    Understanding the role of astrocytes in brain computation is a nascent challenge, promising immense rewards in terms of new neurobiological knowledge that can be translated into artificial intelligence. In our ongoing effort to identify principles endowing the astrocyte with unique functions in brain computation, and to translate them into neural-astrocytic networks (NANs), we propose a biophysically realistic model of an astrocyte that preserves the experimentally observed spatial allocation of its distinct subcellular compartments. We show how our model may encode, and modulate, the extent of synchronous neural activity via calcium waves that propagate intracellularly across the astrocytic compartments. This relationship between neural activity and astrocytic calcium waves has long been suspected, but it still lacks a mechanistic explanation. Our model suggests an astrocytic "calcium cascade" mechanism for neuronal synchronization, which may empower NANs by imposing the periodic neural modulation known to reduce coding errors. By expanding our notions of information processing in astrocytes, our work aims to solidify a computational role for non-neuronal cells and to incorporate them into artificial networks. Comment: International Conference on Neuromorphic Systems (ICONS) 2018
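    To make the "calcium cascade" picture concrete, the toy below propagates a regenerative wave along a chain of compartments: calcium diffuses between neighbors, and any compartment crossing a threshold releases more calcium once. This is a cartoon, not the paper's biophysical model; every constant is illustrative.

```python
import numpy as np

def calcium_wave(n_comp=50, n_steps=200, D=0.3, thresh=0.5,
                 release=1.0, decay=0.05):
    """Toy 'calcium cascade' across a chain of astrocytic compartments:
    diffusion plus one-shot threshold-triggered release, so a stimulus
    at one end travels as a wave."""
    ca = np.zeros(n_comp)
    fired = np.zeros(n_comp, dtype=bool)
    ca[0] = 1.0                     # neural input drives one end
    history = []
    for _ in range(n_steps):
        # Discrete diffusion between neighboring compartments.
        lap = np.zeros(n_comp)
        lap[1:-1] = ca[:-2] - 2 * ca[1:-1] + ca[2:]
        lap[0] = ca[1] - ca[0]
        lap[-1] = ca[-2] - ca[-1]
        ca = ca + D * lap - decay * ca
        # Threshold-triggered release regenerates the wave front.
        new = (ca > thresh) & ~fired
        ca[new] += release
        fired |= new
        history.append(ca.copy())
    return np.array(history)        # (n_steps, n_comp) wave snapshot

waves = calcium_wave()
```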

    Revisiting loss-specific training of filter-based MRFs for image restoration

    It is now well known that Markov random fields (MRFs) are particularly effective for modeling image priors in low-level vision. Recent years have seen the emergence of two main approaches to learning the parameters of MRFs: (1) probabilistic learning using sampling-based algorithms and (2) loss-specific training based on the MAP estimate. Our investigation of existing training approaches shows that the performance of loss-specific training has been significantly underestimated in prior work. In this paper, we revisit this approach and solve it using techniques from bi-level optimization. We show that a substantial gain in final performance can be obtained by solving the lower-level problem in the bi-level framework with high accuracy, using our newly proposed algorithm. As a result, our trained model is on par with highly specialized image denoising algorithms and clearly outperforms probabilistically trained MRF models. Our findings suggest that, for the loss-specific training scheme, solving the lower-level problem with higher accuracy is beneficial. Our trained model comes with the additional advantage that inference is extremely efficient: our GPU-based implementation takes less than 1 s to produce state-of-the-art performance. Comment: 10 pages, 2 figures, appeared at the 35th German Conference on Pattern Recognition (GCPR 2013), Saarbrücken, Germany, September 3-6, 2013. Proceedings
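    The bi-level structure is the key idea: an upper-level loss over clean/noisy pairs, with the MAP denoising problem as the lower level, solved exactly. The sketch below substitutes a quadratic prior so the lower level has a closed-form solution, and differentiates the upper level by central differences for brevity (the paper instead differentiates through the lower-level solution); all names and constants are illustrative.

```python
import numpy as np

def lower_level(y, lam, L):
    """Lower-level MAP problem (quadratic stand-in for the MRF prior):
    x*(lam) = argmin_x 0.5||x - y||^2 + 0.5 * lam * ||L x||^2,
    solved exactly by a linear system, i.e. 'with high accuracy'."""
    A = np.eye(y.size) + lam * (L.T @ L)
    return np.linalg.solve(A, y)

def upper_level_grad(y, x_clean, lam, L, eps=1e-5):
    """Upper-level loss = ||x*(lam) - x_clean||^2; gradient in lam
    via central differences for brevity."""
    f = lambda l: np.sum((lower_level(y, l, L) - x_clean) ** 2)
    return (f(lam + eps) - f(lam - eps)) / (2 * eps)

rng = np.random.default_rng(0)
n = 64
L = np.eye(n, k=1)[: n - 1] - np.eye(n)[: n - 1]   # finite-difference 'filter'
x_clean = np.cumsum(rng.standard_normal(n)) * 0.1   # smooth ground truth
y = x_clean + 0.3 * rng.standard_normal(n)          # noisy observation
lam = 1.0
for _ in range(50):                                 # upper-level descent on lam
    lam = max(lam - 0.05 * upper_level_grad(y, x_clean, lam, L), 0.0)
```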

    Learning in a Unitary Coherent Hippocampus
