On the Equivalence Between Deep NADE and Generative Stochastic Networks
Neural Autoregressive Distribution Estimators (NADEs) have recently been shown to be successful alternatives for modeling high-dimensional multimodal distributions. One issue associated with NADEs is that they rely on a particular order of factorization for the joint distribution P(x). This issue has recently been addressed by a variant of NADE called Orderless NADE and its deeper version, Deep Orderless NADE. Orderless NADEs are trained based on a criterion that stochastically maximizes P(x) under all possible orders of factorization. Unfortunately, ancestral sampling from deep NADE is very
expensive, corresponding to running through a neural net separately predicting
each of the visible variables given some others. This work makes a connection
between this criterion and the training criterion for Generative Stochastic
Networks (GSNs). It shows that training NADEs in this way also trains a GSN,
which defines a Markov chain associated with the NADE model. Based on this
connection, we show an alternative way to sample from a trained Orderless NADE
that allows trading off computing time against sample quality: a 3- to 10-fold speedup (taking into account the waste due to correlations between consecutive samples of the chain) can be obtained without noticeably reducing the quality of the samples. This is achieved using a novel sampling procedure for GSNs called annealed GSN sampling which, similarly to tempering methods, combines fast mixing (obtained thanks to steps at high noise levels) with accurate samples (obtained thanks to steps at low noise levels).
Comment: ECML/PKDD 201
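For reference, a hedged sketch of the order-agnostic criterion this abstract refers to, written as it is usually stated for Deep Orderless NADE (an assumption based on the standard formulation, not a quotation from the paper): with D visible variables and orderings o drawn uniformly from the set of permutations S_D,

\mathcal{L}(\theta) = \mathbb{E}_{\mathbf{x} \sim \text{data}} \; \mathbb{E}_{o \sim \mathcal{U}(S_D)} \sum_{d=1}^{D} \log p\!\left(x_{o_d} \mid \mathbf{x}_{o_{<d}}; \theta\right).

Maximizing this trains a conditional for every variable under every order of factorization, which is what allows the same conditionals to be reused as the transition operator of the GSN-style Markov chain mentioned above.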
Contractive De-noising Auto-encoder
The auto-encoder is a special kind of neural network based on reconstruction. The de-noising auto-encoder (DAE) is an improved auto-encoder that is made robust to the input by first corrupting the original data and then reconstructing the original input by minimizing a reconstruction error function. The contractive auto-encoder (CAE) is another improved auto-encoder that learns robust features by penalizing the Frobenius norm of the Jacobian matrix of the learned features with respect to the original input. In this paper, we combine the de-noising auto-encoder and the contractive auto-encoder and propose another improved auto-encoder, the contractive de-noising auto-encoder (CDAE), which is robust to both the original input and the learned feature. We stack CDAEs to extract more abstract features and apply an SVM for classification. The experimental results on the benchmark MNIST dataset show that the proposed CDAE performs better than both the DAE and the CAE, proving the effectiveness of our method.
Comment: Figures edited
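A minimal numpy sketch of the combined objective described above, assuming a single sigmoid hidden layer with tied weights, squared reconstruction error, Gaussian input corruption, and the usual closed form of the Jacobian penalty for a sigmoid encoder; the shapes, noise level noise_std and penalty weight lam are illustrative choices, not taken from the paper.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cdae_loss(x, W, b, c, noise_std=0.1, lam=0.1, rng=np.random):
    """Contractive de-noising auto-encoder loss for a mini-batch x of shape (n, d)."""
    # De-noising part: corrupt the input, then reconstruct the clean input.
    x_tilde = x + noise_std * rng.standard_normal(x.shape)
    h = sigmoid(x_tilde @ W + b)          # (n, k) hidden features
    x_hat = sigmoid(h @ W.T + c)          # (n, d) reconstruction (tied weights)
    recon = np.mean(np.sum((x_hat - x) ** 2, axis=1))

    # Contractive part: squared Frobenius norm of the Jacobian dh/dx_tilde.
    # For a sigmoid encoder, ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ij^2.
    contract = np.mean((h * (1 - h)) ** 2 @ np.sum(W ** 2, axis=0))

    return recon + lam * contract

Training would then differentiate this loss with respect to W, b and c, and the learned encoders would be stacked before the SVM classifier, as the abstract describes.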
Low-cost representation for restricted Boltzmann machines
This paper presents a method for extracting a low-cost representation from restricted Boltzmann machines. The new representation can be considered a compression of the network, requiring much less storage capacity while reasonably preserving the network's performance at feature learning. We show that the compression can be done by converting the weight matrix of real numbers into a matrix of three values {-1, 0, 1} associated with a score vector of real numbers. This set of values is similar enough to Boolean values that it lets us further translate the representation into logical rules. In the experiments reported in this paper, we evaluate the performance of our compression method on image datasets, obtaining promising results. Experiments on the MNIST handwritten digit classification dataset, for example, have shown that a 95% saving in memory can be achieved with no significant drop in accuracy.
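A hedged illustration of the kind of compression the abstract describes: each weight is mapped to {-1, 0, 1} and a real-valued score is kept per hidden unit. The thresholding rule below (drop weights well below the column's mean magnitude, keep the signs, use the mean magnitude of the surviving weights as the score) is an assumption for illustration; the paper's exact rule may differ.

import numpy as np

def ternarize_rbm_weights(W, sparsity_threshold=0.7):
    """Compress an RBM weight matrix W (visible x hidden) into a ternary
    matrix T in {-1, 0, 1} plus one real-valued score per hidden unit."""
    abs_W = np.abs(W)
    # Per-hidden-unit cutoff: weights much smaller than the column mean are dropped.
    cutoff = sparsity_threshold * abs_W.mean(axis=0, keepdims=True)
    mask = abs_W >= cutoff
    T = np.sign(W) * mask                       # values in {-1, 0, 1}
    # One score per hidden unit: mean magnitude of the weights that survived.
    counts = np.maximum(mask.sum(axis=0), 1)
    scores = (abs_W * mask).sum(axis=0) / counts
    return T.astype(np.int8), scores

def approximate_weights(T, scores):
    """Reconstruct an approximate weight matrix from the compressed form."""
    return T.astype(np.float64) * scores        # broadcast each unit's score over its column

Storing T at one or two bits per weight plus a single float per hidden unit is roughly where a memory saving of the reported magnitude would come from.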
Energy-based temporal neural networks for imputing missing values
Imputing missing values in high-dimensional time series is a difficult problem. There have been some approaches to the problem [11,8] in which neural architectures were trained as probabilistic models of the data. However, we argue that this approach is not optimal. We propose to view temporal neural networks with latent variables as energy-based models and to train them for missing-value recovery directly. In this paper we introduce two energy-based models. The first model is based on a one-dimensional convolution and the second model utilizes a recurrent neural network. We demonstrate how ideas from the energy-based learning framework can be used to train these models to recover missing values. The models are evaluated on a motion capture dataset.
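A minimal sketch of the impute-by-energy-minimization idea, using a stand-in quadratic energy E(x) = 0.5 * x^T A x rather than the convolutional or recurrent energies proposed in the paper; the point is only that missing entries are treated as free variables and updated by gradient descent on the energy while observed entries stay fixed.

import numpy as np

def impute_by_energy_descent(x, observed_mask, A, lr=0.05, steps=500):
    """Fill the unobserved entries of x by gradient descent on a stand-in
    energy E(x) = 0.5 * x^T A x (A symmetric positive definite).

    x             : (d,) vector with arbitrary values in the missing slots
    observed_mask : (d,) boolean array, True where x is observed
    """
    x = x.copy()
    missing = ~observed_mask
    for _ in range(steps):
        grad = A @ x                      # dE/dx for the quadratic energy
        x[missing] -= lr * grad[missing]  # only the missing coordinates move
    return x

# Tiny usage example with a hypothetical 3-variable model.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
x = np.array([1.0, 0.0, 0.5])             # middle value is unknown
observed = np.array([True, False, True])
print(impute_by_energy_descent(x, observed, A))

For the models in the paper, the quadratic energy would be replaced by the energy of the trained convolutional or recurrent network and the gradient obtained by backpropagation to the missing inputs.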
Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities
Recently, we proposed to transform the outputs of each hidden neuron in a
multi-layer perceptron network to have zero output and zero slope on average,
and use separate shortcut connections to model the linear dependencies instead.
We continue the work, first by introducing a third transformation to normalize the scale of the outputs of each hidden neuron, and second by analyzing the connections to second-order optimization methods. We show that the transformations make simple stochastic gradient descent behave more like second-order optimization methods and thus speed up learning. This is shown both in theory and with experiments. The experiments on the third transformation show that while it further increases the speed of learning, it can also hurt performance by converging to a worse local optimum, where both the inputs and outputs of many hidden neurons are close to zero.
Comment: 10 pages, 5 figures, ICLR201
Inducing Language Networks from Continuous Space Word Representations
Recent advancements in unsupervised feature learning have developed powerful
latent representations of words. However, it is still not clear what makes one
representation better than another and how we can learn the ideal
representation. Understanding the structure of the latent spaces thus attained is key to
any future advancement in unsupervised learning. In this work, we introduce a
new view of continuous space word representations as language networks. We
explore two techniques to create language networks from learned features by
inducing them for two popular word representation methods and examining the
properties of their resulting networks. We find that the induced networks
differ from other methods of creating language networks, and that they contain
meaningful community structure.
Comment: 14 pages
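One plausible reading of inducing a language network from learned features, sketched below under assumed choices: connect each word to its k nearest neighbours by cosine similarity of the embedding vectors. The specific k and the symmetrization step are illustrative, not necessarily the construction used in the paper.

import numpy as np

def knn_language_network(embeddings, words, k=5):
    """Build an undirected word graph by linking each word to its k most
    cosine-similar neighbours in the embedding space.

    embeddings : (V, d) matrix of word vectors
    words      : list of V word strings
    Returns a dict mapping each word to a set of neighbouring words.
    """
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)          # ignore self-similarity
    edges = {w: set() for w in words}
    for i, w in enumerate(words):
        for j in np.argsort(sim[i])[-k:]:   # indices of the k largest similarities
            edges[w].add(words[j])
            edges[words[j]].add(w)          # symmetrize into an undirected graph
    return edges

Community structure could then be examined on such a graph with any standard graph clustering method.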
Comparing Probabilistic Models for Melodic Sequences
Modelling the real-world complexity of music is a challenge for machine learning. We address the task of modeling melodic sequences from the same music genre. We perform a comparative analysis of two probabilistic models: a
Dirichlet Variable Length Markov Model (Dirichlet-VMM) and a Time Convolutional
Restricted Boltzmann Machine (TC-RBM). We show that the TC-RBM learns
descriptive music features, such as underlying chords and typical melody
transitions and dynamics. We assess the models for future prediction and
compare their performance to a VMM, which is the current state of the art in
melody generation. We show that both models perform significantly better than
the VMM, with the Dirichlet-VMM marginally outperforming the TC-RBM. Finally,
we evaluate the short-order statistics of the models, using the
Kullback-Leibler divergence between test sequences and model samples, and show
that our proposed methods match the statistics of the music genre significantly
better than the VMM.
Comment: in Proceedings of the ECML-PKDD 2011. Lecture Notes in Computer Science, vol. 6913, pp. 289-304. Springer (2011)
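A hedged sketch of the kind of evaluation mentioned at the end of the abstract: estimate n-gram (short-order) statistics from the test sequences and from model samples, and compute the KL divergence between the two smoothed distributions. The additive smoothing constant and the toy pitch-class data are illustrative choices.

import numpy as np
from collections import Counter
from itertools import product

def ngram_distribution(sequences, n, vocab, eps=1e-3):
    """Additively smoothed empirical distribution over all n-grams of vocab."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    grams = list(product(vocab, repeat=n))
    p = np.array([counts[g] + eps for g in grams], dtype=float)
    return p / p.sum()

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same support."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical usage: pitch classes 0..11, short test melodies vs. model samples.
vocab = list(range(12))
test_seqs = [[0, 4, 7, 4, 0], [2, 5, 9, 5, 2]]
model_samples = [[0, 4, 7, 7, 0], [2, 5, 9, 2, 2]]
p_test = ngram_distribution(test_seqs, n=2, vocab=vocab)
p_model = ngram_distribution(model_samples, n=2, vocab=vocab)
print(kl_divergence(p_test, p_model))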
A Neural-Astrocytic Network Architecture: Astrocytic calcium waves modulate synchronous neuronal activity
Understanding the role of astrocytes in brain computation is a nascent
challenge, promising immense rewards in terms of new neurobiological knowledge that can be translated into artificial intelligence. In our ongoing effort to identify principles endowing the astrocyte with unique functions in brain
computation, and translate them into neural-astrocytic networks (NANs), we
propose a biophysically realistic model of an astrocyte that preserves the
experimentally observed spatial allocation of its distinct subcellular
compartments. We show how our model may encode, and modulate, the extent of
synchronous neural activity via calcium waves that propagate intracellularly
across the astrocytic compartments. This relationship between neural activity
and astrocytic calcium waves has long been speculated about, but it still lacks a
mechanistic explanation. Our model suggests an astrocytic "calcium cascade"
mechanism for neuronal synchronization, which may empower NANs by imposing
periodic neural modulation known to reduce coding errors. By expanding our
notions of information processing in astrocytes, our work aims to solidify a
computational role for non-neuronal cells and incorporate them into artificial
networks.
Comment: International Conference on Neuromorphic Systems (ICONS) 201
Revisiting loss-specific training of filter-based MRFs for image restoration
It is now well known that Markov random fields (MRFs) are particularly
effective for modeling image priors in low-level vision. Recent years have seen
the emergence of two main approaches for learning the parameters in MRFs: (1)
probabilistic learning using sampling-based algorithms and (2) loss-specific
training based on the MAP estimate. After investigating existing training approaches, we find that the performance of loss-specific training has been significantly underestimated in previous work. In this paper, we revisit
this approach and use techniques from bi-level optimization to solve it. We
show that we can get a substantial gain in the final performance by solving the
lower-level problem in the bi-level framework with high accuracy using our
newly proposed algorithm. As a result, our trained model is on par with highly
specialized image denoising algorithms and clearly outperforms
probabilistically trained MRF models. Our findings suggest that for the
loss-specific training scheme, solving the lower-level problem with higher
accuracy is beneficial. Our trained model comes with the additional advantage that inference is extremely efficient. Our GPU-based implementation takes less than 1s to produce state-of-the-art performance.
Comment: 10 pages, 2 figures, appears at the 35th German Conference, GCPR 2013, Saarbrücken, Germany, September 3-6, 2013. Proceedings
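For concreteness, a minimal sketch of the lower-level problem that loss-specific training has to solve accurately: MAP denoising under a filter-based MRF prior, here with an assumed data term ||x - y||^2 / (2 sigma^2), a smooth log-prior rho(z) = log(1 + z^2) on filter responses, plain gradient descent instead of the paper's specialized solver, and a 1-D signal for brevity.

import numpy as np

def map_denoise_1d(y, filters, weights, sigma=0.1, lr=0.005, steps=2000):
    """Gradient descent on the MAP energy
        E(x) = ||x - y||^2 / (2 sigma^2) + sum_i w_i * sum rho(f_i * x),
    with rho(z) = log(1 + z^2), for a 1-D signal y and 1-D filters f_i."""
    x = y.copy()
    for _ in range(steps):
        grad = (x - y) / sigma ** 2
        for f, w in zip(filters, weights):
            z = np.convolve(x, f, mode="full")           # filter responses
            dz = 2 * z / (1 + z ** 2)                     # rho'(z)
            # Chain rule: convolve the pointwise derivative with the flipped filter
            # and crop back to the signal length (adjoint of the convolution).
            g = np.convolve(dz, f[::-1], mode="full")
            grad += w * g[len(f) - 1 : len(f) - 1 + len(x)]
        x -= lr * grad
    return x

# Hypothetical usage: denoise a noisy step signal with a first-derivative filter.
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(20), np.ones(20)])
noisy = clean + 0.1 * rng.standard_normal(clean.size)
print(map_denoise_1d(noisy, filters=[np.array([1.0, -1.0])], weights=[0.5]))

In the bi-level view, this energy minimization is the lower-level problem; the upper level adjusts the filters and weights so that its solution has small loss against ground-truth images, and the paper's point is that solving the lower level to high accuracy pays off.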