Joint Training of Deep Boltzmann Machines
We introduce a new method for training deep Boltzmann machines jointly. Prior
methods require an initial learning pass that trains the deep Boltzmann machine
greedily, one layer at a time, or do not perform well on classification
tasks.
Comment: 4 pages
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Stochastic neurons and hard non-linearities can be useful for a number of
reasons in deep learning models, but in many cases they pose a challenging
problem: how to estimate the gradient of a loss function with respect to the
input of such stochastic or non-smooth neurons? I.e., can we "back-propagate"
through these stochastic neurons? We examine this question and existing
approaches, and compare four families of solutions, applicable in different
settings. One of them is the minimum variance unbiased gradient estimator for
stochastic binary neurons (a special case of the REINFORCE algorithm). A second
approach, introduced here, decomposes the operation of a binary stochastic
neuron into a stochastic binary part and a smooth differentiable part, which
approximates the expected effect of the pure stochastic binary neuron to first
order. A third approach involves the injection of additive or multiplicative
noise in a computational graph that is otherwise differentiable. A fourth
approach heuristically copies the gradient with respect to the stochastic
output directly as an estimator of the gradient with respect to the sigmoid
argument (we call this the straight-through estimator). To explore a context
where these estimators are useful, we consider a small-scale version of
conditional computation, where sparse stochastic units form a distributed
representation of gaters that can switch off large chunks of the computation
performed in the rest of the neural network, in combinatorially many ways. In
this case, it is important that the gating units produce an actual 0 most of
the time. The resulting sparsity can potentially be exploited to greatly reduce
the computational cost of large deep networks for which conditional computation
would be useful.
Comment: arXiv admin note: substantial text overlap with arXiv:1305.298
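A minimal sketch, in plain Python, contrasting two of the four estimators on a single stochastic binary neuron h ~ Bernoulli(sigmoid(a)): the REINFORCE (score-function) estimator and the plain "copy the gradient" straight-through estimator. The quadratic loss, its target value 0.8 and all function names are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def loss(h, target=0.8):
    # Hypothetical smooth loss on the binary output h (illustration only).
    return (h - target) ** 2


def dloss_dh(h, target=0.8):
    return 2.0 * (h - target)


def exact_grad(a):
    # d/da E[L(h)] for h ~ Bernoulli(p), p = sigmoid(a):
    # E[L] = p * L(1) + (1 - p) * L(0) and dp/da = p * (1 - p).
    p = sigmoid(a)
    return (loss(1.0) - loss(0.0)) * p * (1.0 - p)


def sample_unit(a, rng):
    # Stochastic binary neuron: outputs 1.0 with probability sigmoid(a).
    p = sigmoid(a)
    return p, (1.0 if rng.random() < p else 0.0)


def reinforce_grad(a, rng, baseline=0.0):
    # Score-function (REINFORCE) estimator: (L(h) - b) * d log P(h) / da,
    # where d log P(h) / da = h - p for a Bernoulli(sigmoid(a)) unit.
    # It is unbiased for any constant baseline b.
    p, h = sample_unit(a, rng)
    return (loss(h) - baseline) * (h - p)


def straight_through_grad(a, rng):
    # Straight-through: sample h on the forward pass, then copy dL/dh
    # unchanged as the gradient with respect to the sigmoid argument a.
    _, h = sample_unit(a, rng)
    return dloss_dh(h)


def mean_estimate(estimator, a, n, seed=0):
    # Monte Carlo average of n single-sample gradient estimates.
    rng = random.Random(seed)
    return sum(estimator(a, rng) for _ in range(n)) / n


if __name__ == "__main__":
    a = 0.3
    print("exact           ", exact_grad(a))
    print("REINFORCE       ", mean_estimate(reinforce_grad, a, 100_000))
    print("straight-through", mean_estimate(straight_through_grad, a, 100_000))
```

With these choices the REINFORCE average settles near the exact gradient, while the straight-through average converges to a different, biased value (of the same sign here); the bias is the price of an estimator that only needs dL/dh.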
Describing Multimedia Content using Attention-based Encoder-Decoder Networks
Whereas deep neural networks were at first used mostly for classification tasks,
they are rapidly expanding into the realm of structured output problems, where
the observed target is composed of multiple random variables that have a rich
joint distribution, given the input. In this paper we focus on the case where
the input also has a rich structure and the input and output structures are
related in some way. We describe systems that learn to attend to different
places in the input, for each element of the output, for a variety of tasks:
machine translation, image caption generation, video clip description and
speech recognition. All these systems are based on a shared set of building
blocks: gated recurrent neural networks and convolutional neural networks,
along with trained attention mechanisms. We report experimental results with
these systems, showing impressive performance and the advantage of the
attention mechanism.
Comment: Submitted to IEEE Transactions on Multimedia Special Issue on Deep
Learning for Multimedia Computing
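The core of "attending to different places in the input, for each element of the output" can be sketched as a soft, content-based weighting of input annotation vectors. This is a minimal plain-Python sketch under one stated simplification: the score is a dot product between the current decoder state (the "query") and each annotation, whereas the systems described typically score with a small learned network. All names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def attend(query: Vector, annotations: List[Vector]) -> Tuple[List[float], Vector]:
    """Soft attention for one output step.

    Scores every input annotation against the decoder query (dot-product
    scoring is a simplifying assumption), normalizes the scores into weights
    that sum to 1, and returns the weights plus the weighted-average context.
    """
    scores = [dot(query, h) for h in annotations]
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * h[i] for w, h in zip(weights, annotations)) for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per input position
    weights, context = attend([2.0, 0.0], annotations)
    print(weights, context)
```

A decoder would call `attend` once per output element with its current state, so each output position gets its own distribution over input positions.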
Large-Scale Feature Learning With Spike-and-Slab Sparse Coding
We consider the problem of object recognition with a large number of classes.
To overcome the small number of labeled examples available in this
setting, we introduce a new feature learning and extraction procedure based on
a factor model we call spike-and-slab sparse coding (S3C). Prior work on S3C
has not prioritized the ability to exploit parallel architectures and scale S3C
to the enormous problem sizes needed for object recognition. We present a novel
inference procedure appropriate for use with GPUs, which allows us to
dramatically increase both the training set size and the number of latent
factors that S3C may be trained with. We demonstrate that this approach
improves upon the supervised learning capabilities of both sparse coding and
the spike-and-slab Restricted Boltzmann Machine (ssRBM) on the CIFAR-10
dataset. We use the CIFAR-100 dataset to demonstrate that our method scales to
large numbers of classes better than previous methods. Finally, we use our
method to win the NIPS 2011 Workshop on Challenges in Learning Hierarchical
Models: Transfer Learning Challenge.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012). arXiv admin note: substantial text overlap with
arXiv:1201.338
Efficient EM Training of Gaussian Mixtures with Missing Data
In data-mining applications, we are frequently faced with a large fraction of
missing entries in the data matrix, which is problematic for most discriminant
machine learning algorithms. A solution that we explore in this paper is the
use of a generative model (a mixture of Gaussians) to compute the conditional
expectation of the missing variables given the observed variables. Since
training a Gaussian mixture with many different patterns of missing values can
be computationally very expensive, we introduce a spanning-tree based algorithm
that significantly speeds up training in these conditions. We also observe that
good results can be obtained by using the generative model to fill in the
missing values for a separate discriminant learning algorithm.
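For reference, the conditional expectation mentioned above has a closed form under a mixture of K Gaussians with weights π_k, means μ_k and covariances Σ_k (notation ours). Splitting a vector into observed coordinates o and missing coordinates m, standard Gaussian conditioning gives the expression below; this is textbook background, not the paper's spanning-tree algorithm.

```latex
\mathbb{E}[x_m \mid x_o]
  = \sum_{k=1}^{K} P(k \mid x_o)\,
    \Big(\mu_{k,m} + \Sigma_{k,mo}\,\Sigma_{k,oo}^{-1}\,(x_o - \mu_{k,o})\Big),
\qquad
P(k \mid x_o)
  = \frac{\pi_k\,\mathcal{N}(x_o;\,\mu_{k,o},\,\Sigma_{k,oo})}
         {\sum_{j=1}^{K} \pi_j\,\mathcal{N}(x_o;\,\mu_{j,o},\,\Sigma_{j,oo})}
```

Each distinct pattern of missing values needs its own inverse Σ_{k,oo}^{-1} for every component, which gives a sense of why training with many different missingness patterns becomes expensive.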
Disentangling Factors of Variation via Generative Entangling
Here we propose a novel model family with the objective of learning to
disentangle the factors of variation in data. Our approach is based on the
spike-and-slab restricted Boltzmann machine which we generalize to include
higher-order interactions among multiple latent variables. Seen from a
generative perspective, the multiplicative interactions emulate the entangling
of factors of variation. Inference in the model can be seen as disentangling
these generative factors. Unlike previous attempts at disentangling latent
factors, the proposed model is trained using no supervised information
regarding the latent factors. We apply our model to the task of facial
expression classification.
Discriminative Regularization for Generative Models
We explore the question of whether the representations learned by classifiers
can be used to enhance the quality of generative models. Our conjecture is that
labels correspond to characteristics of natural data which are most salient to
humans: identity in faces, objects in images, and utterances in speech. We
propose to take advantage of this by using the representations from
discriminative classifiers to augment the objective function corresponding to a
generative model. In particular we enhance the objective function of the
variational autoencoder, a popular generative model, with a discriminative
regularization term. We show that enhancing the objective function in this way
leads to samples that are clearer and have higher visual quality than the
samples from a standard variational autoencoder.
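The abstract does not spell out the regularization term. One plausible form, assumed here purely for illustration, adds to the usual variational lower bound a penalty on the distance between the hidden features f_l of a fixed, pretrained classifier computed on the input x and on the decoder's output for a latent sample z; the layer set, the squared-error form and the weight λ ≥ 0 are our assumptions, not the paper's stated objective.

```latex
\mathcal{L}(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
  - \lambda \sum_{l} \mathbb{E}_{q_\phi(z \mid x)}
      \big[\,\lVert f_l(x) - f_l(\hat{x}_\theta(z)) \rVert_2^2\,\big]
```

Here the first two terms are the standard variational autoencoder bound (maximized), and the last term pulls reconstructions toward inputs in the classifier's feature space rather than only in pixel space.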
On Training Deep Boltzmann Machines
The deep Boltzmann machine (DBM) has been an important development in the
quest for powerful "deep" probabilistic models. To date, simultaneous or joint
training of all layers of the DBM has been largely unsuccessful with existing
training methods. We introduce a simple regularization scheme that encourages
the weight vectors associated with each hidden unit to have similar norms. We
demonstrate that this regularization can be easily combined with standard
stochastic maximum likelihood to yield an effective training strategy for the
simultaneous training of all layers of the deep Boltzmann machine.
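The abstract does not give the exact form of the regularizer, so the following is a minimal sketch of one assumed instance: penalize the variance of the L2 norms of each hidden unit's weight vector (one column of a visible-by-hidden weight matrix), which is zero exactly when all norms are equal. The function names and the `strength` parameter are hypothetical; the analytic gradient is what would be added to a stochastic maximum likelihood update.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = visible units, columns = hidden units


def column_norms(W: Matrix) -> List[float]:
    # L2 norm of the incoming weight vector of each hidden unit (each column).
    n_hidden = len(W[0])
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(n_hidden)]


def norm_similarity_penalty(W: Matrix, strength: float = 1.0) -> float:
    # Assumed penalty: strength * variance of the per-unit weight norms.
    norms = column_norms(W)
    mean = sum(norms) / len(norms)
    return strength * sum((n - mean) ** 2 for n in norms) / len(norms)


def norm_similarity_grad(W: Matrix, strength: float = 1.0) -> Matrix:
    # d var / d n_j = 2 (n_j - mean) / H, because the mean's contribution
    # sums to zero; d n_j / d W[i][j] = W[i][j] / n_j (assumes n_j > 0).
    norms = column_norms(W)
    h = len(norms)
    mean = sum(norms) / h
    return [
        [strength * 2.0 * (norms[j] - mean) / h * row[j] / norms[j] for j in range(h)]
        for row in W
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 3.0]]
    print(norm_similarity_penalty(W), norm_similarity_grad(W))
```

Using the variance keeps the penalty indifferent to the overall scale of the weights, so it only pushes hidden units toward each other rather than shrinking them all.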
A Controller-Recognizer Framework: How necessary is recognition for control?
Recently there has been growing interest in building active visual object
recognizers, as opposed to the usual passive recognizers, which classify a
given static image into a predefined set of object categories. In this paper we
propose to generalize these recently proposed end-to-end active visual
recognizers into a controller-recognizer framework. A model in the
controller-recognizer framework consists of a controller, which interfaces with
an external manipulator, and a recognizer which classifies the visual input
adjusted by the manipulator. We describe the two most recently proposed
controller-recognizer models, the recurrent attention model and the spatial
transformer network, as representative examples. Based on
this description we observe that most existing end-to-end
controller-recognizers tightly, or completely, couple a controller and
recognizer. We ask whether this tight coupling is necessary, and try
to answer this empirically by building a controller-recognizer model with a
decoupled controller and recognizer. Our experiments revealed that it is not
always necessary to tightly couple them, and that by decoupling a controller
and recognizer, it may be possible to build a generic controller that is
pretrained and works together with any subsequent recognizer.
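To make the decoupling concrete, here is a toy plain-Python sketch of the three roles: a controller chooses a manipulator action, the manipulator adjusts the visual input, and a recognizer classifies the result. The "brightest window" controller, the crop-based manipulator and the threshold recognizer are all hypothetical stand-ins, not the learned models discussed in the paper; the point is only that the recognizer can be swapped without touching the controller.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Window = Tuple[int, int, int]  # (row, col, size): hypothetical manipulator action


def crop(image: Image, window: Window) -> Image:
    # The manipulator: extract a square glimpse from the image.
    r, c, s = window
    return [row[c:c + s] for row in image[r:r + s]]


def brightest_window_controller(image: Image, size: int) -> Window:
    # Toy controller: choose the size x size window with the largest pixel sum.
    best, best_sum = (0, 0, size), float("-inf")
    for r in range(len(image) - size + 1):
        for c in range(len(image[0]) - size + 1):
            total = sum(sum(row[c:c + size]) for row in image[r:r + size])
            if total > best_sum:
                best, best_sum = (r, c, size), total
    return best


def threshold_recognizer(glimpse: Image) -> str:
    # Toy recognizer: label the glimpse by its mean intensity.
    pixels = [p for row in glimpse for p in row]
    return "bright" if sum(pixels) / len(pixels) > 0.5 else "dark"


def run(controller: Callable[[Image, int], Window],
        recognizer: Callable[[Image], object],
        image: Image, size: int) -> object:
    # Decoupled pipeline: the controller never sees the recognizer's internals.
    return recognizer(crop(image, controller(image, size)))


if __name__ == "__main__":
    img = [[0.0] * 4 for _ in range(4)]
    img[2][1] = img[2][2] = img[3][1] = img[3][2] = 1.0
    print(run(brightest_window_controller, threshold_recognizer, img, 2))
```

Because `run` only passes the cropped glimpse forward, any function with the recognizer's signature can be plugged in after a fixed, pre-built controller.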
Harmonic Recomposition using Conditional Autoregressive Modeling
We demonstrate a conditional autoregressive pipeline for efficient music
recomposition, based on methods presented in van den Oord et al. (2017).
Recomposition (Casal & Casey, 2010) focuses on reworking existing musical
pieces, adhering to structure at a high level while also re-imagining other
aspects of the work. This can involve reuse of pre-existing themes or parts of
the original piece, while also requiring the flexibility to generate new
content at different levels of granularity. Applying the aforementioned
modeling pipeline to recomposition, we show diverse and structured generation
conditioned on chord sequence annotations.
Comment: 3 pages, 2 figures. In Proceedings of The Joint Workshop on Machine
Learning for Music, ICML 201
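As a toy illustration of conditional autoregressive generation, the sketch below samples one note at a time from p(x_t | x_{t-1}, c_t), conditioning on the previous note and the current chord annotation c_t. The hand-written distribution stands in for a learned model, and the chord vocabulary, weighting and function names are hypothetical; this is not the paper's pipeline, which builds on van den Oord et al. (2017).

```python
import random
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical chord vocabulary: chord symbol -> pitch classes (0 = C, ..., 11 = B).
CHORD_TONES: Dict[str, Set[int]] = {
    "C": {0, 4, 7}, "F": {5, 9, 0}, "G": {7, 11, 2}, "Am": {9, 0, 4},
}


def next_note_probs(prev_note: Optional[int], chord: str,
                    chord_weight: float = 4.0) -> List[float]:
    # Toy conditional distribution over 12 pitch classes: chord tones are
    # up-weighted and small steps from the previous note are preferred.
    weights = []
    for pc in range(12):
        w = chord_weight if pc in CHORD_TONES[chord] else 1.0
        if prev_note is not None:
            step = min((pc - prev_note) % 12, (prev_note - pc) % 12)
            w /= 1.0 + step
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def generate(chords: Sequence[str], notes_per_chord: int = 4,
             seed: int = 0) -> List[int]:
    # Autoregressive loop: each sampled note becomes context for the next one,
    # while the chord annotation supplies the conditioning signal.
    rng = random.Random(seed)
    melody: List[int] = []
    prev: Optional[int] = None
    for chord in chords:
        for _ in range(notes_per_chord):
            probs = next_note_probs(prev, chord)
            prev = rng.choices(range(12), weights=probs, k=1)[0]
            melody.append(prev)
    return melody


if __name__ == "__main__":
    print(generate(["C", "F", "G", "C"]))
```

Replacing `next_note_probs` with a trained model's predictive distribution would keep the same sampling loop.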
