Learning Ordered Representations with Nested Dropout
In this paper, we study ordered representations of data in which different
dimensions have different degrees of importance. To learn these representations
we introduce nested dropout, a procedure for stochastically removing coherent
nested sets of hidden units in a neural network. We first present a sequence of
theoretical results in the simple case of a semi-linear autoencoder. We
rigorously show that the application of nested dropout enforces identifiability
of the units, which leads to an exact equivalence with PCA. We then extend the
algorithm to deep models and demonstrate the relevance of ordered
representations to a number of applications. Specifically, we use the ordered
property of the learned codes to construct hash-based data structures that
permit very fast retrieval, achieving retrieval in time logarithmic in the
database size and independent of the dimensionality of the representation. This
allows codes that are hundreds of times longer than currently feasible for
retrieval. We therefore avoid the diminished quality associated with short
codes, while still performing retrieval that is competitive in speed with
existing methods. We also show that ordered representations are a promising way
to learn adaptive compression for efficient online data reconstruction.
Comment: 11 pages, 5 figures. Submitted for publication
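Nested dropout, as described above, removes a coherent suffix of hidden units rather than independent units. The sketch below is a minimal illustration and not the authors' code. The abstract does not name the distribution of the truncation index `b`, so we assume a geometric distribution with a hypothetical parameter `p`, clipped at the code length. The function names are also illustrative.

```python
import random


def sample_truncation(k: int, p: float, rng: random.Random) -> int:
    """Sample b in {1, ..., k} with P(b) = p * (1 - p)**(b - 1), clipped at k.

    Assumption: geometric prior over the truncation index (not stated in the abstract).
    """
    b = 1
    # Each additional unit survives with probability (1 - p); stop at the code length.
    while b < k and rng.random() >= p:
        b += 1
    return b


def nested_dropout(code: list, p: float, rng: random.Random):
    """Keep the first b units of `code` and zero out every unit after them."""
    b = sample_truncation(len(code), p, rng)
    return code[:b] + [0.0] * (len(code) - b), b


rng = random.Random(0)
masked, b = nested_dropout([0.9, -0.4, 0.7, 0.2, -0.1], p=0.3, rng=rng)
```

Earlier units are kept in every sample in which later units are kept. Training under this mask therefore pushes the most useful information into the leading dimensions, which is what makes the learned codes ordered.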
Learning Compact Convolutional Neural Networks with Nested Dropout
Recently, nested dropout was proposed as a method for ordering representation
units in autoencoders by their information content, without diminishing
reconstruction cost. However, it has only been applied to training
fully-connected autoencoders in an unsupervised setting. We explore the impact
of nested dropout on the convolutional layers in a CNN trained by
backpropagation, investigating whether nested dropout can provide a simple and
systematic way to determine the optimal representation size with respect to the
desired accuracy and the complexity of the task and data.
Comment: 4 pages, 2 figures. Accepted as a workshop contribution at ICLR 201
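In a convolutional layer, the natural unit to order is the channel (one per filter). The following is a hedged sketch of that idea, not the paper's implementation. It assumes feature maps stored as nested lists indexed `[channel][row][column]` and a geometric truncation prior with a hypothetical parameter `p`. `accuracy_at` is a hypothetical callback that evaluates the network truncated to its first `b` channels. It illustrates how an ordered layer could be used to pick the smallest width that meets a target accuracy.

```python
import random
from typing import Callable, List, Optional

FeatureMap = List[List[List[float]]]  # indexed [channel][row][column]


def truncate_channels(fmap: FeatureMap, b: int) -> FeatureMap:
    """Keep channels 0..b-1 unchanged and zero every channel with index >= b."""
    return [ch if c < b else [[0.0] * len(row) for row in ch] for c, ch in enumerate(fmap)]


def nested_dropout_channels(fmap: FeatureMap, p: float, rng: random.Random):
    """Training-time masking: sample b ~ Geometric(p), clipped to the channel count."""
    b = 1
    while b < len(fmap) and rng.random() >= p:
        b += 1
    return truncate_channels(fmap, b), b


def smallest_width(num_channels: int,
                   accuracy_at: Callable[[int], float],
                   target: float) -> Optional[int]:
    """Return the fewest leading channels whose accuracy reaches `target`, or None."""
    for b in range(1, num_channels + 1):
        if accuracy_at(b) >= target:
            return b
    return None
```

Because the channels are ordered, the sweep in `smallest_width` only has to try prefixes (1, 2, ..., C channels) instead of arbitrary channel subsets.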
Spectral Representations for Convolutional Neural Networks
Discrete Fourier transforms provide a significant speedup in the computation
of convolutions in deep learning. In this work, we demonstrate that, beyond its
advantages for efficient computation, the spectral domain also provides a
powerful representation in which to model and train convolutional neural
networks (CNNs).
We employ spectral representations to introduce a number of innovations to
CNN design. First, we propose spectral pooling, which performs dimensionality
reduction by truncating the representation in the frequency domain. This
approach preserves considerably more information per parameter than other
pooling strategies and enables flexibility in the choice of pooling output
dimensionality. This representation also enables a new form of stochastic
regularization by randomized modification of resolution. We show that these
methods achieve competitive results on classification and approximation tasks,
without using any dropout or max-pooling.
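Spectral pooling, as described above, reduces dimensionality by keeping only the lowest frequencies. The sketch below is a simplified 1-D illustration, not the paper's 2-D implementation. It uses a naive O(n^2) DFT for readability, where real code would use an FFT. It also assumes an odd output length `m`, so that the kept band -h..h is conjugate-symmetric and the pooled signal stays real.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def spectral_pool_1d(x, m):
    """Reduce len(x) = n samples to m samples by truncating the spectrum.

    Assumption: m is odd and 0 < m <= n, so frequencies -h..h (h = m // 2) are kept.
    """
    n = len(x)
    if m % 2 == 0 or not 0 < m <= n:
        raise ValueError("m must be odd and satisfy 0 < m <= len(x)")
    X = dft(x)
    h = m // 2
    # Standard DFT order: indices 0..h are frequencies 0..h; indices n-h..n-1 are -h..-1.
    kept = X[: h + 1] + X[n - h:]
    # Rescale so amplitudes survive the switch from an n-point to an m-point inverse DFT.
    y = idft([c * m / n for c in kept])
    return [v.real for v in y]
```

Unlike max-pooling, which can only shrink each dimension by an integer factor, the output length `m` here can be any odd value up to `n`. This matches the flexibility in output dimensionality claimed above.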
Finally, we demonstrate the effectiveness of complex-coefficient spectral
parameterization of convolutional filters. While this leaves the underlying
model unchanged, it results in a representation that greatly facilitates
optimization. We observe on a variety of popular CNN configurations that this
leads to significantly faster convergence during training.
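The claim that spectral parameterization "leaves the underlying model unchanged" can be checked on a small case. Every real spatial filter has a complex frequency-domain representation, and mapping that representation back through an inverse DFT recovers the filter exactly. The sketch below is a 1-D illustration only. The function names are hypothetical, and taking the real part of the inverse DFT is one simple way (our assumption) to guarantee a real-valued filter.

```python
import cmath


def dft(w):
    n = len(w)
    return [sum(w[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def filter_from_spectral(coeffs):
    """Spatial filter = real part of the inverse DFT of the complex parameters."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


w = [0.5, -1.0, 2.0, 0.25]
params = dft(w)                      # the trainable parameters live in the frequency domain
w_back = filter_from_spectral(params)
```

The map from `params` to the spatial filter is linear, so gradients pass through it unchanged in form. Only the geometry of the optimization problem changes, which is where the faster convergence reported above would come from.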
Learning with Pseudo-Ensembles
We formalize the notion of a pseudo-ensemble, a (possibly infinite)
collection of child models spawned from a parent model by perturbing it
according to some noise process. For example, dropout (Hinton et al., 2012) in a deep
neural network trains a pseudo-ensemble of child subnetworks generated by
randomly masking nodes in the parent network. We present a novel regularizer
based on making the behavior of a pseudo-ensemble robust with respect to the
noise process generating it. In the fully-supervised setting, our regularizer
matches the performance of dropout. However, unlike dropout, our regularizer
naturally extends to the semi-supervised setting, where it produces
state-of-the-art results. We provide a case study in which we transform the
Recursive Neural Tensor Network of Socher et al. (2013) into a
pseudo-ensemble, which significantly improves its performance on a real-world
sentiment analysis benchmark.
Comment: To appear in Advances in Neural Information Processing Systems 27
(NIPS 2014), Advances in Neural Information Processing Systems 27, Dec. 201
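A robustness regularizer of the kind described above penalizes disagreement between the parent model and its noise-perturbed children. The sketch below is a deliberately simplified variant and not the paper's exact regularizer. It assumes a hypothetical linear parent model, inverted dropout on the inputs as the noise process, and a squared-difference penalty on the outputs only. It needs no labels, which is why such a penalty can also be applied to unlabeled data in the semi-supervised setting.

```python
import random


def linear(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def child_output(w, x, drop_p, rng):
    """One child model: inverted dropout on the inputs, so kept inputs are scaled by 1/(1 - drop_p)."""
    scale = 1.0 / (1.0 - drop_p)
    masked = [xi * scale if rng.random() >= drop_p else 0.0 for xi in x]
    return linear(w, masked)


def pseudo_ensemble_penalty(w, x, drop_p, num_children, rng):
    """Mean squared deviation of sampled child outputs from the noise-free parent output."""
    parent = linear(w, x)
    diffs = [(child_output(w, x, drop_p, rng) - parent) ** 2 for _ in range(num_children)]
    return sum(diffs) / num_children
```

In training, this penalty would be added to the supervised loss on labeled examples and used on its own for unlabeled ones.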
Stick-Breaking Variational Autoencoders
We extend Stochastic Gradient Variational Bayes to perform posterior
inference for the weights of Stick-Breaking processes. This development allows
us to define a Stick-Breaking Variational Autoencoder (SB-VAE), a Bayesian
nonparametric version of the variational autoencoder that has a latent
representation with stochastic dimensionality. We experimentally demonstrate
that the SB-VAE, and a semi-supervised variant, learn highly discriminative
latent representations that often outperform those of the Gaussian VAE.
Comment: ICLR 2017, Conference Track
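The stick-breaking construction behind the SB-VAE's stochastic dimensionality can be sketched briefly. Each fraction v_k breaks off part of the remaining stick, giving weights pi_k = v_k * prod_{j<k} (1 - v_j). The abstract does not specify the posterior over the fractions. Here we assume a Kumaraswamy distribution because its closed-form inverse CDF allows reparameterized sampling. We also assume a finite truncation in which the last weight takes the leftover stick. Both choices and all function names are illustrative.

```python
import random


def kumaraswamy_sample(a, b, rng):
    """Inverse-CDF sample: F(x) = 1 - (1 - x**a)**b, so x = (1 - (1 - u)**(1/b))**(1/a)."""
    u = rng.random()
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def stick_breaking(fractions):
    """pi_k = v_k * prod_{j<k} (1 - v_j); the final weight receives the remaining stick."""
    weights, remaining = [], 1.0
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # assumed truncation: leftover mass goes to the last component
    return weights


rng = random.Random(0)
pis = stick_breaking([kumaraswamy_sample(1.0, 5.0, rng) for _ in range(9)])
```

Later weights shrink geometrically unless the earlier fractions are small. This is what lets the effective number of active latent dimensions vary per example.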
Compressing Neural Networks with the Hashing Trick
As deep nets are increasingly used in applications suited for mobile devices,
a fundamental dilemma becomes apparent: the trend in deep learning is to grow
models to absorb ever-increasing data set sizes; however, mobile devices are
designed with very little memory and cannot store such large models. We present
a novel network architecture, HashedNets, that exploits inherent redundancy in
neural networks to achieve drastic reductions in model sizes. HashedNets uses a
low-cost hash function to randomly group connection weights into hash buckets,
and all connections within the same hash bucket share a single parameter value.
These shared parameters are trained with standard backpropagation to adapt to
the HashedNets weight-sharing architecture. Our hashing procedure
introduces no additional memory overhead, and we demonstrate on several
benchmark data sets that HashedNets shrink the storage requirements of neural
networks substantially while mostly preserving generalization performance.
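The weight-sharing scheme described above can be sketched as a layer whose virtual weight matrix is never stored. Entry (i, j) is looked up as `shared[h(i, j)]` from a small parameter vector. This is an illustration only. It assumes `hashlib.blake2b` as a stand-in for the low-cost hash function, and a hypothetical `seed` to give each layer its own hash. It also omits the optional sign hash that HashedNets can use.

```python
import hashlib


def bucket(i, j, num_buckets, seed=0):
    """Deterministically hash the (row, column) position into one of num_buckets buckets."""
    digest = hashlib.blake2b(f"{seed}:{i}:{j}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_buckets


def hashed_forward(shared, x, out_dim, seed=0):
    """y_i = sum_j shared[h(i, j)] * x_j, computed without materialising the virtual matrix."""
    k = len(shared)
    return [sum(shared[bucket(i, j, k, seed)] * xj for j, xj in enumerate(x)) for i in range(out_dim)]


shared = [0.5, -1.0, 2.0]            # 3 stored parameters
y = hashed_forward(shared, [1.0, 2.0, 3.0, 4.0], out_dim=5)  # a virtual 5x4 layer
```

Memory is set by `len(shared)` rather than by `out_dim * len(x)`. Under backpropagation, the gradient for `shared[k]` is the sum of the gradients of every virtual entry that hashes to bucket `k`.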
Dropout with Tabu Strategy for Regularizing Deep Neural Networks
Dropout has proven to be an effective technique for regularization and
preventing the co-adaptation of neurons in deep neural networks (DNNs). It
randomly drops units with a given probability during the training stage of a DNN.
Dropout also provides a way of efficiently approximating the combination of
exponentially many different neural network architectures. In this work, we add a
diversification strategy to dropout, which aims to generate more diverse
neural network architectures within a given number of iterations. Units dropped
in the previous forward propagation (FP) are marked. In the current FP, any unit
selected for dropping is kept if it was marked in the previous FP. Only units
from the most recent FP are marked.
We call this new technique Tabu Dropout. Tabu Dropout has no extra parameters
compared with standard dropout, and it is computationally cheap.
Experiments on the MNIST and Fashion-MNIST datasets show that Tabu Dropout
improves on the performance of standard dropout.
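The marking rule described above can be sketched as a mask generator that carries the previous pass's dropped set forward. This is a minimal illustration under our own assumptions. It uses a hypothetical `tabu_dropout_mask` helper and returns a boolean keep-mask, leaving out the usual 1/(1 - p) rescaling of kept activations.

```python
import random


def tabu_dropout_mask(n_units, drop_p, marked, rng):
    """Return (keep_mask, new_marked).

    A unit is dropped only if it is selected for dropping AND was not marked
    (i.e. not dropped) in the previous forward pass. Only this pass's drops are marked.
    """
    keep = []
    for u in range(n_units):
        selected = rng.random() < drop_p
        keep.append((not selected) or (u in marked))
    new_marked = {u for u, k in enumerate(keep) if not k}
    return keep, new_marked


rng = random.Random(0)
marked = set()
for _ in range(3):                   # three consecutive forward passes
    keep, marked = tabu_dropout_mask(6, 0.5, marked, rng)
```

Because a unit dropped in one pass is always kept in the next, consecutive passes cannot drop the same unit twice. This steers training toward a more varied sequence of subnetworks without adding any parameters.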
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Natural language is hierarchically structured: smaller units (e.g., phrases)
are nested within larger units (e.g., clauses). When a larger constituent ends,
all of the smaller constituents that are nested within it must also be closed.
While the standard LSTM architecture allows different neurons to track
information at different time scales, it does not have an explicit bias towards
modeling a hierarchy of constituents. This paper proposes to add such an
inductive bias by ordering the neurons; a vector of master input and forget
gates ensures that when a given neuron is updated, all the neurons that follow
it in the ordering are also updated. Our novel recurrent architecture, ordered
neurons LSTM (ON-LSTM), achieves good performance on four different tasks:
language modeling, unsupervised parsing, targeted syntactic evaluation, and
logical inference.
Comment: Published as a conference paper at ICLR 201
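The ordering of neurons described above relies on monotone "master" gates. The sketch below shows the cumulative-softmax ("cumax") activation that ON-LSTM uses to build them: the master forget gate is cumax of its logits, and the master input gate is one minus cumax of its logits. Everything else in the ON-LSTM cell is omitted, and the logits here are hypothetical inputs rather than outputs of learned weight matrices.

```python
import math


def softmax(z):
    m = max(z)                       # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cumax(z):
    """Cumulative sum of softmax: a soft, monotonically non-decreasing gate from ~0 to 1."""
    out, total = [], 0.0
    for p in softmax(z):
        total += p
        out.append(total)
    return out


def master_gates(f_logits, i_logits):
    master_forget = cumax(f_logits)                    # non-decreasing along the neuron order
    master_input = [1.0 - v for v in cumax(i_logits)]  # non-increasing along the neuron order
    return master_forget, master_input


f, i = master_gates([-10.0, -10.0, 10.0, -10.0, -10.0], [0.0] * 5)
```

A sharp peak in the logits at position k makes the master forget gate close to a step function that is about 0 before k and about 1 from k onward. This step is how a single split point partitions the neurons into a "kept" block and an "updated" block.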
An Infinite Restricted Boltzmann Machine
We present a mathematical construction for the restricted Boltzmann machine
(RBM) that doesn't require specifying the number of hidden units. In fact, the
hidden layer size is adaptive and can grow during training. This is obtained by
first extending the RBM to be sensitive to the ordering of its hidden units.
Then, thanks to a carefully chosen definition of the energy function, we show
that the limit of infinitely many hidden units is well defined. As with the RBM,
approximate maximum likelihood training can be performed, resulting in an
algorithm that naturally and adaptively adds trained hidden units during
learning. We empirically study the behaviour of this infinite RBM, showing that
its performance is competitive with that of the RBM, while not requiring the
tuning of a hidden layer size.
Comment: 25 pages, 8 figures
Anime Style Space Exploration Using Metric Learning and Generative Adversarial Networks
Deep learning-based style transfer between images has recently become a
popular area of research. A common way of encoding "style" is through a feature
representation based on the Gram matrix of features extracted by some
pre-trained neural network or some other form of feature statistics. Such a
definition is based on an arbitrary human decision and may not best capture
what a style really is. In trying to gain a better understanding of "style", we
propose a metric learning-based method to explicitly encode the style of an
artwork. In particular, our definition of style captures the differences
between artists, as shown by classification performance, and yields a
style representation that can be interpreted, manipulated and visualized through
style-conditioned image generation with a Generative Adversarial Network. We
employ this method to explore the style space of anime portrait illustrations.
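The abstract says style is encoded with metric learning but does not name the loss. As a hedged illustration, the sketch below assumes a standard triplet margin loss over style embeddings. The anchor and positive are works by the same artist, the negative is by a different artist, and `margin` is a hypothetical hyperparameter. Minimizing it pulls same-artist styles together and pushes different artists apart, which is the property the classification results above would measure.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    Assumption: anchor/positive are style embeddings of two works by the same artist,
    negative is a work by a different artist.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative already far away
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])  # negative inside the margin
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so training effort concentrates on artist pairs whose styles are still confused.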