Parametrizing filters of a CNN with a GAN
It is commonly agreed that incorporating relevant invariances as a statistical
bias is important in machine learning. However, most approaches
that explicitly incorporate invariances into a model architecture only make use
of very simple transformations, such as translations and rotations. Hence,
there is a need for methods to model and extract richer transformations that
capture much higher-level invariances. To that end, we introduce a tool that
parametrizes the set of filters of a trained convolutional neural
network with the latent space of a generative adversarial network. We then show
that the method can capture highly non-linear invariances of the data by
visualizing their effect in the data space
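A minimal sketch of the parametrization idea (the latent and filter sizes and the one-layer "generator" are hypothetical; in the paper the generator is a trained GAN whose outputs live on the manifold of learned filters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: an 8-d latent space parametrizing 5x5 conv filters.
LATENT_DIM, FILTER_SIZE = 8, 5

# Toy stand-in for a trained GAN generator: one linear layer plus tanh.
W = rng.normal(0, 0.1, size=(FILTER_SIZE * FILTER_SIZE, LATENT_DIM))

def generate_filter(z):
    """Map a latent code z to a 5x5 convolution filter."""
    return np.tanh(W @ z).reshape(FILTER_SIZE, FILTER_SIZE)

# Walking along a line in latent space yields a smooth family of filters,
# i.e. a learned (possibly highly non-linear) transformation of the filter set.
z0, z1 = rng.normal(size=LATENT_DIM), rng.normal(size=LATENT_DIM)
filters = [generate_filter((1 - t) * z0 + t * z1) for t in np.linspace(0, 1, 5)]
```

Visualizing the effect of such a latent-space walk in data space is what reveals the captured invariances.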
HyperNets and their application to learning spatial transformations
In this paper we propose a conceptual framework for higher-order artificial
neural networks. The idea of higher-order networks arises naturally when a
model is required to learn some group of transformations, every element of
which is well-approximated by a traditional feedforward network. Thus the group
as a whole can be represented as a hypernetwork. One typical example of
such a group is spatial transformations. We show that the proposed framework,
which we call HyperNets, is able to deal with at least two basic spatial
transformations of images: rotation and affine transformation. We show that
HyperNets are able not only to generalize rotation and affine transformation,
but also to compensate for the rotation of images, bringing them into canonical
forms
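The core idea can be sketched as a map from a group element to the weights of an "inner" network. In this minimal example (all names hypothetical) the hyper-network is the closed-form planar rotation matrix; in the paper a learned network approximates such a map for every group element:

```python
import numpy as np

# Hypothetical sketch: a hyper-network maps a group element (here a rotation
# angle) to the weight matrix of an ordinary feedforward network, so the whole
# group of transformations is represented by one higher-order model.
def hyper_weights(theta):
    """Emit the weights of the inner linear network for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

def inner_net(x, theta):
    """Apply the transformation selected by theta to a 2-d point x."""
    return hyper_weights(theta) @ x

x = np.array([1.0, 0.0])
y = inner_net(x, np.pi / 2)   # rotate (1, 0) by 90 degrees
```

Compensating a rotation, as in the canonicalization experiment, then amounts to applying `inner_net` with the negated angle.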
Learning Good Representation via Continuous Attention
In this paper we present our finding that good representations
can be learned via continuous attention during the interaction between
Unsupervised Learning (UL) and Reinforcement Learning (RL) modules driven by
intrinsic motivation. Specifically, we designed intrinsic rewards generated
from UL modules for driving the RL agent to focus on objects for a period of
time and to learn good representations of objects for a later object recognition
task. We evaluate the proposed algorithm in settings both with and without
extrinsic rewards. Experiments with end-to-end training in simulated environments
with applications to few-shot object recognition demonstrated the effectiveness
of the proposed algorithm
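One plausible form for such a UL-generated intrinsic reward is learning progress. The sketch below is a hypothetical stand-in for the paper's UL module: a tiny tied-weight linear autoencoder whose drop in reconstruction error on the attended patch is paid out as reward, so the agent keeps attending while there is still something to learn:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: 16-d patches, 4 hidden units, small learning rate.
D, H, LR = 16, 4, 0.01
W = rng.normal(0, 0.1, size=(H, D))   # tied-weight linear autoencoder

def recon_error(x):
    return float(np.mean((W.T @ (W @ x) - x) ** 2))

def ul_step(x):
    """One gradient step on the reconstruction loss; returns intrinsic reward."""
    global W
    before = recon_error(x)
    h = W @ x
    r = W.T @ h - x                                      # residual
    grad = (2.0 / D) * (np.outer(h, r) + np.outer(W @ r, x))
    W -= LR * grad
    return before - recon_error(x)    # positive while the module still learns

patch = rng.normal(size=D)            # the currently attended image patch
e0 = recon_error(patch)
rewards = [ul_step(patch) for _ in range(50)]
```

The reward naturally decays as the representation of the object is mastered, releasing attention for other objects.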
Unsupervised Pretraining Encourages Moderate-Sparseness
It is well known that direct training of deep neural networks will generally
lead to poor results. A major advance in recent years is the invention of
various pretraining methods to initialize network parameters and it was shown
that such methods lead to good prediction performance. However, the reason for
the success of pretraining has not been fully understood, although it was
argued that regularization and better optimization play certain roles. This
paper provides another explanation for the effectiveness of pretraining, where
we show pretraining leads to a sparseness of hidden unit activation in the
resulting neural networks. The main reason is that the pretraining models can
be interpreted as performing an adaptive sparse coding. Our experimental results
on the MNIST and Birdsong datasets, compared against deep neural networks with
sigmoid activations, further support this sparseness observation.
Comment: 6 pages, 2 figures, (to appear) ICML-Workshop on Unsupervised
Learning from Bioacoustic Big Data (uLearnBio) 201
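The claimed sparseness is straightforward to quantify. A minimal sketch (the layer sizes, threshold, and the negative-bias stand-in for pretrained weights are all hypothetical; the paper measures sparseness in actually pretrained networks):

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparseness(W, b, X, thresh=0.1):
    """Fraction of sigmoid hidden activations that are nearly off."""
    return float(np.mean(sigmoid(X @ W + b) < thresh))

X = rng.normal(size=(500, 64))               # 500 inputs, 64 features each
W = rng.normal(0, 0.2, size=(64, 32))        # one hidden layer of 32 units
s_random = sparseness(W, 0.0, X)             # randomly initialized network
s_pretrained_like = sparseness(W, -4.0, X)   # negative biases: a crude
                                             # stand-in for pretraining's
                                             # sparsifying effect
```

Under this metric, the "pretrained-like" layer leaves most units inactive for most inputs, which is the moderate sparseness the paper attributes to pretraining.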
Auto-pooling: Learning to Improve Invariance of Image Features from Image Sequences
Learning invariant representations from images is one of the hardest
challenges facing computer vision. Spatial pooling is widely used to create
invariance to spatial shifting, but it is restricted to convolutional models.
In this paper, we propose a novel pooling method that can learn soft clustering
of features from image sequences. It is trained to improve the temporal
coherence of features, while keeping the information loss at minimum. Our
method does not use spatial information, so it can be used with
non-convolutional models too. Experiments on images extracted from natural
videos showed that our method can cluster similar features together. When
trained on convolutional features, auto-pooling outperformed traditional
spatial pooling on an image classification task, even though it does not use
the spatial topology of features.
Comment: 9 pages, 10 figures. Submission for ICLR 201
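The pooling objective can be sketched as follows (sizes and the softmax parametrization are hypothetical, and the information-loss term is omitted for brevity): a softmax over learned weights gives a soft clustering of features, and the loss asks pooled responses of consecutive frames to agree, with no spatial information involved.

```python
import numpy as np

rng = np.random.default_rng(4)

N_FEATURES, N_POOLS = 32, 8
Wp = rng.normal(size=(N_FEATURES, N_POOLS))   # learnable pooling weights

def soft_pool(f):
    """Pool features with a learned soft cluster assignment (no spatial info)."""
    P = np.exp(Wp) / np.exp(Wp).sum(axis=1, keepdims=True)  # rows sum to 1
    return P.T @ f

def coherence_loss(f_t, f_next):
    """Temporal coherence: pooled codes of consecutive frames should match."""
    return float(np.sum((soft_pool(f_t) - soft_pool(f_next)) ** 2))

f_t = rng.normal(size=N_FEATURES)
loss_same = coherence_loss(f_t, f_t)                        # identical frames
loss_diff = coherence_loss(f_t, rng.normal(size=N_FEATURES))
```

Minimizing this loss over video drives features that co-occur across time into the same soft cluster, which is exactly the invariance spatial pooling hard-codes for translations.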
Unsupervised Learning Layers for Video Analysis
This paper presents two unsupervised learning layers (UL layers) for
label-free video analysis: one for fully connected layers, and the other for
convolutional ones. The proposed UL layers can play two roles: they can serve as
the cost-function layer that provides a global training signal, and they can be
attached to any regular neural network layer to provide local training signals,
which combine with the signals backpropagated from upper layers to extract both
slow- and fast-changing features at different depths.
Therefore, the UL layers can be used in either pure unsupervised or
semi-supervised settings. Both a closed-form solution and an online learning
algorithm for two UL layers are provided. Experiments with unlabeled synthetic
and real-world videos demonstrated that the neural networks equipped with UL
layers and trained with the proposed online learning algorithm can extract
shape and motion information from video sequences of moving objects. The
experiments demonstrated potential applications of the UL layers and the online
learning algorithm to head orientation estimation and moving object
localization
Analyzing noise in autoencoders and deep networks
Autoencoders have emerged as a useful framework for unsupervised learning of
internal representations, and a wide variety of apparently conceptually
disparate regularization techniques have been proposed to generate useful
features. Here we extend existing denoising autoencoders to additionally inject
noise before the nonlinearity, and at the hidden unit activations. We show that
a wide variety of previous methods, including denoising, contractive, and
sparse autoencoders, as well as dropout can be interpreted using this
framework. This noise injection framework reaps practical benefits by providing
a unified strategy to develop new internal representations by designing the
nature of the injected noise. We show that noisy autoencoders outperform
denoising autoencoders at the very task of denoising, and are competitive with
other single-layer techniques on MNIST and CIFAR-10. We also show that types
of noise other than dropout improve performance in a deep network through
sparsifying, decorrelating, and spreading information across representations
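A minimal sketch of the three injection points (layer sizes and noise levels are hypothetical; the framework's claim is that choosing the nature of each noise source selects the regularizer, recovering denoising, contractive, sparse, and dropout variants as special cases):

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def noisy_encode(x, W, sigma_in=0.1, sigma_pre=0.1, drop=0.2):
    """Encode with noise at the input, before the nonlinearity, and at the
    hidden activations."""
    x_n = x + rng.normal(0, sigma_in, size=x.shape)            # input noise
    pre = W @ x_n + rng.normal(0, sigma_pre, size=W.shape[0])  # pre-nonlinearity
    h = sigmoid(pre)
    mask = rng.random(h.shape) >= drop                         # dropout-style
    return h * mask                                            # hidden-unit noise

x = rng.normal(size=16)
W = rng.normal(0, 0.1, size=(8, 16))
h = noisy_encode(x, W)
```

Training then minimizes reconstruction error of the clean input from this corrupted code, as in a standard denoising autoencoder.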
Application of Deep Learning on Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations
We explore how Deep Learning (DL) can be utilized to predict prognosis of
acute myeloid leukemia (AML). Out of TCGA (The Cancer Genome Atlas) database,
94 AML cases are used in this study. Input data include age, 10 common
cytogenetic results, and the 23 most common mutations; output is the prognosis
(diagnosis to death, DTD). In our DL network, autoencoders are stacked to form
a hierarchical DL model from which raw data are compressed and organized and
high-level features are extracted. The network is written in the R language and is
designed to predict prognosis of AML for a given case (DTD of more than or less
than 730 days). The DL network achieves an excellent accuracy of 83% in
predicting prognosis. As a proof-of-concept study, our preliminary results
demonstrate a practical application of DL in future practice of prognostic
prediction using next-gen sequencing (NGS) data.
Comment: 11 pages, 1 table, 1 figure. arXiv admin note: substantial text
overlap with arXiv:1801.0101
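The stacked-autoencoder feature extraction can be sketched as below. The paper's network is written in R; this Python sketch uses hypothetical layer widths, and only the 34-d input (age + 10 cytogenetic + 23 mutation markers) comes from the abstract:

```python
import numpy as np

rng = np.random.default_rng(6)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical widths: each stacked encoder compresses the previous layer,
# turning 34 raw inputs into a small high-level feature vector that a final
# classifier maps to the binary prognosis (DTD > 730 days or not).
LAYER_SIZES = [34, 16, 8, 4]
weights = [rng.normal(0, 0.1, size=(o, i))
           for i, o in zip(LAYER_SIZES, LAYER_SIZES[1:])]

def encode(x):
    """Forward pass through the stack of (already pretrained) encoders."""
    for W in weights:
        x = sigmoid(W @ x)
    return x

x = rng.random(34)       # one AML case: normalized age + binary markers
features = encode(x)     # compressed features for the prognosis classifier
```

In the stacked scheme, each layer's weights would first be pretrained as an autoencoder on the previous layer's codes before the whole stack is fine-tuned on the prognosis labels.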
Theta-RBM: Unfactored Gated Restricted Boltzmann Machine for Rotation-Invariant Representations
Learning invariant representations is a critical task in computer vision. In
this paper, we propose the Theta-Restricted Boltzmann Machine ({\theta}-RBM
for short), which builds upon the original RBM formulation and injects the notion
of rotation-invariance during the learning procedure. In contrast to previous
approaches, we do not transform the training set with all possible rotations.
Instead, we rotate the gradient filters when they are computed during the
Contrastive Divergence algorithm. We formulate our model as an unfactored gated
Boltzmann machine, where another input layer is used to modulate the input
visible layer to drive the optimisation procedure. Among our contributions is a
mathematical proof that demonstrates that {\theta}-RBM is able to learn
rotation-invariant features according to a recently proposed invariance
measure. Our method reaches an invariance score of ~90% on the mnist-rot dataset,
the highest result among the baseline methods and the current state of the art
in transformation-invariant feature learning with RBMs. Using an
SVM classifier, we also show that our network learns discriminative features
as well, obtaining ~10% testing error.
Comment: 9 pages, 2 figures, 3 tables
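The key trick, rotating the gradient filters rather than the training set, can be sketched as averaging each Contrastive Divergence weight gradient over an orientation group. The paper handles finer angles; this hypothetical sketch restricts the group to the four 90-degree rotations (`np.rot90`) to stay dependency-free:

```python
import numpy as np

rng = np.random.default_rng(7)

F = 5  # filter side length (hypothetical)

def rotation_averaged_gradient(grad_filter):
    """Average a CD weight-gradient filter over all four 90-degree rotations,
    so the resulting update cannot prefer any one orientation."""
    rots = [np.rot90(grad_filter, k) for k in range(4)]
    return np.mean(rots, axis=0)

g = rng.normal(size=(F, F))         # a raw CD gradient for one filter
g_inv = rotation_averaged_gradient(g)
```

Because the average runs over the whole group, the symmetrized gradient is itself invariant to the group's rotations, which is what pushes the learned filters toward rotation-invariant features.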
Deep Convolutional Inverse Graphics Network
This paper presents the Deep Convolutional Inverse Graphics Network (DC-IGN), a
model that learns an interpretable representation of images. This
representation is disentangled with respect to transformations such as
out-of-plane rotations and lighting variations. The DC-IGN model is composed of
multiple layers of convolution and de-convolution operators and is trained
using the Stochastic Gradient Variational Bayes (SGVB) algorithm. We propose a
training procedure to encourage neurons in the graphics code layer to represent
a specific transformation (e.g. pose or light). Given a single input image, our
model can generate new images of the same object with variations in pose and
lighting. We present qualitative and quantitative results of the model's
efficacy at learning a 3D rendering engine.
Comment: First two authors contributed equally
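One simplified reading of the training procedure that assigns a transformation to a graphics-code neuron: for a mini-batch in which only one scene factor (say pose) varies, every code unit except the one assigned to that factor is clamped to its batch average, so only the designated neuron can explain the variation. A hypothetical sketch (sizes and the unit assignment are illustrative, not the paper's exact mechanics):

```python
import numpy as np

rng = np.random.default_rng(8)

CODE_DIM, BATCH = 6, 4
POSE_UNIT = 0  # hypothetical index of the graphics-code unit assigned to pose

def clamp_code(z_batch, active_unit):
    """Clamp all non-active code units to their batch mean before decoding."""
    z = z_batch.copy()
    mean = z.mean(axis=0)
    for j in range(z.shape[1]):
        if j != active_unit:
            z[:, j] = mean[j]
    return z

z_batch = rng.normal(size=(BATCH, CODE_DIM))  # encoder outputs for the batch
z_clamped = clamp_code(z_batch, POSE_UNIT)
```

Decoding from the clamped code and backpropagating the reconstruction loss then forces the pose unit alone to carry the pose variation, which is what makes single-unit edits at test time produce new renderings of the same object.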
- …