1,120 research outputs found
Rotation-Invariant Restricted Boltzmann Machine Using Shared Gradient Filters
Finding suitable features has been an essential problem in computer vision.
We focus on Restricted Boltzmann Machines (RBMs), which, despite their
versatility, cannot accommodate transformations that may occur in the scene. As
a result, several approaches have been proposed that consider a set of
transformations, which are used to either augment the training set or transform
the actual learned filters. In this paper, we propose the Explicit
Rotation-Invariant Restricted Boltzmann Machine, which exploits prior
information coming from the dominant orientation of images. Our model extends
the standard RBM, by adding a suitable number of weight matrices, associated
with each dominant gradient. We show that our approach is able to learn
rotation-invariant features, comparing it with the classic formulation of RBM
on the MNIST benchmark dataset. Overall, requiring less hidden units, our
method learns compact features, which are robust to rotations.Comment: 8 pages, 3 figures, 1 tabl
Discriminative Recurrent Sparse Auto-Encoders
We present the discriminative recurrent sparse auto-encoder model, comprising
a recurrent encoder of rectified linear units, unrolled for a fixed number of
iterations, and connected to two linear decoders that reconstruct the input and
predict its supervised classification. Training via
backpropagation-through-time initially minimizes an unsupervised sparse
reconstruction error; the loss function is then augmented with a discriminative
term on the supervised classification. The depth implicit in the
temporally-unrolled form allows the system to exhibit all the power of deep
networks, while substantially reducing the number of trainable parameters.
From an initially unstructured network the hidden units differentiate into
categorical-units, each of which represents an input prototype with a
well-defined class; and part-units representing deformations of these
prototypes. The learned organization of the recurrent encoder is hierarchical:
part-units are driven directly by the input, whereas the activity of
categorical-units builds up over time through interactions with the part-units.
Even using a small number of hidden units per layer, discriminative recurrent
sparse auto-encoders achieve excellent performance on MNIST.Comment: Added clarifications suggested by reviewers. 15 pages, 10 figure
Improving Deep Representation Learning with Complex and Multimodal Data.
Representation learning has emerged as a way to learn meaningful representation from data and made a breakthrough in many applications including visual object recognition, speech recognition, and text understanding. However, learning representation from complex high-dimensional sensory data is challenging since there exist many irrelevant factors of variation (e.g., data transformation, random noise). On the other hand, to build an end-to-end prediction system for structured output variables, one needs to incorporate probabilistic inference to properly model a mapping from single input to possible configurations of output variables. This thesis addresses limitations of current representation learning in two parts.
The first part discusses efficient learning algorithms of invariant representation based on restricted Boltzmann machines (RBMs). Pointing out the difficulty of learning, we develop an efficient initialization method for sparse and convolutional RBMs. On top of that, we develop variants of RBM that learn representations invariant to data transformations such as translation, rotation, or scale variation by pooling the filter responses of input data after a transformation, or to irrelevant patterns such as random or structured noise, by jointly performing feature selection and feature learning. We demonstrate improved performance on visual object recognition and weakly supervised foreground object segmentation.
The second part discusses conditional graphical models and learning frameworks for structured output variables using deep generative models as prior. For example, we combine the best properties of the CRF and the RBM to enforce both local and global (e.g., object shape) consistencies for visual object segmentation. Furthermore, we develop a deep conditional generative model of structured output variables, which is an end-to-end system trainable by backpropagation. We demonstrate the importance of global prior and probabilistic inference for visual object segmentation. Second, we develop a novel multimodal learning framework by casting the problem into structured output representation learning problems, where the output is one data modality to be predicted from the other modalities, and vice versa. We explain as to how our method could be more effective than maximum likelihood learning and demonstrate the state-of-the-art performance on visual-text and visual-only recognition tasks.PhDElectrical Engineering: SystemsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113549/1/kihyuks_1.pd
Interpretable Transformations with Encoder-Decoder Networks
Deep feature spaces have the capacity to encode complex transformations of
their input data. However, understanding the relative feature-space
relationship between two transformed encoded images is difficult. For instance,
what is the relative feature space relationship between two rotated images?
What is decoded when we interpolate in feature space? Ideally, we want to
disentangle confounding factors, such as pose, appearance, and illumination,
from object identity. Disentangling these is difficult because they interact in
very nonlinear ways. We propose a simple method to construct a deep feature
space, with explicitly disentangled representations of several known
transformations. A person or algorithm can then manipulate the disentangled
representation, for example, to re-render an image with explicit control over
parameterized degrees of freedom. The feature space is constructed using a
transforming encoder-decoder network with a custom feature transform layer,
acting on the hidden representations. We demonstrate the advantages of explicit
disentangling on a variety of datasets and transformations, and as an aid for
traditional tasks, such as classification.Comment: Accepted at ICCV 201
- …