8,944 research outputs found
Interpretable Transformations with Encoder-Decoder Networks
Deep feature spaces have the capacity to encode complex transformations of
their input data. However, understanding the relative feature-space
relationship between two transformed encoded images is difficult. For instance,
what is the relative feature-space relationship between two rotated images?
What is decoded when we interpolate in feature space? Ideally, we want to
disentangle confounding factors, such as pose, appearance, and illumination,
from object identity. Disentangling these is difficult because they interact in
very nonlinear ways. We propose a simple method to construct a deep feature
space, with explicitly disentangled representations of several known
transformations. A person or algorithm can then manipulate the disentangled
representation, for example, to re-render an image with explicit control over
parameterized degrees of freedom. The feature space is constructed using a
transforming encoder-decoder network with a custom feature transform layer,
acting on the hidden representations. We demonstrate the advantages of explicit
disentangling on a variety of datasets and transformations, and as an aid for
traditional tasks, such as classification.
Comment: Accepted at ICCV 2017.
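The core mechanism described here, a feature transform layer that applies a known transformation directly to the latent code, can be sketched compactly. Below is a minimal PyTorch sketch, assuming the latent code is split into 2-D sub-vectors that are rotated pairwise by a 2x2 rotation matrix; the encoder/decoder names in the usage comment are hypothetical and the shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a feature transform layer: the hidden code is split
# into 2-D pairs and each pair is multiplied by a 2x2 rotation matrix,
# so rotating the input corresponds to a known linear map in feature space.
# Shapes and names are illustrative assumptions.
import torch

def feature_transform(z: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """Rotate a latent code z of shape (batch, 2k) by angle theta (radians)."""
    b, d = z.shape
    assert d % 2 == 0, "latent dimension must be even to form 2-D pairs"
    z = z.view(b, d // 2, 2)                  # (batch, k, 2) pairs
    c, s = torch.cos(theta), torch.sin(theta)
    rot = torch.stack([torch.stack([c, -s]),  # 2x2 rotation matrix
                       torch.stack([s,  c])])
    z = z @ rot.T                             # rotate every pair
    return z.reshape(b, d)

# Usage with a hypothetical encoder/decoder pair:
#   z_hat = feature_transform(encoder(x), torch.tensor(0.5))
#   x_rot = decoder(z_hat)   # re-render with explicit pose control
```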
How Does Our Visual System Achieve Shift and Size Invariance?
The question of shift and size invariance in the primate visual system is discussed. After a short review of the relevant neurobiology and psychophysics, a more detailed analysis of computational models is given. The two main types of networks considered are the dynamic routing circuit model and invariant feature networks, such as the neocognitron. Some specific open questions in the context of these models are raised and possible solutions are discussed.
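To make the contrast concrete, here is a toy NumPy sketch of the invariant-feature-network idea (feature extraction followed by pooling, as in the neocognitron's alternating S- and C-cell stages); it illustrates the principle only and is not a model from the review.

```python
# Toy illustration: feature extraction followed by global max pooling
# makes the final response insensitive to where the pattern appears.
# Purely illustrative; not any specific model discussed in the review.
import numpy as np

def correlate2d_valid(img, filt):
    kh, kw = filt.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * filt)
    return out

def invariant_response(img, filt):
    """Feature map followed by global max pooling: the result is the
    same wherever the pattern sits inside the image."""
    return correlate2d_valid(img, filt).max()

img = np.zeros((8, 8)); img[1:3, 1:3] = 1.0    # pattern in the top-left
shifted = np.roll(img, (4, 4), axis=(0, 1))    # same pattern, shifted
filt = np.ones((2, 2))
assert invariant_response(img, filt) == invariant_response(shifted, filt)
```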
Dynamic Steerable Blocks in Deep Residual Networks
Filters in convolutional networks are typically parameterized in a pixel
basis that does not take prior knowledge about the visual world into account.
We investigate the generalized notion of frames designed with image properties
in mind, as alternatives to this parametrization. We show that frame-based
ResNets and DenseNets consistently improve performance on CIFAR-10+, while
having additional pleasant properties like steerability. By exploiting these
transformation properties explicitly, we arrive at dynamic steerable blocks.
They are an extension of residual blocks that are able to seamlessly transform
filters under pre-defined transformations, conditioned on the input at training
and inference time. Dynamic steerable blocks learn the degree of invariance
from data and locally adapt filters, allowing them to apply a different
geometrical variant of the same filter to each location of the feature map.
When evaluated on the Berkeley Segmentation contour detection dataset, our
approach outperforms all competing approaches that do not utilize pre-training.
Our results highlight the benefits of image-based regularization to deep
networks.
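As a concrete illustration of parameterizing filters in a frame rather than a pixel basis, the sketch below steers a filter expressed over the classic first-order Gaussian-derivative pair. A single global steering angle stands in for the per-location, input-conditioned steering of the actual blocks; all names and shapes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of steering a filter in a fixed frame, assuming a
# first-order Gaussian-derivative basis (G_x, G_y): rotating such a
# filter by theta amounts to a 2x2 mixing of its frame coefficients.
# A small network predicting theta from the input would make this
# "dynamic"; here theta is a plain argument for simplicity.
import torch
import torch.nn.functional as F

def gaussian_derivative_basis(size=5, sigma=1.0):
    r = torch.arange(size, dtype=torch.float32) - size // 2
    y, x = torch.meshgrid(r, r, indexing="ij")
    g = torch.exp(-(x**2 + y**2) / (2 * sigma**2))
    gx, gy = -x * g, -y * g          # x- and y-derivatives of a Gaussian (up to scale)
    return torch.stack([gx, gy])     # (2, size, size)

def steered_conv(img, coeffs, theta):
    """Convolve img (1, 1, H, W) with a filter steered to angle theta.

    coeffs: learned (2,) coefficients over the (G_x, G_y) frame.
    """
    basis = gaussian_derivative_basis()
    c, s = torch.cos(theta), torch.sin(theta)
    steer = torch.stack([torch.stack([c, -s]),
                         torch.stack([s,  c])])   # rotate the frame coefficients
    w = torch.einsum("i,ijk->jk", steer @ coeffs, basis)
    return F.conv2d(img, w[None, None], padding=basis.shape[-1] // 2)
```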
Learning to Convolve: A Generalized Weight-Tying Approach
Recent work (Cohen & Welling, 2016) has shown that generalizations of
convolutions, based on group theory, provide powerful inductive biases for
learning. In these generalizations, filters are not only translated but can
also be rotated, flipped, etc. However, coming up with exact models of how to
rotate a 3 × 3 filter on a square pixel grid is difficult. In this paper, we
learn how to transform filters for use in the group convolution, focusing on
roto-translation. For this, we learn a filter basis and all rotated versions of
that filter basis. Filters are then encoded by a set of rotation invariant
coefficients. To rotate a filter, we switch the basis. We demonstrate we can
produce feature maps with low sensitivity to input rotations, while achieving
high performance on MNIST and CIFAR-10.
Comment: Accepted to ICML 2019.
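The basis-switching step can be made concrete with a short sketch: the same rotation-invariant coefficients are paired with pre-rotated copies of a basis to produce every rotated filter. The tensor shapes and the pre-computed rotated_bases input are assumptions standing in for the paper's learned basis machinery.

```python
# Minimal sketch of "rotating a filter by switching the basis": a filter
# is a fixed coefficient vector over a basis, and its rotated copies come
# from pairing the same coefficients with pre-rotated versions of that
# basis. Shapes and names are illustrative assumptions.
import torch

def rotated_filters(coeffs, rotated_bases):
    """coeffs: (out_ch, in_ch, B) rotation-invariant coefficients.
    rotated_bases: (R, B, k, k) the same basis under R rotations.
    Returns (R, out_ch, in_ch, k, k): one filter copy per rotation,
    all sharing the same coefficients (generalized weight tying)."""
    return torch.einsum("oib,rbjk->roijk", coeffs, rotated_bases)

# Usage: stacking the R copies into the output-channel axis gives the
# filter bank of a roto-translation group convolution's lifting layer.
#   w = rotated_filters(coeffs, bases).reshape(R * out_ch, in_ch, k, k)
#   y = torch.nn.functional.conv2d(x, w)
```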