Representing an Object by Interchanging What with Where
Exploring representations is a fundamental step towards understanding vision. The visual system carries two types of information along separate pathways: one about what an object is, the other about where it is. Initially, the 'what' is represented by a pattern of activity distributed across millions of photoreceptors, whereas the 'where' is only implicitly given by their retinotopic positions. Many computational theories of object recognition rely on such pixel-based representations, but because the 'where' information is encoded only implicitly, these representations are ill-suited to learning spatial properties such as position and size.
Here we transform the retinal image of an object into an internal image by interchanging the 'what' with the 'where', so that patterns of intensity in the internal image describe spatial information rather than object information. Concretely, the retinal image of an object is deformed and turned over into a negative image, in which light areas appear dark and vice versa, and the object's spatial information is quantified as levels of intensity on the borders of that image.
Interestingly, the inner part of the internal image, excluding the borders, is invariant to position and scale. To further understand how the internal image associates the 'what' with the 'where', we examined the internal image of a face that moves or is scaled on the retina. We found that the internal images form a linear vector space under object translation and scaling.
In conclusion, these results suggest that what-where interchangeability may play an important role in organizing the two information streams into the brain's internal representation.
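The abstract only sketches the what-where interchange, so a minimal toy illustration may help. Everything here is a hypothetical rendering of the idea (the function `internal_image`, the particular border-encoding scheme, and the normalization are our own choices, not the paper's construction):

```python
import numpy as np

def internal_image(retinal, row, col, scale):
    """Toy sketch: negative pattern inside, spatial info on the borders."""
    # "What": turn the pattern over into a negative image
    # (light areas appear dark and vice versa).
    out = 1.0 - retinal
    # "Where": quantify spatial information as intensity levels
    # on the borders of the internal image (hypothetical scheme).
    out[0, :] = row / retinal.shape[0]     # top border: vertical position
    out[:, 0] = col / retinal.shape[1]     # left border: horizontal position
    out[-1, :] = scale                     # bottom border: object scale
    return out

img = np.zeros((8, 8))
img[2:5, 2:5] = 1.0                        # a bright 3x3 "object"
internal = internal_image(img, row=2, col=2, scale=3 / 8)

# The inner part, excluding the borders, is simply the negative pattern:
# the spatial parameters live only on the border intensities.
assert np.allclose(internal[1:-1, 1:-1], 1.0 - img[1:-1, 1:-1])
```

The point of the sketch is the separation of roles: the object pattern and its spatial parameters occupy disjoint regions of the same intensity image.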
Learning the Irreducible Representations of Commutative Lie Groups
We present a new probabilistic model of compact commutative Lie groups that
produces invariant-equivariant and disentangled representations of data. To
define the notion of disentangling, we borrow a fundamental principle from
physics that is used to derive the elementary particles of a system from its
symmetries. Our model employs a newfound Bayesian conjugacy relation that
enables fully tractable probabilistic inference over compact commutative Lie
groups -- a class that includes the groups that describe the rotation and
cyclic translation of images. We train the model on pairs of transformed image
patches, and show that the learned invariant representation is highly effective
for classification.
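The cyclic translation group mentioned above is the simplest compact commutative case, and its irreducible representations can be made concrete with the discrete Fourier transform: the DFT splits a cyclic shift into independent one-dimensional (complex) phase rotations, one per frequency. A short numerical check of this decomposition (not the paper's probabilistic model, just the underlying group fact):

```python
import numpy as np

# Cyclic translation of a 1-D signal is a compact commutative group action.
# In the Fourier basis it acts on each frequency by a phase rotation:
# the DFT decomposes the action into 1-D irreducible representations.
rng = np.random.default_rng(0)
n = 16
x = rng.standard_normal(n)
shift = 5
x_shifted = np.roll(x, shift)

X, Xs = np.fft.fft(x), np.fft.fft(x_shifted)
k = np.arange(n)

# Equivariance: frequency k is multiplied by exp(-2*pi*i*k*shift/n).
assert np.allclose(Xs, X * np.exp(-2j * np.pi * k * shift / n))
# Invariance: the Fourier magnitudes are unchanged by the translation,
# giving an invariant representation suitable for classification.
assert np.allclose(np.abs(X), np.abs(Xs))
```

Each frequency component transforms within its own one-dimensional subspace and never mixes with the others, which is exactly the disentangled, invariant-equivariant structure the abstract describes.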
Learning Unitary Operators with Help From u(n)
A major challenge in the training of recurrent neural networks is the
so-called vanishing or exploding gradient problem. The use of a norm-preserving
transition operator can address this issue, but parametrization is challenging.
In this work we focus on unitary operators and describe a parametrization using
the Lie algebra associated with the Lie group of unitary matrices. The exponential map provides a correspondence
between these spaces, and allows us to define a unitary matrix using real
coefficients relative to a basis of the Lie algebra. The parametrization is
closed under additive updates of these coefficients, and thus provides a simple
space in which to do gradient descent. We demonstrate the effectiveness of this
parametrization on the problem of learning arbitrary unitary operators,
comparing to several baselines and outperforming a recently-proposed
lower-dimensional parametrization. We additionally use our parametrization to
generalize a recently-proposed unitary recurrent neural network to arbitrary
unitary matrices, using it to solve standard long-memory tasks.Comment: 9 pages, 3 figures, 5 figures inc. subfigures, to appear at AAAI-1
Transformation Properties of Learned Visual Representations
When a three-dimensional object moves relative to an observer, a change
occurs on the observer's image plane and in the visual representation computed
by a learned model. Starting with the idea that a good visual representation is
one that transforms linearly under scene motions, we show, using the theory of
group representations, that any such representation is equivalent to a
combination of the elementary irreducible representations. We derive a striking
relationship between irreducibility and the statistical dependency structure of
the representation, by showing that under restricted conditions, irreducible
representations are decorrelated. Under partial observability, as induced by
the perspective projection of a scene onto the image plane, the motion group
does not have a linear action on the space of images, so that it becomes
necessary to perform inference over a latent representation that does transform
linearly. This idea is demonstrated in a model of rotating NORB objects that
employs a latent representation of the non-commutative 3D rotation group SO(3).
Comment: T.S. Cohen & M. Welling, Transformation Properties of Learned Visual
Representations. In International Conference on Learning Representations
(ICLR), 201
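The decorrelation claim above can be made tangible for the simplest motion group, SO(2). A representation that transforms linearly under rotation splits into irreducible frequency components z_k, each multiplied by exp(i·k·θ); under uniformly random rotations, distinct components are uncorrelated because E[exp(i·(j−k)·θ)] = 0 for j ≠ k. A small Monte Carlo check of this fact (an illustration of the mathematics, not the paper's NORB experiment):

```python
import numpy as np

# Irreducible components of an SO(2)-equivariant representation:
# z_k -> exp(1j * k * theta) * z_k for each frequency k.
rng = np.random.default_rng(2)
z = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # fixed latent
k = np.array([1, 2, 3])                                    # frequencies
thetas = rng.uniform(0, 2 * np.pi, size=200_000)           # uniform rotations

# samples[i, j] = exp(1j * k[j] * thetas[i]) * z[j]
samples = np.exp(1j * np.outer(thetas, k)) * z

# Cross-correlation between distinct irreducible components averages to ~0,
# since E[exp(1j*(k1-k2)*theta)] = 0 over a uniform angle distribution.
cross = np.mean(samples[:, 0] * np.conj(samples[:, 1]))
assert abs(cross) < 0.05 * abs(z[0]) * abs(z[1])
```

Each component's magnitude, by contrast, is exactly preserved under every rotation, matching the invariant/equivariant split discussed throughout these abstracts.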