
    Representing an Object by Interchanging What with Where

    Exploring representations is a fundamental step towards understanding vision. The visual system carries two types of information along separate pathways: one about what an object is and the other about where it is. Initially, the what is represented by a pattern of activity distributed across millions of photoreceptors, whereas the where is given only 'implicitly' by their retinotopic positions. Many computational theories of object recognition rely on such pixel-based representations, but these are ill-suited to learning spatial information such as position and size, because the where information is encoded only implicitly. 
Here we try transforming a retinal image of an object into an internal image by interchanging the what with the where, so that patterns of intensity in the internal image describe the spatial information rather than the object information. Concretely, the retinal image of an object is deformed and turned into a negative image, in which light areas appear dark and vice versa, and the object's spatial information is quantified by levels of intensity on the borders of that image. 
Interestingly, the inner part of the internal image, excluding the borders, shows position and scale invariance. To further understand how the internal image associates the what and the where, we examined the internal image of a face that moves or is scaled on the retina. We found that the internal images form a linear vector space under object translation and scaling. 
In conclusion, these results suggest that what-where interchangeability may play an important role in organizing these two types of information into the brain's internal representation.
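A small numerical illustration (not the paper's model) of why pixel-based representations leave the where implicit: the same object rendered at two positions produces nearly orthogonal pixel vectors, so position is not directly readable from the pattern of activity.

```python
import numpy as np

def blob(center, size=32, sigma=2.0):
    """A Gaussian 'object' rendered on a size x size retina at a given center."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - center[0])**2 + (ys - center[1])**2) / (2 * sigma**2))

a = blob((8, 8)).ravel()
b = blob((24, 24)).ravel()   # identical "what", different "where"

# Cosine similarity between the two pixel vectors: near zero, even though
# the object is the same -- the where dominates the pixel-based code.
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
assert cos < 0.01
```

This is only a toy demonstration of the motivation; the abstract's actual internal-image construction (deformation, negation, and intensity-coded borders) is not reproduced here.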

    Learning the Irreducible Representations of Commutative Lie Groups

    We present a new probabilistic model of compact commutative Lie groups that produces invariant-equivariant and disentangled representations of data. To define the notion of disentangling, we borrow a fundamental principle from physics that is used to derive the elementary particles of a system from its symmetries. Our model employs a newly found Bayesian conjugacy relation that enables fully tractable probabilistic inference over compact commutative Lie groups -- a class that includes the groups describing the rotation and cyclic translation of images. We train the model on pairs of transformed image patches and show that the learned invariant representation is highly effective for classification.
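The simplest member of the class mentioned above is the cyclic translation group; a sketch of the underlying algebraic fact (an assumed stand-in for the paper's probabilistic machinery): the irreducible representations of cyclic shifts are the Fourier modes, and a shift acts on each mode as a pure phase rotation, so magnitude is invariant while phase is equivariant.

```python
import numpy as np

n = 8
x = np.random.default_rng(0).standard_normal(n)
shift = 3
x_shifted = np.roll(x, shift)

X = np.fft.fft(x)
X_shifted = np.fft.fft(x_shifted)

# Each Fourier mode k is a 1-dimensional irreducible representation:
# shifting the signal multiplies mode k by a phase e^{-2*pi*i*k*shift/n}.
k = np.arange(n)
phase = np.exp(-2j * np.pi * k * shift / n)
assert np.allclose(X_shifted, phase * X)

# |X| is the invariant part, angle(X) the equivariant part -- a concrete
# instance of the invariant-equivariant split the abstract describes.
assert np.allclose(np.abs(X_shifted), np.abs(X))
```

The paper's model learns such structure from data rather than assuming the Fourier basis; this block only illustrates the target decomposition.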

    Learning Unitary Operators with Help From u(n)

    A major challenge in training recurrent neural networks is the so-called vanishing or exploding gradient problem. The use of a norm-preserving transition operator can address this issue, but parametrization is challenging. In this work we focus on unitary operators and describe a parametrization using the Lie algebra u(n) associated with the Lie group U(n) of n×n unitary matrices. The exponential map provides a correspondence between these spaces and allows us to define a unitary matrix using n² real coefficients relative to a basis of the Lie algebra. The parametrization is closed under additive updates of these coefficients, and thus provides a simple space in which to do gradient descent. We demonstrate the effectiveness of this parametrization on the problem of learning arbitrary unitary operators, comparing to several baselines and outperforming a recently proposed lower-dimensional parametrization. We additionally use our parametrization to generalize a recently proposed unitary recurrent neural network to arbitrary unitary matrices, using it to solve standard long-memory tasks. Comment: 9 pages, 3 figures, 5 figures inc. subfigures, to appear at AAAI-1
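A minimal sketch of the parametrization described above, with an assumed basis choice (the paper may order the n² coefficients differently): the coefficients fill a skew-Hermitian matrix L in u(n), and the matrix exponential maps it to a unitary U = exp(L).

```python
import numpy as np
from scipy.linalg import expm

def unitary_from_params(theta, n):
    """Map n^2 real coefficients to a unitary n x n matrix via u(n).

    Assumed basis: n entries for the imaginary diagonal, then n(n-1)/2
    entries each for the real and imaginary parts of the upper triangle.
    """
    iu = np.triu_indices(n, k=1)
    m = len(iu[0])                      # n(n-1)/2 off-diagonal slots
    upper = np.zeros((n, n), dtype=complex)
    upper[iu] = theta[n:n + m] + 1j * theta[n + m:n + 2 * m]
    L = np.diag(1j * theta[:n]) + upper - upper.conj().T
    # L is skew-Hermitian (L^H = -L), so exp(L) is unitary.
    return expm(L)

rng = np.random.default_rng(0)
n = 4
theta = rng.standard_normal(n * n)      # n^2 real parameters, as in the text
U = unitary_from_params(theta, n)
assert np.allclose(U.conj().T @ U, np.eye(n))
```

Because the coefficient space is all of R^(n²), any additive gradient update on theta still yields a valid unitary matrix, which is the closure property the abstract highlights.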

    Transformation Properties of Learned Visual Representations

    When a three-dimensional object moves relative to an observer, a change occurs on the observer's image plane and in the visual representation computed by a learned model. Starting from the idea that a good visual representation is one that transforms linearly under scene motions, we show, using the theory of group representations, that any such representation is equivalent to a combination of the elementary irreducible representations. We derive a striking relationship between irreducibility and the statistical dependency structure of the representation, showing that under restricted conditions irreducible representations are decorrelated. Under partial observability, as induced by the perspective projection of a scene onto the image plane, the motion group does not have a linear action on the space of images, so it becomes necessary to perform inference over a latent representation that does transform linearly. This idea is demonstrated in a model of rotating NORB objects that employs a latent representation of the non-commutative 3D rotation group SO(3). Comment: T.S. Cohen & M. Welling, Transformation Properties of Learned Visual Representations. In International Conference on Learning Representations (ICLR), 201
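The decorrelation claim above can be checked exactly in the simplest setting, the cyclic group (an assumed stand-in for the general statement): when a signal is acted on uniformly by all cyclic shifts, its irreducible components -- the Fourier modes -- are exactly uncorrelated.

```python
import numpy as np

n = 16
x = np.random.default_rng(1).standard_normal(n)

# Dataset: every cyclic shift of x, equally weighted (uniform group action).
shifts = np.stack([np.roll(x, s) for s in range(n)])

# Fourier modes are the irreducible representations of the cyclic group.
F = np.fft.fft(shifts, axis=1)

# Complex covariance across the group orbit: off-diagonal terms vanish,
# i.e. distinct irreducible components are decorrelated.
Fc = F - F.mean(axis=0)
C = Fc.conj().T @ Fc / n
off_diag = C - np.diag(np.diag(C))
assert np.allclose(off_diag, 0)
```

This mirrors the paper's restricted-conditions result in miniature; the paper's setting of SO(3) and perspective projection is of course richer than this commutative toy case.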