Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition
The primate visual system achieves remarkable visual object recognition
performance even in brief presentations and under changes to object exemplar,
geometric transformations, and background variation (a.k.a. core visual object
recognition). This ability is mediated by the representation formed in
inferior temporal (IT) cortex. In parallel, recent advances in
machine learning have led to ever higher performing models of object
recognition using artificial deep neural networks (DNNs). It remains unclear,
however, whether the representational performance of DNNs rivals that of the
brain. A major difficulty in producing such a comparison accurately has been
the lack of a unifying metric that accounts for experimental limitations such
as the amount of noise, the number of neural recording sites, and the number
of trials, and for computational limitations such as the complexity of the
decoding classifier and
the number of classifier training examples. In this work we perform a direct
comparison that corrects for these experimental limitations and computational
considerations. As part of our methodology, we propose an extension of "kernel
analysis" that measures the generalization accuracy as a function of
representational complexity. Our evaluations show that, unlike previous
bio-inspired models, the latest DNNs rival the representational performance of
IT cortex on this visual object recognition task. Furthermore, we show that
models that perform well on measures of representational performance also
perform well on measures of representational similarity to IT and on measures
of predicting individual IT multi-unit responses. Whether these DNNs rely on
computational mechanisms similar to the primate visual system is yet to be
determined, but, unlike all previous bio-inspired models, that possibility
cannot be ruled out merely on representational performance grounds.
Comment: 35 pages, 12 figures; extends and expands upon arXiv:1301.353
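The abstract describes "kernel analysis" only at a high level. As a rough, generic illustration of what "generalization accuracy as a function of representational complexity" can mean, the sketch below trains kernel ridge classifiers of increasing effective complexity (decreasing regularization) on a feature representation and records held-out accuracy. The RBF kernel, the ridge classifier, and the synthetic data are all assumptions for illustration, not the paper's actual protocol.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.05):
    # Gaussian (RBF) kernel between the row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def accuracy_vs_complexity(X_tr, y_tr, X_te, y_te, lambdas):
    # One-vs-all kernel ridge regression; smaller lambda = higher
    # effective complexity of the decoder.
    Y = np.eye(y_tr.max() + 1)[y_tr]                 # one-hot targets
    K_tr = rbf_kernel(X_tr, X_tr)
    K_te = rbf_kernel(X_te, X_tr)
    accs = []
    for lam in lambdas:
        alpha = np.linalg.solve(K_tr + lam * np.eye(len(X_tr)), Y)
        pred = K_te @ alpha
        accs.append((pred.argmax(1) == y_te).mean())
    return accs

# Synthetic placeholder "representation" and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)              # toy 2-class task
accs = accuracy_vs_complexity(X[:100], y[:100], X[100:], y[100:],
                              lambdas=[100.0, 1.0, 0.01])
print(accs)  # held-out accuracy as decoder complexity increases
```

A representation is then summarized not by a single accuracy number but by this whole accuracy-vs-complexity curve, which makes comparisons less sensitive to the choice of decoder.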
Representation Learning in Sensory Cortex: a theory
We review and apply a computational theory of the feedforward path of the ventral stream in visual cortex based on the hypothesis that its main function is the encoding of invariant representations of images. A key justification of the theory is provided by a theorem linking invariant representations to small sample complexity for recognition – that is, invariant representations allow learning from very few labeled examples. The theory characterizes how an algorithm that can be implemented by a set of "simple" and "complex" cells – a "HW module" – provides invariant and selective representations. The invariance can be learned in an unsupervised way from observed transformations. Theorems show that invariance implies several properties of the ventral stream organization, including the eccentricity-dependent lattice of units in the retina and in V1, and the tuning of its neurons. The theory requires two stages of processing: the first, consisting of retinotopic visual areas such as V1, V2 and V4 with generic neuronal tuning, leads to representations that are invariant to translation and scaling; the second, consisting of modules in IT, with class- and object-specific tuning, provides a representation for recognition with approximate invariance to class-specific transformations, such as pose (of a body, of a face) and expression. In the theory, the ventral stream's main function is the unsupervised learning of "good" representations that reduce the sample complexity of the final supervised learning stage.
This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216
The computational magic of the ventral stream
I argue that the sample complexity of (biological, feedforward) object recognition is mostly due to geometric image transformations and conjecture that a main goal of the ventral stream – V1, V2, V4 and IT – is to learn-and-discount image transformations.

In the first part of the paper I describe a class of simple and biologically plausible memory-based modules that learn transformations from unsupervised visual experience. The main theorems show that these modules provide (for every object) a signature which is invariant to local affine transformations and approximately invariant for other transformations. I also prove that,
in a broad class of hierarchical architectures, signatures remain invariant from layer to layer. The identification of these memory-based modules with complex (and simple) cells in visual areas leads to a theory of invariant recognition for the ventral stream.
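The memory-based module described above – simple cells storing transformed copies of a template, a complex cell pooling over them – can be illustrated with a toy sketch. Here the transformation group is circular shifts of a 1-D signal, pooling is a max, and all names and parameters are illustrative assumptions rather than the paper's construction:

```python
import numpy as np

def shifts(v):
    # The stored "transformation orbit" of a template: all circular
    # shifts, as if observed during unsupervised visual experience.
    return np.stack([np.roll(v, k) for k in range(len(v))])

def signature(image, templates):
    # Simple cells: dot products with each stored transformed template.
    # Complex cell: max-pool over the orbit, one output per template.
    return np.array([max(image @ t for t in shifts(tmpl))
                     for tmpl in templates])

rng = np.random.default_rng(1)
templates = [rng.normal(size=16) for _ in range(3)]
img = rng.normal(size=16)
s1 = signature(img, templates)
s2 = signature(np.roll(img, 5), templates)   # shifted copy of the image
print(np.allclose(s1, s2))                    # True: shift-invariant
```

Because the max is taken over the whole orbit of each template, shifting the input only permutes the simple-cell responses, leaving the pooled signature unchanged – a minimal instance of the invariance the theorems formalize.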

In the second part, I outline a theory about hierarchical architectures that can learn invariance to transformations. I show that the memory complexity of learning affine transformations is drastically reduced in a hierarchical architecture that factorizes transformations in terms of the subgroup of translations and the subgroups of rotations and scalings. I then show how translations are automatically selected as the only learnable transformations during development by enforcing small apertures – eg small receptive fields – in the first layer.

In the third part I show that the transformations represented in each area can be optimized in terms of storage and robustness, as a consequence determining the tuning of the neurons in the area, rather independently (under normal conditions) of the statistics of natural images. I describe a model of learning that can be proved to have this property, linking in an elegant way the spectral properties of the signatures with the tuning of receptive fields in different areas. A surprising implication of these theoretical results is that the computational goals and some of the tuning properties of cells in the ventral stream may follow from symmetry properties (in the sense of physics) of the visual world through a process of unsupervised correlational learning, based on Hebbian synapses. In particular, simple and complex cells do not directly care about oriented bars: their tuning is a side effect of their role in translation invariance. Across the whole ventral stream the preferred features reported for neurons in different areas are only a symptom of the invariances computed and represented.
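The claim that Hebbian, correlational learning links spectral properties to tuning can be made concrete with a standard textbook example (not the paper's model): Oja's Hebbian rule drives a unit's weight vector to the top eigenvector of the input correlation matrix. The sketch below uses the averaged (deterministic) form of the rule; the correlation matrix and learning rate are illustrative assumptions.

```python
import numpy as np

# Input correlation matrix (stands in for correlations induced by
# observed image transformations).
C = np.array([[3.0, 1.0],
              [1.0, 2.0]])

rng = np.random.default_rng(2)
w = rng.normal(size=2)
w /= np.linalg.norm(w)            # random unit-norm initial weights
eta = 0.01                        # learning rate

for _ in range(2000):
    # Averaged Oja dynamics: Hebbian term C @ w plus a decay term
    # that keeps the norm bounded.
    w += eta * (C @ w - (w @ C @ w) * w)

# The learned "tuning" w aligns with the dominant eigenvector of C.
vals, vecs = np.linalg.eigh(C)
top = vecs[:, np.argmax(vals)]
print(round(abs(w @ top), 3))  # 1.0
```

In this reading, a cell's receptive field is a spectral object determined by the correlation structure of its inputs, which is the sense in which tuning could follow from symmetries of the visual world rather than from image statistics per se.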

The results of each of the three parts stand on their own independently of each other. Together this theory-in-fieri makes several broad predictions, some of which are:

-invariance to small transformations in early areas (eg translations in V1) may underlie stability of visual perception (suggested by Stu Geman);

-each cell’s tuning properties are shaped by visual experience of image transformations during developmental and adult plasticity;

-simple cells are likely to be the same population as complex cells, arising from different convergence of the Hebbian learning rule. The inputs to complex cells are dendritic branches with simple-cell properties;

-class-specific transformations are learned and represented at the top of the ventral stream hierarchy; thus class-specific modules such as faces, places and possibly body areas should exist in IT;

-the types of transformations that are learned from visual experience depend on the size of the receptive fields and thus on the area (layer in the models) – assuming that the size increases with layers;

-the mix of transformations learned in each area influences the tuning properties of the cells – oriented bars in V1+V2, radial and spiral patterns in V4, up to class-specific tuning in AIT (eg face-tuned cells);

-features must be discriminative and invariant: invariance to transformations is the primary determinant of the tuning of cortical neurons rather than statistics of natural images.

The theory is broadly consistent with the current version of HMAX. It explains and extends it in terms of unsupervised learning, a broader class of transformation invariances, and higher-level modules. The goal of this paper is to sketch a comprehensive theory with little regard for mathematical niceties. If the theory turns out to be useful there will be scope for deep mathematics, ranging from group representation tools to wavelet theory to dynamics of learning.
The Computational Magic of the Ventral Stream: Towards a Theory
I conjecture that the sample complexity of object recognition is mostly due to geometric image transformations and that a main goal of the ventral stream – V1, V2, V4 and IT – is to learn-and-discount image transformations. The most surprising implication of the theory emerging from these assumptions is that the computational goals and detailed properties of cells in the ventral stream follow from symmetry properties of the visual world through a process of unsupervised correlational learning.

From the assumption of a hierarchy of areas with receptive fields of increasing size, the theory predicts that the size of the receptive fields determines which transformations are learned during development and then factored out during normal processing; that the transformation represented in each area determines the tuning of the neurons in the area, independently of the statistics of natural images; and that class-specific transformations are learned and represented at the top of the ventral stream hierarchy.

Some of the main predictions of this theory-in-fieri are:
1. the types of transformations that are learned from visual experience depend on the aperture size (measured in terms of wavelength) and thus on the area (layer in the models) – assuming that the aperture size increases with layers;
2. the mix of transformations learned determines the properties of the receptive fields – oriented bars in V1+V2, radial and spiral patterns in V4 up to class-specific tuning in AIT (eg face-tuned cells);
3. invariance to small translations in V1 may underlie stability of visual perception;
4. class-specific modules – such as faces, places and possibly body areas – should exist in IT to process images of these object classes.
Reading aloud boosts connectivity through the putamen
Functional neuroimaging and lesion studies have frequently reported thalamic and putamen activation during reading and speech production. However, it is currently unknown how activity in these structures interacts with that in other reading and speech production areas. This study investigates how reading aloud modulates the neuronal interactions between visual recognition and articulatory areas, when both the putamen and thalamus are explicitly included. Using dynamic causal modeling in skilled readers who were reading regularly spelled English words, we compared 27 possible pathways that might connect the ventral anterior occipito-temporal sulcus (aOT) to articulatory areas in the precentral cortex (PrC). We focused on whether the neuronal interactions within these pathways were increased by reading relative to picture naming and other visual and articulatory control conditions. The results provide strong evidence that reading boosts the aOT–PrC pathway via the putamen but not the thalamus. However, the putamen pathway was not exclusive because there was also evidence for another reading pathway that did not involve either the putamen or the thalamus. We conclude that the putamen plays a special role in reading but this is likely to vary with individual reading preferences and strategies.
Cognitive science and epistemic openness
Recent findings in cognitive science suggest that the epistemic subject is more complex and epistemically porous than is generally pictured. Human knowers are open to the world via multiple channels, each operating for particular purposes and according to its own logic. These findings need to be understood and addressed by the philosophical community. The current essay argues that one consequence of the new findings is to invalidate certain arguments for epistemic anti-realism.
Macaques preferentially attend to visual patterns with higher fractal dimension contours.
Animals' sensory systems evolved to efficiently process information from their environmental niches. Niches often include irregular shapes and rough textures (e.g., jagged terrain, canopy outlines) that must be navigated to find food, escape predators, and master other fitness-related challenges. For most primates, vision is the dominant sensory modality and thus, primates have evolved systems for processing complicated visual stimuli. One way to quantify information present in visual stimuli in natural scenes is evaluating their fractal dimension. We hypothesized that sensitivity to complicated geometric forms, indexed by fractal dimension, is an evolutionarily conserved capacity, and tested this capacity in rhesus macaques (Macaca mulatta). Monkeys viewed paired black and white images of simulated self-similar contours that systematically varied in fractal dimension while their attention to the stimuli was measured using noninvasive infrared eye tracking. They fixated more frequently on, dwelled for longer durations on, and had attentional biases towards images that contain boundary contours with higher fractal dimensions. This indicates that, like humans, they discriminate between visual stimuli on the basis of fractal dimension and may prefer viewing informationally rich visual stimuli. Our findings suggest that sensitivity to fractal dimension may be a wider ability of the vertebrate vision system.
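The abstract does not say how fractal dimension was computed for its stimuli; box counting is one standard estimator for a contour's fractal dimension. The sketch below is that generic method, applied to a straight line (whose dimension should come out near 1) rather than to the study's stimuli:

```python
import numpy as np

def box_count_dimension(points, sizes):
    # points: (N, 2) contour coordinates in the unit square [0, 1)^2.
    # For each box size, count how many grid boxes the contour occupies.
    counts = []
    for s in sizes:
        boxes = np.unique(np.floor(points / s), axis=0)
        counts.append(len(boxes))
    # The slope of log(count) vs log(1/size) estimates the dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)),
                          np.log(counts), 1)
    return slope

# Sanity check on a straight line segment: dimension ~1.
t = np.linspace(0, 1, 2000, endpoint=False)
line = np.stack([t, 0.5 * t], axis=1)
d = box_count_dimension(line, sizes=[1/4, 1/8, 1/16, 1/32])
print(round(d, 1))  # 1.0
```

A jagged, space-filling boundary occupies disproportionately more boxes as the grid shrinks, so its estimated dimension rises toward 2 – which is what makes the measure a useful index of contour complexity for stimuli like those described above.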
Too little, too late: reduced visual span and speed characterize pure alexia
Whether normal word reading includes a stage of visual processing selectively dedicated to word or letter recognition is highly debated. Characterizing pure alexia, a seemingly selective disorder of reading, has been central to this debate. Two main theories claim either that 1) pure alexia is caused by damage to a reading-specific brain region in the left fusiform gyrus, or 2) pure alexia results from a general visual impairment that may particularly affect simultaneous processing of multiple items. We tested these competing theories in 4 patients with pure alexia using sensitive psychophysical measures and mathematical modeling. Recognition of single letters and digits in the central visual field was impaired in all patients. Visual apprehension span was also reduced for both letters and digits in all patients. The only cortical region lesioned across all 4 patients was the left fusiform gyrus, indicating that this region subserves a function broader than letter or word identification. We suggest that a seemingly pure disorder of reading can arise due to a general reduction of visual speed and span, and explain why this has a disproportionate impact on word reading while recognition of other visual stimuli is less obviously affected.