Computational role of eccentricity dependent cortical magnification
We develop a sampling extension of M-theory focused on invariance to scale
and translation. Quite surprisingly, the theory predicts an architecture of
early vision with increasing receptive field sizes and a high resolution fovea
-- in agreement with data about the cortical magnification factor, V1 and the
retina. From the slope of the inverse of the magnification factor, M-theory
predicts a cortical "fovea" in V1 in the order of by basic units at
each receptive field size -- corresponding to a foveola of size around
minutes of arc at the highest resolution, degrees at the lowest
resolution. It also predicts uniform scale invariance over a fixed range of
scales independently of eccentricity, while translation invariance should
depend linearly on spatial frequency. Bouma's law of crowding follows in the
theory as an effect of cortical area-by-cortical area pooling; the Bouma
constant is the value expected if the signature responsible for recognition in
the crowding experiments originates in V2. From a broader perspective, the
emerging picture suggests that visual recognition under natural conditions
takes place by composing information from a set of fixations, with each
fixation providing recognition from a space-scale image fragment -- that is, an
image patch represented at a set of increasing sizes and decreasing
resolutions.
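A rough numerical sketch of the sampling scheme described above: receptive field size is assumed to grow linearly with eccentricity (the slope and intercept below are illustrative placeholders, not the paper's fitted constants), and a fixated patch is re-represented at a set of decreasing resolutions by block-averaging.

```python
import numpy as np

def rf_diameter(eccentricity_deg, slope=0.05, intercept=0.1):
    """Toy linear growth of receptive-field diameter (deg) with eccentricity;
    slope and intercept are illustrative, not values fitted in the paper."""
    return intercept + slope * np.abs(eccentricity_deg)

def space_scale_fragment(patch, n_scales=4):
    """Represent one fixated patch at several scales by block-averaging with
    progressively larger blocks: fewer samples (lower resolution) per scale.
    In the full architecture coarser channels would also cover a larger area."""
    fragments = []
    for s in range(n_scales):
        b = 2 ** s
        h = (patch.shape[0] // b) * b
        w = (patch.shape[1] // b) * b
        coarse = patch[:h, :w].reshape(h // b, b, w // b, b).mean(axis=(1, 3))
        fragments.append(coarse)
    return fragments

print("RF diameter (deg) at 0, 1, 5, 10 deg:", rf_diameter(np.array([0., 1., 5., 10.])))
patch = np.random.default_rng(0).random((64, 64))
for s, f in enumerate(space_scale_fragment(patch)):
    print(f"scale {s}: {f.shape[0]}x{f.shape[1]} samples")
```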
How can cells in the anterior medial face patch be viewpoint invariant?
In a recent paper, Freiwald and Tsao (2010) found evidence that the responses of cells in the macaque anterior medial (AM) face patch are invariant to significant changes in viewpoint. The monkey subjects had no prior experience with the individuals depicted in the stimuli and were never given an opportunity to view the same individual from different viewpoints sequentially. These results cannot be explained by a mechanism based on temporal association of experienced views. Employing a biologically plausible model of object recognition (software available at cbcl.mit.edu), we show two mechanisms that could account for these results. First, we show that hair style and skin color provide sufficient information to enable viewpoint-invariant recognition without resorting to any mechanism that associates images across views. It is likely that a large part of the effect described in patch AM is attributable to these cues. Separately, we show that it is possible to further improve view-invariance using class-specific features (see Vetter 1997). Faces, as a class, transform under 3D rotation in similar enough ways that it is possible to use previously viewed example faces to learn a general model of how all faces rotate. Novel faces can be encoded relative to these previously encountered “template” faces and thus recognized with some degree of invariance to 3D rotation. Since each object class transforms differently under 3D rotation, it follows that invariant recognition from a single view requires a recognition architecture with a detection step determining the class of an object (e.g., face or non-face) prior to a subsequent identification stage utilizing the appropriate class-specific features.
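A minimal sketch of the second, class-specific mechanism under strong simplifying assumptions: template faces seen at several viewpoints are stored as raw vectors, a novel face is encoded by its projections onto them, and pooling each template's projections over its stored views gives a signature that is tolerant to viewpoint change. The shapes, random "faces", and max pooling are illustrative stand-ins, not the cbcl.mit.edu model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_templates, n_views, dim = 10, 5, 256   # stored example faces, viewpoints, pixel dim

# Stand-in for previously viewed template faces rendered at several viewpoints.
templates = rng.standard_normal((n_templates, n_views, dim))

def signature(face_image):
    """Encode an image by its similarity to each template, pooled over that
    template's stored views (max over views, one value per template)."""
    projections = templates @ face_image          # shape (n_templates, n_views)
    return projections.max(axis=1)                # pool over viewpoints

# Two "views" of the same novel face: a frontal view plus a perturbed stand-in
# for a rotated view (a real test would render an actual 3D rotation).
novel_frontal = rng.standard_normal(dim)
novel_rotated = novel_frontal + 0.3 * rng.standard_normal(dim)
other_face = rng.standard_normal(dim)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print("same face, different view:", cos(signature(novel_frontal), signature(novel_rotated)))
print("different faces:          ", cos(signature(novel_frontal), signature(other_face)))
```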
CNS: a GPU-based framework for simulating cortically-organized networks
Computational models whose organization is inspired by the cortex are increasing in both number and popularity. Current instances of such models include convolutional networks, HMAX, Hierarchical Temporal Memory, and deep belief networks. These models present two practical challenges. First, they are computationally intensive. Second, while the operations performed by individual cells, or units, are typically simple, the code needed to keep track of network connectivity can quickly become complicated, leading to programs that are difficult to write and to modify. Massively parallel commodity computing hardware has recently become available in the form of general-purpose GPUs. This helps address the first problem but exacerbates the second. GPU programming adds an extra layer of difficulty, further discouraging exploration. To address these concerns, we have created a programming framework called CNS ('Cortical Network Simulator'). CNS models are automatically compiled and run on a GPU, typically 80-100x faster than on a single CPU, without the user having to learn any GPU programming. A novel scheme for the parametric specification of network connectivity allows the user to focus on writing just the code executed by a single cell. We hope that the ability to rapidly define and run cortically-inspired models will facilitate research in the cortical modeling community. CNS is available under the GNU General Public License.
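The division of labor that CNS aims for -- the user writes only the code executed by a single cell, while the framework handles connectivity and parallel execution -- can be illustrated with a small conceptual sketch. This is plain NumPy, not CNS's actual interface; a GPU back end would launch the per-cell kernel in parallel rather than looping.

```python
import numpy as np

def simple_cell_kernel(prev_layer, weights, i, j):
    """User-written code for ONE cell: a dot product with its local
    receptive field followed by a nonlinearity."""
    rf = prev_layer[i:i + weights.shape[0], j:j + weights.shape[1]]
    return np.tanh(np.sum(rf * weights))

def run_layer(prev_layer, weights):
    """The 'framework' part: apply the per-cell kernel over the whole grid.
    A GPU back end would execute these cells in parallel instead of looping."""
    out_h = prev_layer.shape[0] - weights.shape[0] + 1
    out_w = prev_layer.shape[1] - weights.shape[1] + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = simple_cell_kernel(prev_layer, weights, i, j)
    return out

image = np.random.default_rng(0).random((32, 32))
filt = np.ones((5, 5)) / 25.0
print(run_layer(image, filt).shape)     # (28, 28)
```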
Neurons That Confuse Mirror-Symmetric Object Views
Neurons in inferotemporal cortex that respond similarly to many pairs of mirror-symmetric images -- for example, 45 degree and -45 degree views of the same face -- have often been reported. The phenomenon seemed to be an interesting oddity. However, the same phenomenon has also emerged in simple hierarchical models of the ventral stream. Here we state a theorem characterizing sufficient conditions for this curious invariance to occur in a rather large class of hierarchical networks and demonstrate it with simulations.
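One way to see why such an invariance can arise, as a toy version of a sufficient condition: if the set of templates feeding a pooling unit is closed under left-right reflection, the unit's energy-style response is exactly equal for an image and its mirror image. The random templates and squared-projection pooling below are illustrative choices, not the theorem's precise hypotheses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_templates, size = 8, 16

# Template set closed under horizontal reflection: store each random template
# together with its mirror image.
base = rng.standard_normal((n_templates, size, size))
templates = np.concatenate([base, base[:, :, ::-1]], axis=0).reshape(2 * n_templates, -1)

def pooled_response(image):
    """Complex-cell-like unit: sum of squared projections onto all templates."""
    return np.sum((templates @ image.ravel()) ** 2)

img = rng.standard_normal((size, size))
mirror = img[:, ::-1]
different = rng.standard_normal((size, size))

print(np.isclose(pooled_response(img), pooled_response(mirror)))      # True: mirror pair confused
print(np.isclose(pooled_response(img), pooled_response(different)))   # generally False
```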
From primal templates to invariant recognition
We can immediately recognize novel objects seen only once before -- in different positions on the retina and at different scales (distances). Is this ability hardwired by our genes or learned during development -- and if so, how? We present a computational proof that developmental learning of invariance in recognition is possible and can emerge rapidly. This computational work sets the stage for experiments on the development of object invariance while suggesting a specific mechanism that may be critically tested.
The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work).
This paper explores the theoretical consequences of a simple assumption: the computational goal of the feedforward path in the ventral stream -- from V1, V2, V4 and to IT -- is to discount image transformations, after learning them during development.
Does invariant recognition predict tuning of neurons in sensory cortex?
Tuning properties of simple cells in cortical V1 can be described in terms of a "universal shape" characterized by parameter values which hold across different species. This puzzling set of findings begs for a general explanation grounded in an evolutionarily important computational function of the visual cortex. We ask here whether these properties are predicted by the hypothesis that the goal of the ventral stream is to compute for each image a "signature" vector which is invariant to geometric transformations, with the additional assumption that the mechanism for continuously learning and maintaining invariance consists of the memory storage of a sequence of neural images of a few objects undergoing transformations (such as translation, scale changes and rotation) via Hebbian synapses. For V1 simple cells the simplest version of this hypothesis is the online Oja rule, which implies that the tuning of neurons converges to the eigenvectors of the covariance of their input. Starting with a set of dendritic fields spanning a range of sizes, simulations supported by a direct mathematical analysis show that the solution of the associated "cortical equation" provides a set of Gabor-like wavelets with parameter values that are in broad agreement with the physiology data. We show, however, that the simple version of the Hebbian assumption does not predict all the physiological properties. The same theoretical framework also provides predictions about the tuning of cells in V4 and in the face patch AL which are in qualitative agreement with physiology data.
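A minimal check of the first step of this argument, with a synthetic input distribution standing in for translated neural images: the online Oja rule drives a cell's weight vector to the leading eigenvector of the input covariance. The learning rate and the covariance used here are arbitrary illustrative choices; with natural-image inputs undergoing transformations, the solutions would instead be Gabor-like.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 20

# Synthetic input distribution with a known covariance (smooth 1D correlations).
idx = np.arange(dim)
cov = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 3.0) ** 2)
L = np.linalg.cholesky(cov + 1e-9 * np.eye(dim))

w = rng.standard_normal(dim)
w /= np.linalg.norm(w)
eta = 0.01

for _ in range(20000):
    x = L @ rng.standard_normal(dim)       # one input sample
    y = w @ x                               # cell's response
    w += eta * y * (x - y * w)              # Oja's Hebbian rule with implicit normalization

top_eigvec = np.linalg.eigh(cov)[1][:, -1]
print("overlap with leading eigenvector:", abs(w @ top_eigvec))   # close to 1
```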
A hierarchical model of peripheral vision
We present a peripheral vision model inspired by the cortical architecture discovered by Hubel and Wiesel. As with existing cortical models, this model contains alternating layers of simple cells, which employ tuning functions to increase specificity, and complex cells, which pool over simple cells to increase invariance. To extend the traditional cortical model, we introduce the option of eccentricity-dependent pooling and tuning parameters within a given model layer. This peripheral vision system can be used to model physiological data where receptive field sizes change as a function of eccentricity. This gives the user flexibility to test different theories about filtering and pooling ranges in the periphery. In a specific instantiation of the model, pooling and tuning parameters can increase linearly with eccentricity to model physiological data found in different layers of the visual cortex. Additionally, it can be used to introduce pre-cortical model layers such as retina and LGN. We have tested the model's response with different parameters on several natural images to demonstrate its effectiveness as a research tool. The peripheral vision model presents a useful tool to test theories about crowding, attention, visual search, and other phenomena of peripheral vision. This work was supported by the following grants: NSF-0640097, NSF-0827427, NSF-0645960, DARPA-DSO, AFOSR FA8650-50-C-7262, AFOSR FA9550-09-1-0606.
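A sketch of the eccentricity-dependent pooling option for a 1D layer, assuming (as in the linear instantiation mentioned above) that the pooling radius grows linearly with distance from the fovea; the slope and base radius are placeholder parameters, not the model's published values.

```python
import numpy as np

def eccentricity_pooling(simple_responses, center, base_radius=1, slope=0.25):
    """Complex-like layer over a 1D array of simple-cell responses: each output
    unit max-pools over a window whose radius grows with distance from 'center'
    (the fovea), instead of being constant as in a standard cortical model layer."""
    n = simple_responses.shape[0]
    pooled = np.empty(n)
    for i in range(n):
        radius = int(round(base_radius + slope * abs(i - center)))
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        pooled[i] = simple_responses[lo:hi].max()
    return pooled

responses = np.random.default_rng(0).random(21)
print(eccentricity_pooling(responses, center=10))
```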
Unsupervised learning of invariant representations
The present phase of Machine Learning is characterized by supervised learning algorithms relying on large sets of labeled examples (n → ∞). The next phase is likely to focus on algorithms capable of learning from very few labeled examples (n → 1), like humans seem able to do. We propose an approach to this problem and describe the underlying theory, based on the unsupervised, automatic learning of a "good" representation for supervised learning, characterized by small sample complexity. We consider the case of visual object recognition, though the theory also applies to other domains like speech. The starting point is the conjecture, proved in specific cases, that image representations which are invariant to translation, scaling and other transformations can considerably reduce the sample complexity of learning. We prove that an invariant and selective signature can be computed for each image or image patch: the invariance can be exact in the case of group transformations and approximate under non-group transformations. A module performing filtering and pooling, like the simple and complex cells described by Hubel and Wiesel, can compute such a signature. The theory offers novel unsupervised learning algorithms for "deep" architectures for image and speech recognition. We conjecture that the main computational goal of the ventral stream of visual cortex is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable, and selective for recognition -- and show how this representation may be continuously learned in an unsupervised way during development and visual experience.
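The core construction, filtering against transformed templates followed by pooling, can be sketched for the exact group case of cyclic 1D translation: the resulting signature is unchanged by translation yet still discriminates between unrelated inputs. Mean-of-absolute-value pooling and random templates are simplifying choices here; the theory works with full distributions of projections and more general transformations.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_templates = 32, 6

# Templates and all their cyclic translations (the stored "transformed templates").
templates = rng.standard_normal((n_templates, dim))
orbits = np.stack([np.stack([np.roll(t, s) for s in range(dim)]) for t in templates])

def signature(x):
    """One number per template: pool (mean of a nonlinearity) the projections of x
    onto every translated copy of that template -- simple cells, then complex cells."""
    proj = orbits @ x                 # shape (n_templates, dim): projections
    return np.abs(proj).mean(axis=1)  # pooling over the translation group

x = rng.standard_normal(dim)
x_shifted = np.roll(x, 7)
other = rng.standard_normal(dim)

print(np.allclose(signature(x), signature(x_shifted)))   # exact invariance to translation
print(np.allclose(signature(x), signature(other)))        # selectivity: generally False
```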