25 research outputs found

    Toward a more biologically plausible model of object recognition

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Physics, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (leaves 105-113). By Minjoon Kouh.
    Rapidly and reliably recognizing an object (is that a cat or a tiger?) is an important skill for survival. However, it is a difficult computational problem, because the same object may appear differently under various conditions, while different objects may share similar features. A robust recognition system must be able to distinguish between similar-looking objects while remaining invariant to appearance-altering transformations of an object. The fundamental challenge for any recognition system lies in this simultaneous requirement for both specificity and invariance. An emerging picture from decades of neuroscience research is that the cortex overcomes this challenge by gradually building up specificity and invariance with a hierarchical architecture. In this thesis, I present a computational model of object recognition with a feedforward and hierarchical architecture. The model quantitatively describes the anatomy, physiology, and the first few hundred milliseconds of visual information processing in the ventral pathway of the primate visual cortex. There are three major contributions. First, the two main operations in the model (Gaussian and maximum) have been cast into a more biologically plausible form, using monotonic nonlinearities and divisive normalization, and a possible canonical neural circuitry has been proposed. Second, shape tuning properties of visual area V4 have been explored using the corresponding layers in the model. It is demonstrated that the observed V4 selectivity for shapes of intermediate complexity (gratings and contour features) can be explained by combinations of orientation-selective inputs. Third, shape tuning properties in the higher visual area, inferior temporal (IT) cortex, have also been explored. It is demonstrated that the selectivity and invariance properties of IT neurons can be generated by feedforward and hierarchical combinations of Gaussian-like and max-like operations, and that their responses can support robust object recognition. Furthermore, experimentally observed clutter effects and the trade-off between selectivity and invariance in IT can also be observed and understood in this computational framework. These studies show that the model is in good agreement with a range of physiological data and provides insights, at multiple levels, for understanding the object recognition process in the cortex.
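To make the first contribution concrete, here is a minimal sketch of how one divisive-normalization unit with monotonic power-law nonlinearities could approximate both a max-like and a tuning-like operation. The functional form, the exponents `p`, `q`, `r`, the constant `k`, and all function names are illustrative assumptions; the abstract does not give them, so this should not be read as the thesis's exact formulation.

```python
import numpy as np

def normalized_response(x, w, p, q, r, k=1e-6):
    """Assumed form: y = sum_j w_j * x_j**p / (k + (sum_j x_j**q)**r)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.sum(w * x**p) / (k + np.sum(x**q) ** r)

def max_like(x, q=8.0, k=1e-12):
    """Softmax-style pooling: sum x**(q+1) / sum x**q approaches max(x) as q grows."""
    x = np.asarray(x, dtype=float)
    return normalized_response(x, np.ones_like(x), p=q + 1.0, q=q, r=1.0, k=k)

def tuning_like(x, w, k=1e-6):
    """Normalized dot product: largest when the input pattern x is parallel to w."""
    w = np.asarray(w, dtype=float)
    w_unit = w / np.linalg.norm(w)  # unit-length weights (an illustrative choice)
    return normalized_response(x, w_unit, p=1.0, q=2.0, r=0.5, k=k)

# Hypothetical non-negative afferent activities
afferents = [0.1, 0.5, 0.9]
preferred = [1.0, 2.0, 3.0]

pooled = max_like(afferents)                    # close to the largest input
on_peak = tuning_like(preferred, preferred)     # input matches the preferred pattern
off_peak = tuning_like([3.0, 2.0, 1.0], preferred)
```

Note that in this sketch the tuning-like unit responds to the direction of the input pattern rather than its overall magnitude, so it only resembles a Gaussian in the sense of being peaked around a preferred pattern.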

    Investigating shape representation in area V4 with HMAX: Orientation and Grating selectivities

    The question of how shape is represented is of central interest to understanding visual processing in cortex. While the tuning properties of cells in the early part of the ventral visual stream, thought to be responsible for object recognition in the primate, are comparatively well understood, several different theories have been proposed regarding tuning in higher visual areas, such as V4. We used the model of object recognition in cortex presented by Riesenhuber and Poggio (1999), where more complex shape tuning in higher layers is the result of combining afferent inputs tuned to simpler features, and compared the tuning properties of model units in intermediate layers to those of V4 neurons from the literature. In particular, we investigated the issue of shape representation in visual areas V1 and V4 using oriented bars and various types of gratings (polar, hyperbolic, and Cartesian), as used in several physiology experiments. Our computational model was able to reproduce several physiological findings, such as the broadening distribution of the orientation bandwidths and the emergence of a bias toward non-Cartesian stimuli. Interestingly, the simulation results suggest that some V4 neurons receive input from afferents with spatially separated receptive fields, leading to experimentally testable predictions. However, the simulations also show that the stimulus set of Cartesian and non-Cartesian gratings is not sufficiently complex to probe shape tuning in higher areas, necessitating the use of more complex stimulus sets.

    The interference effect of concurrent working memory task on visual inhibitory control

    We examined the interference between inhibitory control of a saccadic eye movement and a working memory task. This study was motivated by the observation that people are susceptible to cognitive errors when they are preoccupied. Subjects were instructed to make an anti-saccade, or to look in the opposite direction of a visual stimulus, thereby exercising inhibitory control over the reflexive eye movement towards a salient object. At the same time, the subjects were instructed to memorize a random sequence of digits that were read out to them, thereby engaging their working memory. We measured the success of an eye movement by rapidly switching between images and asking the subjects what they saw. We found that these concurrent cognitive tasks significantly degraded anti-saccade performance.

    Incorporation of prior knowledge and habits while solving anagrams

    Games and puzzles provide a valuable context for examining human problem-solving behavior. We recorded and analyzed the sequence of letters viewed by the participants of our study while they were solving anagram puzzles. The goal was to examine and understand how people's linguistic habits and prior knowledge influenced their eye movements. The main findings of this study are: (1) People's stereotypical habit of scanning (e.g., adjacent or top viewing) strongly influences their solution-seeking behavior. (2) People tend to incorporate their prior knowledge of letter statistics in a reasonable way, such as looking less frequently at letter combinations that are uncommon in the English language. Therefore, it was found that people’s prior

    Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures

    A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis.
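The classification stage can be illustrated with a minimal sketch of a regularized least squares (RLS) classifier. It assumes a linear kernel, one-vs-all targets coded as +1/-1, and a hypothetical regularization value `lam`; the abstract does not state the kernel, coding scheme, or parameters actually used, and the toy feature vectors below are invented purely for illustration.

```python
import numpy as np

def fit_rls(X, y, n_classes, lam=1e-2):
    """Solve (X^T X + lam * I) W = X^T Y for one-vs-all targets Y in {-1, +1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    Y = -np.ones((X.shape[0], n_classes))
    Y[np.arange(X.shape[0]), y] = 1.0           # +1 for the true class, -1 elsewhere
    A = X.T @ X + lam * np.eye(X.shape[1])      # ridge-regularized Gram matrix
    return np.linalg.solve(A, X.T @ Y)          # weights, shape (n_features, n_classes)

def predict_rls(W, X):
    """Assign each row of X to the class with the largest linear score."""
    return np.argmax(np.asarray(X, dtype=float) @ W, axis=1)

# Toy stand-ins for patch-based features (hypothetical values, two classes)
X_train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y_train = [0, 0, 1, 1]

W = fit_rls(X_train, y_train, n_classes=2)
predicted = predict_rls(W, [[0.8, 0.2], [0.2, 0.8]])
```

One practical appeal of RLS in this setting is that the same linear system is solved once for all classes, so a 20-class problem costs little more than a binary one.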

    A general mechanism for tuning: Gain control circuits and synapses underlie tuning of cortical neurons

    Tuning to an optimal stimulus is a widespread property of neurons in cortex. We propose that such tuning is a consequence of normalization or gain control circuits. We also present a biologically plausible neural circuit for tuning.