6,429 research outputs found

    Texture Segregation By Visual Cortex: Perceptual Grouping, Attention, and Learning

    A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. Object boundary output of the model is compared to computer vision algorithms using a set of human-segmented photographic images. The model classifies textures and suppresses noise using a multiple-scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and top-down learned texture categories is utilized by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Top-down modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures.
    The importance of the surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark studies vary from 95.1% to 98.6% with attention, and from 90.6% to 93.2% without attention. Air Force Office of Scientific Research (F49620-01-1-0397, F49620-01-1-0423); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)

    Audio-Visual Speaker Tracking

    Target motion tracking has found applications in interdisciplinary fields, including but not limited to surveillance and security, forensic science, intelligent transportation systems, driving assistance, monitoring of prohibited areas, medical science, robotics, action and expression recognition, individual speaker discrimination in multi-speaker environments and video conferencing in the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to the widespread advances of devices and technologies and the necessity for seamless solutions in real-time tracking and localization of speakers. However, speaker tracking is a challenging task in real-life scenarios, as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcome these issues is to use multi-modal information, as it conveys complementary information about the state of the speakers compared to single-modal tracking. To use multi-modal information, several approaches have been proposed, which can be classified into two categories, namely deterministic and stochastic. This chapter aims at providing multimedia researchers with a state-of-the-art overview of tracking methods that combine multiple modalities to accomplish various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field.

    Nilpotent Approximations of Sub-Riemannian Distances for Fast Perceptual Grouping of Blood Vessels in 2D and 3D

    We propose an efficient approach for the grouping of local orientations (points on vessels) via nilpotent approximations of sub-Riemannian distances in the 2D and 3D roto-translation groups $SE(2)$ and $SE(3)$. In our distance approximations we consider homogeneous norms on nilpotent groups that locally approximate $SE(n)$, and which are obtained via the exponential and logarithmic map on $SE(n)$. In a qualitative validation we show that the norms provide accurate approximations of the true sub-Riemannian distances, and we discuss their relations to the fundamental solution of the sub-Laplacian on $SE(n)$. The quantitative experiments further confirm the accuracy of the approximations. Quantitative results are obtained by evaluating perceptual grouping performance of retinal blood vessels in 2D images and curves in challenging 3D synthetic volumes. The results show that 1) sub-Riemannian geometry is essential in achieving top performance and 2) grouping via the fast analytic approximations performs almost as well as, or better than, data-adaptive fast marching approaches on $\mathbb{R}^n$ and $SE(n)$. Comment: 18 pages, 9 figures, 3 tables, in review at JMI
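    To make the idea of a homogeneous norm in logarithmic coordinates concrete, here is a minimal Python sketch for the 2D case $SE(2)$ only. It is an illustration under stated assumptions, not the authors' formula: the function names, the weights `w1`, `w2`, `w3`, and the specific fourth-root form of the norm are choices made for this example. The logarithmic coordinates follow from inverting the standard exponential map on $SE(2)$. The lateral coordinate `c2` corresponds to the commutator direction, so it enters the norm with weight 2.

```python
import math


def se2_log(x, y, theta):
    """Logarithmic coordinates (c1, c2, c3) of the SE(2) element (x, y, theta)."""
    # Wrap the angle into [-pi, pi] so the logarithm is taken on the principal branch.
    theta = math.atan2(math.sin(theta), math.cos(theta))
    half = 0.5 * theta
    # k = (theta/2) * cot(theta/2), which tends to 1 as theta -> 0.
    k = 1.0 if abs(half) < 1e-8 else half / math.tan(half)
    c1 = k * x + half * y
    c2 = -half * x + k * y
    return c1, c2, theta


def homogeneous_norm(c1, c2, c3, w1=1.0, w2=1.0, w3=1.0):
    """Assumed homogeneous norm: c1 (forward) and c3 (angular) have weight 1,
    c2 (lateral, commutator direction) has weight 2. Weights w1..w3 are hypothetical."""
    horizontal = (w1 * c1) ** 2 + (w3 * c3) ** 2
    return (horizontal ** 2 + (w2 * c2) ** 2) ** 0.25


def approx_distance(g, h):
    """Approximate distance between g and h as the norm of log(g^{-1} h)."""
    (x1, y1, t1), (x2, y2, t2) = g, h
    dx, dy = x2 - x1, y2 - y1
    c, s = math.cos(t1), math.sin(t1)
    # Express the displacement in the local frame of g (rotation by -t1).
    rx = c * dx + s * dy
    ry = -s * dx + c * dy
    return homogeneous_norm(*se2_log(rx, ry, t2 - t1))


if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    print(approx_distance(origin, (0.1, 0.0, 0.0)))  # small step along the orientation
    print(approx_distance(origin, (0.0, 0.1, 0.0)))  # small step sideways
```

    In this sketch a small step along the current orientation is cheaper than an equally long sideways step, which is the qualitative behaviour that makes such distances useful for grouping points along elongated structures.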