6,429 research outputs found
Texture Segregation By Visual Cortex: Perceptual Grouping, Attention, and Learning
A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. Object boundar output of the model is compared to computer vision algorithms using a set of human segmented photographic images. The model classifies textures and suppresses noise using a multiple scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and top-down learned texture categories is utilized by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Topdown modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures. Importance of the surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark studies vary from 95.1% to 98.6% with attention, and from 90.6% to 93.2% without attention.Air Force Office of Scientific Research (F49620-01-1-0397, F49620-01-1-0423); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624
AudioāVisual Speaker Tracking
Target motion tracking found its application in interdisciplinary fields, including but not limited to surveillance and security, forensic science, intelligent transportation system, driving assistance, monitoring prohibited area, medical science, robotics, action and expression recognition, individual speaker discrimination in multiāspeaker environments and video conferencing in the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to the widespread advances of devices and technologies and the necessity for seamless solutions in realātime tracking and localization of speakers. However, speaker tracking is a challenging task in realālife scenarios as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcome these issues is to use multiāmodal information, as it conveys complementary information about the state of the speakers compared to singleāmodal tracking. To use multiāmodal information, several approaches have been proposed which can be classified into two categories, namely deterministic and stochastic. This chapter aims at providing multimedia researchers with a stateāofātheāart overview of tracking methods, which are used for combining multiple modalities to accomplish various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field
Nilpotent Approximations of Sub-Riemannian Distances for Fast Perceptual Grouping of Blood Vessels in 2D and 3D
We propose an efficient approach for the grouping of local orientations
(points on vessels) via nilpotent approximations of sub-Riemannian distances in
the 2D and 3D roto-translation groups and . In our distance
approximations we consider homogeneous norms on nilpotent groups that locally
approximate , and which are obtained via the exponential and logarithmic
map on . In a qualitative validation we show that the norms provide
accurate approximations of the true sub-Riemannian distances, and we discuss
their relations to the fundamental solution of the sub-Laplacian on .
The quantitative experiments further confirm the accuracy of the
approximations. Quantitative results are obtained by evaluating perceptual
grouping performance of retinal blood vessels in 2D images and curves in
challenging 3D synthetic volumes. The results show that 1) sub-Riemannian
geometry is essential in achieving top performance and 2) that grouping via the
fast analytic approximations performs almost equally, or better, than
data-adaptive fast marching approaches on and .Comment: 18 pages, 9 figures, 3 tables, in review at JMI
- ā¦