8 research outputs found

    Predictions of a model of spatial attention using sum- and max-pooling functions

    Assuming a convergent projection within a hierarchy of processing stages, stimuli from different areas of the receptive field project onto the same population of cells. Pooling over space affects the representation of individual stimuli, and thus its understanding is crucial for attention and ultimately for object recognition. Since attention, in turn, is likely to modify such spatial pooling by changing the competitive weight of individual stimuli, we compare the predictions of sum- and max-pooling methods using a model of attention. Both pooling functions can account for data investigating the competition between a pair of stimuli within a V4 receptive field; however, our model using sum-pooling predicts a different tuning curve. If we present an additional probe stimulus with the pair, sum-pooling predicts a bottom-up bias of attention, whereas the competition for attention using max-pooling is robust against the additional stimulus.
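
    The contrast between the two pooling functions can be illustrated with a toy calculation. This is only a minimal sketch, not the authors' model: the function names, the attentional weights, and the response values are all hypothetical, and the sketch shows only how each pooled output reacts to an extra, weaker stimulus, not the full competition for attention described in the abstract.

```python
def sum_pool(responses, weights):
    """Pooled response as the weighted sum over all stimuli in the receptive field."""
    return sum(w * r for w, r in zip(weights, responses))


def max_pool(responses, weights):
    """Pooled response as the largest weighted response in the receptive field."""
    return max(w * r for w, r in zip(weights, responses))


# Hypothetical values: two stimuli in one receptive field, the first one attended
# (its competitive weight is raised above 1.0).
pair_responses = [0.8, 0.4]
pair_weights = [1.5, 1.0]

# Hypothetical additional probe stimulus, weaker than the attended stimulus.
probe_response = 0.3
probe_weight = 1.0

sum_pair = sum_pool(pair_responses, pair_weights)
max_pair = max_pool(pair_responses, pair_weights)

sum_with_probe = sum_pool(pair_responses + [probe_response],
                          pair_weights + [probe_weight])
max_with_probe = max_pool(pair_responses + [probe_response],
                          pair_weights + [probe_weight])

print("sum-pooling:", sum_pair, "->", sum_with_probe)
print("max-pooling:", max_pair, "->", max_with_probe)
```

    In this toy setting the probe shifts the sum-pooled output, while the max-pooled output is unchanged as long as the probe's weighted response stays below the current maximum.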

    The reentry hypothesis: linking eye movements to visual perception

    Cortical organization of vision appears to be divided into perception and action. Models of vision have generally assumed that eye movements serve to select a scene for perception, so action and perception are sequential processes. We suggest a less distinct separation. According to our model, oculomotor areas responsible for planning an eye movement, such as the frontal eye field, influence perception prior to the eye movement. The activity reflecting the planning of an eye movement reenters the ventral pathway and sensitizes all cells within the movement field, so the planned action determines perception. We demonstrate the performance of the computational model in a visual search task that demands an eye movement toward a target.

    Union-net: A deep neural network model adapted to small data sets

    In real applications, often only small data sets can be obtained. At present, most practical applications of machine learning use classic models designed for big data to solve problems on small data sets. However, such deep neural network models have a complex structure and a huge number of parameters, and training them requires more advanced equipment, which makes them difficult to apply. Therefore, this paper proposes the concept of union convolution and designs a light deep network model, union-net, with a shallow network structure adapted to small data sets. The model combines convolutional network units with different combinations of the same input to form a union module. Each union module is equivalent to a convolutional layer. The serial input and output between the 3 modules constitute a "3-layer" neural network. The outputs of the union modules are fused and added as the input of the last convolutional layer, forming a complex network with a 4-layer structure. This addresses the problem that, when a deep network model is too deep and the transmission path is too long, underlying information is lost in transmission. Because the model has fewer parameters and fewer channels, it adapts better to small data sets, which addresses the tendency of deep network models to overfit when trained on small data sets. Multi-classification experiments are conducted on the public data sets cifar10 and 17flowers. The experiments show that the Union-net model can perform well in classification of both large and small data sets. It has high practical value in daily application scenarios. The model code is published at https://github.com/yeaso/union-net. Comment: 13 pages, 6 figures

    The reentry hypothesis: The putative interaction of the frontal eye field, ventrolateral prefrontal cortex, and areas V4, IT for attention and eye movement

    Attention is known to play a key role in perception, including action selection, object recognition and memory. Despite findings revealing competitive interactions among cell populations, attention remains difficult to explain. The central purpose of this paper is to link up a large number of findings in a single computational approach. Our simulation results suggest that attention can be well explained on a network level involving many areas of the brain. We argue that attention is an emergent phenomenon that arises from reentry and competitive interactions. We hypothesize that guided visual search requires the usage of an object-specific template in prefrontal cortex to sensitize V4 and IT cells whose preferred stimuli match the target template. This induces a feature-specific bias and provides guidance for eye movements. Prior to an eye movement, a spatially organized reentry from oculomotor centers, specifically the movement cells of the frontal eye field, occurs and modulates the gain of V4 and IT cells. The processes involved are elucidated by quantitatively comparing the time course of simulated neural activity with experimental data. Using visual search tasks as an example, we provide clear and empirically testable predictions for the participation of IT, V4 and the frontal eye field in attention. Finally, we explain a possible physiological mechanism that can lead to non-flat search slopes as the result of a slow, parallel discrimination process.

    The Peri-Saccadic Perception of Objects and Space

    Eye movements affect object localization and object recognition. Around saccade onset, briefly flashed stimuli appear compressed towards the saccade target, receptive fields dynamically change position, and the recognition of objects near the saccade target is improved. These effects have been attributed to different mechanisms. We provide a unifying account of peri-saccadic perception explaining all three phenomena by a quantitative computational approach simulating cortical cell responses on the population level. Contrary to the common view of spatial attention as a spotlight, our model suggests that oculomotor feedback alters the receptive field structure in multiple visual areas at an intermediate level of the cortical hierarchy to dynamically recruit cells for processing a relevant part of the visual field. The compression of visual space occurs at the expense of this locally enhanced processing capacity.

    Large-scale interactive exploratory visual search

    Large-scale visual search has been one of the challenging issues in the era of big data. It demands techniques that are not only highly effective and efficient but also allow users to conveniently express their information needs and refine their intents. In this thesis, we focus on developing an exploratory framework for large-scale visual search. We also develop a number of enabling techniques, including compact visual content representation for scalable search, near-duplicate video shot detection, and action-based event detection. We propose a novel scheme for extremely low bit rate visual search, which sends compressed visual words consisting of a vocabulary tree histogram and descriptor orientations rather than descriptors. Compact representation of video data is achieved by identifying the keyframes of a video, which can also help users comprehend visual content efficiently. We propose a novel Bag-of-Importance model for static video summarization. Near-duplicate detection is one of the key issues for large-scale visual search, since there exist a large number of nearly identical images and videos. We propose an improved near-duplicate video shot detection approach for more effective shot representation. Event detection has been one of the solutions for bridging the semantic gap in visual search. We focus in particular on human-action-centred event detection. We propose an enhanced sparse coding scheme to model human actions. Our proposed approach is able to significantly reduce computational cost while achieving recognition accuracy highly comparable to the state-of-the-art methods. Finally, we propose an integrated solution for addressing the prime challenges raised by large-scale interactive visual search. The proposed system is also one of the first attempts at exploratory visual search. It provides users with more robust results to satisfy their exploring experiences.

    Perception of biological motion by form analysis

    Detection of other living beings’ movements is a fundamental property of the human visual system. Viewing their movements, categorizing their actions, and interpreting social behaviors like gestures constitute a framework of our everyday lives. These observed actions are complex, and the differences among them are rather subtle. However, humans recognize these actions without major effort and without being aware of the complexity of the observed tasks. In point-light walkers, the visual information about the human body is reduced to only a handful of point-lights placed on the major joints of the otherwise invisible body. But even this sparse information does not effectively reduce humans’ ability to perceive the performed actions. Neurophysiological and neuroimaging studies have suggested that the movement of the human body is represented in specific brain areas. Nonetheless, the underlying network is still a matter of controversial discussion. To investigate the role of form information, I developed a model and conducted psychophysical experiments using point-light walkers. A widely accepted theory claims that in point-light walkers, form information is decreased to a non-usable minimum and, thus, the perception of biological motion is driven by the analysis of motion signals. In my study, I could show that point-light walkers indeed contain useful form information. Moreover, I could show that temporal integration of this information is sufficient to explain results from psychophysical, neurophysiological, and neuroimaging studies. In opposition to the standard models of biological motion perception, I could also show that all results can be explained without the analysis of local motion signals.

    Visual attention in primates and for machines - neuronal mechanisms

    Visual attention is an important cognitive concept for the daily life of humans, but it is still not fully understood. Because of this, it is also rarely used in computer vision systems. However, understanding visual attention is challenging, as it has many seemingly different aspects at both the neuronal and the behavioral level. Thus, it is very hard to give a uniform explanation of visual attention that accounts for all aspects. To tackle this problem, this thesis aims to identify a common set of neuronal mechanisms that underlie both neuronal and behavioral aspects. The mechanisms are simulated by neuro-computational models, resulting in a single modeling approach that explains a wide range of phenomena at once. The aspects chosen in the thesis are multiple neurophysiological effects, real-world object localization, and a visual masking paradigm (OSM). In each of the considered fields, the work also advances the current state of the art to better understand that aspect of attention itself. The three chosen aspects highlight that the approach can account for crucial neurophysiological, functional, and behavioral properties; thus the mechanisms might constitute the general neuronal substrate of visual attention in the cortex. As an outlook, our work provides computer vision with a deeper understanding and a concrete prototype of attention, so that this crucial aspect of human perception can be incorporated in future systems.

    Contents: 1. General introduction 2. The state-of-the-art in modeling visual attention 3. Microcircuit model of attention 4. Object localization with a model of visual attention 5. Object substitution masking 6. General conclusion

    Abstract (translated from the German): Visual attention is an important cognitive concept for the daily life of humans. It is still not completely understood, however, so that fundamentally understanding the phenomenon is a long-standing goal of neuroscience. At the same time, because of this lack of understanding, it is only rarely used in machine vision systems in computer science. Understanding visual attention is nevertheless a complex challenge, since attention has extremely diverse and seemingly different aspects. It changes both neuronal firing rates and human behavior in multiple ways. It is therefore very difficult to find a uniform explanation of visual attention that holds equally for all aspects. To address this problem, this work aims to identify a common set of neuronal mechanisms that underlie both the neuronal and the behavioral aspects. The mechanisms are simulated in neuro-computational models, resulting in a single modeling framework that, for the first time, can explain many and highly varied phenomena of visual attention at once. The aspects chosen in this dissertation are multiple neurophysiological effects, real-world object localization, and a visual masking paradigm (OSM). In each of the fields considered, the state of the art is also improved in order to better understand that subarea of attention itself. The three chosen areas show that the approach can explain fundamental neurophysiological, functional, and behavioral properties of visual attention. Since the mechanisms found are thus sufficient to explain the phenomenon so comprehensively, they might even constitute the essential neuronal substrate of visual attention in the cortex. For computer science, the work thereby provides a deeper understanding of visual attention. Beyond that, the framework with its neuronal mechanisms even provides a reference implementation for integrating attention into future systems. According to the present research, attention could be very useful for such systems, since in the brain it provides a task-specific optimization of the visual system. This aspect of human perception is mostly missing from current, powerful computer vision systems, so integrating it into current systems should sharply increase their performance and define a new class.