
    Inductive learning of spatial attention

    This paper investigates the automatic induction of spatial attention from the visual observation of objects manipulated on a table top. In this work, space is represented in terms of a novel observer-object relative reference system, named the Local Cardinal System, defined upon the local neighbourhood of objects on the table. We present results of applying the proposed methodology to five distinct scenarios involving the construction of spatial patterns of coloured blocks.
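    To make the representation concrete, here is a minimal sketch of how an observer-object relative cardinal partition could be computed for objects on a table. The four-sector layout, the neighbourhood radius, and all names below are illustrative assumptions; the paper's Local Cardinal System may be defined differently.

```python
# Illustrative sketch of an observer-relative cardinal partition.
# Sector layout and neighbourhood radius are assumptions; the paper's
# Local Cardinal System may be defined differently.
import math

def cardinal_relation(observer, anchor, other, radius=1.0):
    """Classify `other` relative to `anchor` in a frame oriented
    from the observer toward the anchor object."""
    ax, ay = anchor[0] - observer[0], anchor[1] - observer[1]
    ox, oy = other[0] - anchor[0], other[1] - anchor[1]
    if math.hypot(ox, oy) > radius:       # outside the local neighbourhood
        return None
    # Angle of `other` in the observer->anchor frame, wrapped to [-pi, pi].
    theta = math.atan2(oy, ox) - math.atan2(ay, ax)
    theta = (theta + math.pi) % (2 * math.pi) - math.pi
    sector = round(theta / (math.pi / 2)) % 4  # four cardinal sectors
    return ("front", "left", "back", "right")[sector]

print(cardinal_relation((0, 0), (1, 0), (1.5, 0.2)))  # -> 'front'
```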

    ARK: Autonomous mobile robot in an industrial environment

    This paper describes research on the ARK (Autonomous Mobile Robot in a Known Environment) project. The technical objective of the project is to build a robot that can navigate in a complex industrial environment using maps of permanent structures. The environment is not altered by adding easily identifiable beacons; instead, the robot relies on naturally occurring objects as visual landmarks for navigation. The robot is equipped with various sensors that can detect unmapped obstacles, landmarks and objects. In this paper we describe the robot's industrial environment, its architecture, a novel combined range and vision sensor, and our recent results in controlling the robot, in the real-time detection of objects using their color, and in processing the robot's range and vision sensor data for navigation.
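    The color-based detection mentioned above can be illustrated with a crude threshold-and-centroid sketch. The threshold rule and values are invented for illustration and are not the ARK project's actual pipeline.

```python
# Toy colour-based object detection: threshold, then centroid.
# The "red" rule below is an assumption, not ARK's actual method.
import numpy as np

def detect_red_blob(rgb, r_min=150, g_max=100, b_max=100):
    """Return a boolean mask of pixels matching a crude 'red' rule."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (r >= r_min) & (g <= g_max) & (b <= b_max)

img = np.zeros((8, 8, 3), dtype=np.uint8)
img[2:4, 2:4] = (200, 30, 30)             # synthetic red object
ys, xs = np.nonzero(detect_red_blob(img))
print("centroid:", xs.mean(), ys.mean())  # -> centroid: 2.5 2.5
```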

    Biased Competition in Visual Processing Hierarchies: A Learning Approach Using Multiple Cues

    In this contribution, we present a large-scale hierarchical system for object detection fusing bottom-up (signal-driven) processing results with top-down (model- or task-driven) attentional modulation. Specifically, we focus on the question of how the autonomous learning of invariant models can be embedded into a performing system and how such models can be used to define object-specific attentional modulation signals. Our system implements bi-directional data flow in a processing hierarchy. The bottom-up data flow proceeds from a preprocessing level to the hypothesis level, where object hypotheses created by exhaustive object detection algorithms are represented in a roughly retinotopic way. A competitive selection mechanism is used to determine the most confident hypotheses, which are used on the system level to train multimodal models that link object identity to invariant hypothesis properties. The top-down data flow originates at the system level, where the trained multimodal models are used to obtain space- and feature-based attentional modulation signals, providing biases for the competitive selection process at the hypothesis level. This results in object-specific hypothesis facilitation/suppression in certain image regions, which we show to be applicable to different object detection mechanisms. In order to demonstrate the benefits of this approach, we apply the system to the detection of cars in a variety of challenging traffic videos. Evaluating our approach on a publicly available dataset containing approximately 3,500 annotated video images from more than one hour of driving, we show strong increases in performance and generalization compared to object detection in isolation. Furthermore, we compare our results to a late hypothesis rejection approach, showing that early coupling of top-down and bottom-up information is a favorable approach, especially when processing resources are constrained.
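    The central coupling described above, bottom-up hypothesis confidences biased by top-down attentional gains before competitive selection, can be sketched in a few lines. The multiplicative gain and the mutual-inhibition loop below are assumptions for illustration, not the authors' exact equations.

```python
# Sketch of biased competition: top-down gains multiply bottom-up
# confidences, then mutual inhibition sharpens the result.
# (Gain rule and inhibition dynamics are illustrative assumptions.)
import numpy as np

def biased_selection(confidences, attentional_gain, steps=20, inhibition=0.3):
    """Iteratively sharpen biased confidences via mutual inhibition."""
    a = np.asarray(confidences, float) * np.asarray(attentional_gain, float)
    for _ in range(steps):
        # Each hypothesis is suppressed by the average of its rivals.
        a = np.clip(a - inhibition * (a.sum() - a) / (len(a) - 1), 0, None)
        if a.max() > 0:
            a = a / a.max()              # keep the current winner at 1
    return a

bottom_up = [0.6, 0.55, 0.2]             # hypothesis confidences
gain      = [1.0, 1.4, 1.0]              # top-down bias favours hypothesis 1
print(biased_selection(bottom_up, gain)) # hypothesis 1 wins the competition
```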

    Modelling Visual Search with the Selective Attention for Identification Model (VS-SAIM): A Novel Explanation for Visual Search Asymmetries

    In earlier work, we developed the Selective Attention for Identification Model (SAIM [16]). SAIM models the human ability to perform translation-invariant object identification in multiple-object scenes. SAIM suggests that central to this ability is an interaction between parallel competitive processes in a selection stage and an object identification stage. In this paper, we applied the model to visual search experiments involving simple lines and letters. We present successful simulation results for asymmetric and symmetric searches and for the influence of background line orientations. Search asymmetry refers to changes in search performance when the roles of target item and non-target item (distractor) are swapped. In line with other models of visual search, the results suggest that a large part of the empirical evidence can be explained by competitive processes in the brain, which are modulated by the similarity between target and distractor. The simulations also suggest that another important factor is the feature properties of distractors. Finally, the simulations indicate that search asymmetries can be the outcome of interactions between top-down (knowledge about search items) and bottom-up (features of search items) processing. This interaction in VS-SAIM is dominated by a novel mechanism, the knowledge-based on-centre-off-surround receptive field. This receptive field is reminiscent of classical receptive fields, but its exact shape is modulated by both top-down and bottom-up processes. The paper discusses supporting evidence for the existence of this novel concept.
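    The knowledge-based on-centre-off-surround receptive field can be illustrated with a difference-of-Gaussians whose centre is sharpened by a top-down signal. Parameter names and the exact modulation rule below are assumptions; VS-SAIM's formulation differs in detail.

```python
# Hedged sketch of a knowledge-modulated on-centre-off-surround field:
# a difference-of-Gaussians whose centre a top-down signal sharpens.
# (The modulation rule is an assumption, not VS-SAIM's exact form.)
import numpy as np

def dog_field(size, sigma_c=1.0, sigma_s=3.0, top_down=1.0):
    """1-D difference-of-Gaussians; larger `top_down` sharpens the centre."""
    x = np.arange(size) - size // 2
    center = np.exp(-x**2 / (2 * (sigma_c / top_down) ** 2))
    surround = np.exp(-x**2 / (2 * sigma_s**2))
    center /= center.sum()               # normalise on- and off-components
    surround /= surround.sum()
    return top_down * center - surround  # excitatory centre, inhibitory ring

print(np.round(dog_field(9, top_down=1.5), 3))
```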

    Action Recognition with a Bio--Inspired Feedforward Motion Processing Model: The Richness of Center-Surround Interactions

    Here we show that reproducing the functional properties of MT cells with various center-surround interactions enriches motion representation and improves action recognition performance. To do so, we propose a simplified bio-inspired model of the motion pathway in primates: it is a feedforward model restricted to the V1-MT cortical layers, cortical cells cover the visual space with a foveated structure, and, more importantly, we reproduce some of the richness of the center-surround interactions of MT cells. Interestingly, as observed in neurophysiology, our MT cells not only behave like simple velocity detectors, but also respond to several kinds of motion contrasts. Results show that this diversity of motion representation at the MT level is a major advantage for an action recognition task. Defining motion maps as our feature vectors, we used a standard classification method on the Weizmann database: we obtained an average recognition rate of 98.9%, which is superior to the recent results by Jhuang et al. (2007). These promising results encourage us to further develop bio-inspired models incorporating other brain mechanisms and cortical layers in order to deal with more complex videos.
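    A toy version of the center-surround interaction: an MT-like unit that responds to the difference between center and surround motion energy along its preferred direction. This is a didactic sketch, not the authors' V1-MT model; the pooling radii and the dot-product "energy" are assumptions.

```python
# Toy MT-like unit: centre-minus-surround motion energy along a
# preferred direction. (Didactic sketch, not the authors' V1-MT model.)
import numpy as np

def mt_response(flow, cy, cx, pref, r_c=1, r_s=3):
    """Centre-minus-surround energy of `flow` (H x W x 2) along `pref`."""
    H, W, _ = flow.shape
    ys, xs = np.mgrid[0:H, 0:W]
    d = np.hypot(ys - cy, xs - cx)
    energy = flow @ np.asarray(pref, float)      # projection on preferred dir
    center = energy[d <= r_c].mean()
    surround = energy[(d > r_c) & (d <= r_s)].mean()
    return center - surround                     # motion-contrast response

flow = np.zeros((9, 9, 2)); flow[4, 4] = (1.0, 0.0)   # lone moving point
print(mt_response(flow, 4, 4, pref=(1.0, 0.0)))       # strong motion contrast
```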

    Dynamic, Task-Related and Demand-Driven Scene Representation

    Humans selectively process and store details about their vicinity based on their knowledge about the scene, the world and their current task. In doing so, only those pieces of information that are required for solving a given task are extracted from the visual scene. In this paper, we present a flexible system architecture along with a control mechanism that allows for a task-dependent representation of a visual scene. In contrast to existing approaches, our system is able to acquire information selectively according to the demands of the given task and based on the system's knowledge. The proposed control mechanism decides which properties need to be extracted and how the independent processing modules should be combined, based on the knowledge stored in the system's long-term memory. Additionally, it ensures that algorithmic dependencies between processing modules are resolved automatically, utilizing procedural knowledge which is also stored in the long-term memory. By evaluating a proof-of-concept implementation on a real-world table scene, we show that, while solving the given task, the amount of data processed and stored by the system is considerably lower compared to processing regimes used in state-of-the-art systems. Furthermore, our system only acquires and stores the minimal set of information that is relevant for solving the given task.
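    The control mechanism's "extract only what the task needs" behaviour can be sketched as dependency-driven module selection: pick the modules that provide the required properties, pull in their transitive dependencies, and order them topologically. Module names and the dependency table below are invented for illustration; the paper's long-term-memory representation is richer.

```python
# Sketch of task-driven module selection with automatic dependency
# resolution. (Module names and dependency table are hypothetical.)
from graphlib import TopologicalSorter

# module -> modules it depends on (stand-in for procedural knowledge)
DEPENDS = {
    "segment": [],
    "color": ["segment"],
    "shape": ["segment"],
    "grasp_pose": ["shape"],
}
PROVIDES = {"color": "object_color", "grasp_pose": "grasp_pose"}

def plan(required_properties):
    """Return an execution order covering only what the task needs."""
    needed, stack = set(), [m for m, p in PROVIDES.items()
                            if p in required_properties]
    while stack:                         # pull in transitive dependencies
        m = stack.pop()
        if m not in needed:
            needed.add(m)
            stack.extend(DEPENDS[m])
    ts = TopologicalSorter({m: [d for d in DEPENDS[m] if d in needed]
                            for m in needed})
    return list(ts.static_order())

print(plan({"object_color"}))   # -> ['segment', 'color'] (shape is skipped)
```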

    The time course of exogenous and endogenous control of covert attention

    Studies of eye movements and manual responses have established that rapid overt selection is largely exogenously driven toward salient stimuli, whereas slower selection is largely endogenously driven to relevant objects. We use the N2pc, an event-related potential index of covert attention, to demonstrate that this time course reflects an underlying pattern in the deployment of covert attention. We find that shifts of attention that occur soon after the onset of a visual search array are directed toward salient, task-irrelevant visual stimuli and are associated with slow responses to the target. In contrast, slower shifts are target-directed and are associated with fast responses. The time course of exogenous and endogenous control provides a framework in which some inconsistent results in the capture literature might be reconciled; capture may occur when attention is rapidly deployed.

    Explaining efficient search for conjunctions of motion and form: Evidence from negative color effects

    Dent, Humphreys, and Braithwaite (2011) showed substantial costs to search when a moving target shared its color with a group of ignored static distractors. The present study further explored the conditions under which such costs to performance occur. Experiment 1 tested whether the negative color-sharing effect was specific to cases in which search showed a highly serial pattern. The results showed that the negative color-sharing effect persisted for a target defined as a conjunction of movement and form, even when search was highly efficient. Experiment 2 examined the ease with which participants could find an odd-colored target amongst a moving group. Participants searched for a moving target amongst moving and stationary distractors. In Experiment 2A, participants performed a highly serial search through a group of similarly shaped moving letters. Performance was much slower when the target shared its color with a set of ignored static distractors. The exact same displays were used in Experiment 2B; however, participants now responded "present" to targets that shared the color of the static distractors. The same targets that had previously been difficult to find were now found efficiently. The results are interpreted within a flexible framework for attentional control. Targets that are linked with irrelevant distractors by color tend to be ignored; however, this cost can be overridden by top-down control settings.