
    Online learning of task-driven object-based visual attention control

    A biologically-motivated computational model for learning task-driven and object-based visual attention control in interactive environments is proposed. Our model consists of three layers. First, in the early visual processing layer, the most salient location of a scene is derived using a biased saliency-based bottom-up model of visual attention: the basic saliency-based model [1] is revised for salient region selection (object detection), where norm(.) is the Euclidean distance between two points in an image, Saliency is the function that takes an image and a weight vector and returns the most salient location, and t_i is the location of the target object in the i-th image. Then a cognitive component in the higher visual processing layer performs an application-specific operation, such as object recognition, at the focus of attention (FOA); the object at the attended location is recognized by the hierarchical model of object recognition (HMAX) [3]. From this information, a state is derived in the decision making and learning layer, which controls both top-down visual attention and motor actions.
    An agent working in an environment receives information momentarily through its visual sensor and must determine what to look for. For this we use reinforcement learning (RL) to teach the agent to look for the most task-relevant and rewarding entity in the visual scene. The learning approach is an extension of the U-TREE algorithm [6] to the visual domain: an attention tree is built incrementally in a quasi-static manner by alternating two phases. In each Tree-fixed phase, the RL algorithm is executed for some episodes following an ε-greedy action selection strategy; the tree is held fixed, and the derived quadruples (s_t, a_t, r_{t+1}, s_{t+1}) are used only to update the Q-table. State discretization occurs in the RL-fixed phase, where the gathered experiences are used to refine aliased states; the object that reduces aliasing the most is selected for breaking an aliased leaf.
    In an example scenario, the scene captured through the agent's visual sensor undergoes biased bottom-up saliency detection and the FOA is determined. The object at the FOA is recognized (i.e., it is either present or not in the scene), then the agent descends its binary tree in the decision making and learning layer, repeating this until it reaches a leaf node, which determines its state. The best motor action in this state is performed; the outcome of this action on the world is evaluated by a critic, and a reinforcement signal is fed back to the agent to update its internal representations (attention tree) and action selection strategy in a quasi-static manner. A 100% correct policy was achieved. This work was funded by the School of Cognitive Sciences, IPM, Tehran, Iran.
    Author: Ali Borji. [3] M. Riesenhuber and T. Poggio, Hierarchical models of object recognition in cortex, Nature Neuroscience 2(11), 1019–1025, 1999.
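The Tree-fixed phase described above amounts to tabular Q-learning with ε-greedy action selection. A minimal sketch of that update rule follows; the states, actions, and reward here are hypothetical stand-ins, not the paper's actual visual environment or its U-TREE state representation:

```python
import random

def epsilon_greedy(Q, state, actions, eps):
    # With probability eps explore a random action; otherwise exploit
    # the action with the highest current Q-value (default 0.0).
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Standard Q-learning update from one quadruple (s_t, a_t, r_{t+1}, s_{t+1}).
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

# Hypothetical episode step: attending the task-relevant object is rewarded.
Q = {}
q_update(Q, "s0", "look", 1.0, "s1", ["look", "ignore"])
```

With the tree held fixed, each environment step yields one such quadruple, and only the Q-table entries change between RL-fixed phases.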

    Task-set switching with natural scenes: Measuring the cost of deploying top-down attention

    In many everyday situations, we bias our perception from the top down, based on a task or an agenda. Frequently, this entails shifting attention to a specific attribute of a particular object or scene. To explore the cost of shifting top-down attention to a different stimulus attribute, we adopt the task-set switching paradigm, in which switch trials are contrasted with repeat trials in mixed-task blocks and with single-task blocks. Using two tasks that relate to the content of a natural scene in a gray-level photograph and two tasks that relate to the color of the frame around the image, we were able to distinguish switch costs with and without shifts of attention. We found a significant cost in reaction time of 23–31 ms for switches that require shifting attention to other stimulus attributes, but no significant switch cost for switching the task set within an attribute. We conclude that deploying top-down attention to a different attribute incurs a significant cost in reaction time, but that biasing to a different feature value within the same stimulus attribute is effortless.
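The switch cost reported above is, in essence, the mean reaction-time difference between switch and repeat trials within mixed-task blocks. A minimal sketch of that contrast, assuming a hypothetical trial format (RT in milliseconds plus a switch flag), not the study's actual data:

```python
def switch_cost_ms(trials):
    """Mean RT on switch trials minus mean RT on repeat trials, in ms.

    `trials` is a list of (rt_ms, is_switch) pairs from a mixed-task
    block; the field layout is illustrative only.
    """
    switch = [rt for rt, is_switch in trials if is_switch]
    repeat = [rt for rt, is_switch in trials if not is_switch]
    return sum(switch) / len(switch) - sum(repeat) / len(repeat)

# Toy data: switch trials are slower on average than repeat trials.
demo = [(520, True), (530, True), (500, False), (490, False)]
cost = switch_cost_ms(demo)  # 525 - 495 = 30.0 ms
```

A cost in the 23–31 ms range, as found for cross-attribute switches, would indicate that redeploying attention to another attribute carries a measurable time penalty.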

    The Role of Top-down Attention in Statistical Learning of Speech

    Statistical learning (SL) refers to the ability to extract regularities in the environment and has been well documented to play a key role in speech segmentation and language acquisition. Whether SL is automatic or requires top-down attention is an unresolved question, with conflicting results in the literature. The current proposal tests whether SL can occur outside the focus of attention. Participants either focused on, or diverted their attention away from, an auditory speech stream made of repeating nonsense trisyllabic words. Divided-attention participants performed either a concurrent visual task or a language-related task during exposure to the nonsense speech stream, while control participants focused their attention on the speech stream. Visual attention was taxed through the classic Multiple Object Tracking paradigm, requiring tracking of multiple randomly moving dots. Linguistic attention was taxed through a self-paced reading task. Following speech exposure, SL was assessed with offline tests, including a post-exposure explicit familiarity rating task and an implicit reaction-time (RT) based syllable detection task. On the explicit familiarity rating measure, participants showed a reduction in learning when language-specific processing was taxed as compared to when visual resources were taxed. On the more implicit RT-based measure of SL, both divided-attention and full-attention controls performed comparably, all showing evidence of SL. These results suggest SL can proceed even when domain-general (visual) resources are limited, but is compromised when more specific, language-related resources are taxed. These results offer insight into the neurocognitive underpinnings of SL and have exciting practical applications for improving adult second language acquisition.
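The regularities extracted in speech-segmentation SL studies of this kind are commonly modeled as transitional probabilities between adjacent syllables, which are high within a nonsense word and low across word boundaries. A minimal sketch of that computation over a toy syllable stream (the syllables below are invented, not the study's stimuli):

```python
from collections import Counter

def transitional_probabilities(syllables):
    # Estimate P(next | current) from bigram counts over the stream.
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): count / first_counts[a]
            for (a, b), count in pair_counts.items()}

# Toy stream: "bi-da-ku" recurs as a word, so within-word transitions
# (e.g. bi -> da) have higher probability than boundary transitions.
stream = "bi da ku pa do ti bi da ku go la".split()
tp = transitional_probabilities(stream)
```

In such a stream, a learner tracking these statistics could posit word boundaries wherever the transitional probability drops.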

    The reentry hypothesis: The putative interaction of the frontal eye field, ventrolateral prefrontal cortex, and areas V4, IT for attention and eye movement

    Attention is known to play a key role in perception, including action selection, object recognition and memory. Despite findings revealing competitive interactions among cell populations, attention remains difficult to explain. The central purpose of this paper is to link up a large number of findings in a single computational approach. Our simulation results suggest that attention can be well explained on a network level involving many areas of the brain. We argue that attention is an emergent phenomenon that arises from reentry and competitive interactions. We hypothesize that guided visual search requires the use of an object-specific template in prefrontal cortex to sensitize V4 and IT cells whose preferred stimuli match the target template. This induces a feature-specific bias and provides guidance for eye movements. Prior to an eye movement, a spatially organized reentry from oculomotor centers, specifically the movement cells of the frontal eye field, occurs and modulates the gain of V4 and IT cells. The processes involved are elucidated by quantitatively comparing the time course of simulated neural activity with experimental data. Using visual search tasks as an example, we provide clear and empirically testable predictions for the participation of IT, V4 and the frontal eye field in attention. Finally, we explain a possible physiological mechanism that can lead to non-flat search slopes as the result of a slow, parallel discrimination process.
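The template-driven sensitization described above is often caricatured as multiplicative gain modulation: units whose preferred feature matches the prefrontal target template are boosted, while others stay at baseline. A minimal sketch under that assumption; the feature names, responses, and gain value are illustrative, not the paper's model parameters:

```python
def apply_feature_gain(responses, template, gain=1.5):
    # Multiplicatively boost units whose preferred feature is in the
    # target template; non-matching units keep a baseline gain of 1.0.
    return {feature: r * (gain if feature in template else 1.0)
            for feature, r in responses.items()}

# Toy V4-like population: searching for "red" biases red-preferring units,
# giving them a competitive advantage that can guide eye movements.
population = {"red": 1.0, "green": 1.0, "blue": 0.8}
biased = apply_feature_gain(population, template={"red"})
```

In the network account sketched here, this feature-specific bias combines with spatially organized reentry from FEF movement cells shortly before a saccade.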