36 research outputs found

    Learning Visual Question Answering by Bootstrapping Hard Attention

    Attention mechanisms in biological perception are thought to select subsets of perceptual information for more sophisticated processing which would be prohibitive to perform on all sensory inputs. In computer vision, however, there has been relatively little exploration of hard attention, where some information is selectively ignored, in spite of the success of soft attention, where information is re-weighted and aggregated, but never filtered out. Here, we introduce a new approach for hard attention and find it achieves very competitive performance on recently released visual question answering datasets, equalling and in some cases surpassing similar soft attention architectures while entirely ignoring some features. Even though the hard attention mechanism is thought to be non-differentiable, we found that the feature magnitudes correlate with semantic relevance, and provide a useful signal for our mechanism's attentional selection criterion. Because hard attention selects important features of the input information, it can also be more efficient than analogous soft attention mechanisms. This is especially important for recent approaches that use non-local pairwise operations, whereby computational and memory costs are quadratic in the size of the set of features. Comment: ECCV 201

    Adults' Awareness of Faces Follows Newborns' Looking Preferences

    From the first days of life, humans preferentially orient towards upright faces, likely reflecting innate subcortical mechanisms. Here, we show that binocular rivalry can reveal face detection mechanisms in adults that are surprisingly similar to inborn face detection mechanisms. We used continuous flash suppression (CFS), a variant of binocular rivalry, to render stimuli invisible at the beginning of each trial and measured the time upright and inverted stimuli needed to overcome such interocular suppression. Critically, specific stimulus properties previously shown to modulate looking preferences in neonates similarly modulated adults' awareness of faces presented during CFS. First, the advantage of upright faces in overcoming CFS was strongly modulated by contrast polarity and direction of illumination. Second, schematic patterns consisting of three dark blobs were suppressed for shorter durations when the arrangement of these blobs respected the face-like configuration of the eyes and the mouth, and this effect was modulated by contrast polarity. No such effects were obtained in a binocular control experiment not involving CFS, suggesting a crucial role for face-sensitive mechanisms operating outside of conscious awareness. These findings indicate that visual awareness of faces in adults is governed by perceptual mechanisms that are sensitive to similar stimulus properties as those modulating newborns' face preferences.

    Bistable Percepts in the Brain: fMRI Contrasts Monocular Pattern Rivalry and Binocular Rivalry

    The neural correlates of binocular rivalry have been actively debated in recent years, and are of considerable interest as they may shed light on mechanisms of conscious awareness. In a related phenomenon, monocular rivalry, a composite image is shown to both eyes. The subject experiences perceptual alternations in which the two stimulus components alternate in clarity or salience. The experience is similar to perceptual alternations in binocular rivalry, although the reduction in visibility of the suppressed component is greater for binocular rivalry, especially at higher stimulus contrasts. We used fMRI at 3T to image activity in visual cortex while subjects perceived either monocular or binocular rivalry, or a matched non-rivalrous control condition. The stimulus patterns were left/right oblique gratings with the luminance contrast set at 9%, 18% or 36%. Compared to a blank screen, both binocular and monocular rivalry showed a U-shaped pattern of activation as a function of stimulus contrast, i.e., higher activity for most areas at 9% and 36%. The sites of cortical activation for monocular rivalry included occipital pole (V1, V2, V3), ventral temporal, and superior parietal cortex. The additional areas for binocular rivalry included lateral occipital regions, as well as inferior parietal cortex close to the temporoparietal junction (TPJ). In particular, higher-tier areas MT+ and V3A were more active for binocular than monocular rivalry for all contrasts. In comparison, activation in V2 and V3 was reduced for binocular compared to monocular rivalry at the higher contrasts that evoked stronger binocular perceptual suppression, indicating that the effects of suppression are not limited to interocular suppression in V1.

    Incremental grouping of image elements in vision

    One important task for the visual system is to group image elements that belong to an object and to segregate them from other objects and the background. Here we present an incremental grouping theory (IGT) that addresses the role of object-based attention in perceptual grouping at a psychological level and, at the same time, outlines the mechanisms for grouping at the neurophysiological level. The IGT proposes that there are two processes for perceptual grouping. The first process is base grouping and relies on neurons that are tuned to feature conjunctions. Base grouping is fast and occurs in parallel across the visual scene, but not all possible feature conjunctions can be coded as base groupings. If there are no neurons tuned to the relevant feature conjunctions, a second process called incremental grouping comes into play. Incremental grouping is a time-consuming and capacity-limited process that requires the gradual spread of enhanced neuronal activity across the representation of an object in the visual cortex. The spread of enhanced neuronal activity corresponds to the labeling of image elements with object-based attention.

    Visual categorization shapes feature selectivity in the primate temporal cortex

    The way that we perceive and interact with objects depends on our previous experience with them. For example, a bird expert is more likely to recognize a bird as a sparrow, a sandpiper or a cockatiel than a non-expert. Neurons in the inferior temporal cortex have been shown to be important in the representation of visual objects; however, it is unknown which object features are represented and how these representations are affected by categorization training. Here we show that feature selectivity in the macaque inferior temporal cortex is shaped by categorization of objects on the basis of their visual features. Specifically, we recorded from single neurons while monkeys performed a categorization task with two sets of parametric stimuli. Each stimulus set consisted of four varying features, but only two of the four were important for the categorization task (diagnostic features). We found enhanced neuronal representation of the diagnostic features relative to the non-diagnostic ones. These findings demonstrate that stimulus features important for categorization are instantiated in the activity of single units (neurons) in the primate inferior temporal cortex.