
    On the correlation between human fixations, handcrafted and CNN features

    Traditional local image descriptors such as SIFT and SURF are based on processing similar to that which takes place in the early visual cortex. Nowadays, convolutional neural networks still draw inspiration from the human visual system, integrating computational elements typical of higher visual cortical areas. Deep CNN architectures are intrinsically hard to interpret, so much effort has been made to dissect them in order to understand which types of features they learn. However, considering the resemblance to the human visual system, not enough attention has been devoted to understanding whether the image features learned by deep CNNs and used for classification correlate with the features that humans select when viewing images, the so-called human fixations, nor whether they correlate with earlier handcrafted features such as SIFT and SURF. Exploring these correlations is highly meaningful, since what we require from CNNs, and from features in general, is to recognize and correctly classify objects or subjects relevant to humans. In this paper, we establish the correlation between three families of image interest points: human fixations, handcrafted features, and CNN features. We extract features from the feature maps of selected layers of several deep CNN architectures, from the shallowest to the deepest. All features and fixations are then compared with two types of measures, global and local, which unveil the degree of similarity of the areas of interest of the three families. The experiments, carried out on the ETD human fixations database, show that human fixations are positively correlated with handcrafted features, and even more so with the deep layers of CNNs, and that handcrafted features correlate highly among themselves, as some CNNs do.
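A minimal sketch of the kind of global comparison the abstract describes: correlating a CNN layer's spatial activation energy with a human fixation density map. This is not the authors' pipeline; the choice of backbone and layer index is illustrative, and it assumes torch/torchvision are available and the fixation map is a 2D array registered to the input image.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Illustrative backbone; the paper compares several architectures,
# from shallow to deep.
model = models.vgg16(weights="IMAGENET1K_V1").eval()

def activation_energy(img, layer_idx=20):
    """Sum of absolute activations over channels at one conv layer,
    giving a saliency-like spatial map (layer_idx is an assumption)."""
    x = T.Compose([T.Resize((224, 224)), T.ToTensor()])(img).unsqueeze(0)
    with torch.no_grad():
        for i, m in enumerate(model.features):
            x = m(x)
            if i == layer_idx:
                break
    return x.abs().sum(dim=1).squeeze(0).numpy()  # shape: H' x W'

def global_correlation(act_map, fixation_map):
    """A 'global' measure in the abstract's sense: Pearson correlation
    after resizing the fixation density map to the activation grid."""
    h, w = act_map.shape
    fix = np.array(
        Image.fromarray(fixation_map.astype(np.float32)).resize((w, h)))
    a = (act_map - act_map.mean()) / (act_map.std() + 1e-8)
    f = (fix - fix.mean()) / (fix.std() + 1e-8)
    return float((a * f).mean())
```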

    Objects predict fixations better than early saliency

    Humans move their eyes while looking at scenes and pictures. Eye movements correlate with shifts in attention and are thought to be a consequence of optimal resource allocation for high-level tasks such as visual recognition. Models of attention, such as “saliency maps,” are often built on the assumption that “early” features (color, contrast, orientation, motion, and so forth) drive attention directly. We explore an alternative hypothesis: observers attend to “interesting” objects. To test this hypothesis, we measure the eye position of human observers while they inspect photographs of common natural scenes. Our observers perform different tasks: artistic evaluation, analysis of content, and search. Immediately after each presentation, our observers are asked to name the objects they saw. Weighted by recall frequency, these objects predict fixations in individual images better than early saliency, irrespective of task. Also, saliency combined with object positions predicts which objects are frequently named. This suggests that early saliency has only an indirect effect on attention, acting through recognized objects. Consequently, rather than treating attention as a mere preprocessing step for object recognition, models of both need to be integrated.
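A sketch, under stated assumptions, of how one could score the two predictors the abstract compares: a map (early saliency, or object masks weighted by recall frequency) is evaluated by an ROC-style AUC in which fixated pixels are the positives. This is not the study's analysis code; array names are illustrative.

```python
import numpy as np

def fixation_auc(pred_map, fixations):
    """AUC of pred_map values at fixated pixels vs. all pixels.
    pred_map: 2D array; fixations: list of (row, col) points."""
    pos = np.array([pred_map[r, c] for r, c in fixations])
    neg = pred_map.ravel()
    # Mann-Whitney formulation: probability that a fixated location
    # outscores a random location, counting ties as half.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

def object_map(masks, recall_freq):
    """Object-based predictor: binary object masks, each weighted by
    how often observers named that object after presentation."""
    return sum(f * m for m, f in zip(masks, recall_freq))
```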

    Drawing cartoon faces - a functional imaging study of the cognitive neuroscience of drawing

    We report a functional imaging study of drawing cartoon faces. Normal, untrained participants were scanned while viewing simple black-and-white cartoon line drawings of human faces, retaining them over a short memory interval, and then drawing them without vision of their hand or the paper. Specific encoding and retention of information about the faces were tested for by contrasting these two stages (with display of cartoon faces) against the exploration and retention of random-dot stimuli. Drawing was contrasted between a condition in which only the memory of a previously viewed face was available, a condition in which both memory and simultaneous viewing of the cartoon were possible, and drawing of a new, previously unseen face. We show that the encoding of cartoon faces powerfully activates the face-sensitive areas of the lateral occipital cortex and the fusiform gyrus, but that there is no significant activation in these areas during the retention interval. Activity in both areas was also high when drawing the displayed cartoons. Drawing from memory activates areas in posterior parietal cortex and frontal areas. This activity is consistent with the encoding and retention of the spatial information about the face to be drawn as a visuo-motor action plan, either as a series of targets for ocular fixation or as spatial targets for the drawing action.
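For readers unfamiliar with how such stage-versus-stage contrasts are tested, here is a schematic of a standard fMRI GLM contrast, not the study's actual pipeline: each voxel's time course is regressed on condition regressors, and a contrast vector (e.g., face encoding minus dot encoding) is tested. The design matrix, regressor names, and contrast are illustrative assumptions.

```python
import numpy as np

def glm_contrast(y, X, c):
    """y: (T,) voxel time series; X: (T, p) design matrix of condition
    regressors (e.g., face-encoding, dot-encoding, retention, drawing);
    c: (p,) contrast vector, e.g. [1, -1, 0, 0]. Returns a t statistic."""
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = float(res[0]) / dof if res.size else 0.0  # residual variance
    var_c = sigma2 * c @ np.linalg.pinv(X.T @ X) @ c   # variance of c'beta
    return (c @ beta) / np.sqrt(var_c + 1e-12)
```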

    GazeDPM: Early Integration of Gaze Information in Deformable Part Models

    An increasing number of works explore collaborative human-computer systems in which human gaze is used to enhance computer vision systems. For object detection, these efforts have so far been restricted to late integration approaches that have inherent limitations, such as increased precision without an increase in recall. We propose an early integration approach in a deformable part model, which constitutes a joint formulation over gaze and visual data. We show that our GazeDPM method improves over the state-of-the-art DPM baseline by 4% and over a recent method for gaze-supported object detection by 3% on the public POET dataset. Our approach additionally provides introspection of the learnt models, can reveal salient image structures, and allows us to investigate the interplay between gaze-attracting and gaze-repelling areas, the importance of view-specific models, as well as viewers' personal biases in gaze patterns. Finally, we study important practical aspects of our approach, such as the impact of using saliency maps instead of real fixations, the impact of the number of fixations, and robustness to gaze estimation error.
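GazeDPM's actual formulation is a joint deformable part model over gaze and visual data; the sketch below only illustrates the general "early integration" idea the abstract contrasts with late fusion: a gaze density channel is stacked onto the visual features before any detector is trained or scored. It assumes scikit-image and SciPy, with HOG as a stand-in for DPM features.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import hog

def gaze_density(shape, fixations, sigma=15.0):
    """Blur discrete fixation points into a smooth density map
    (sigma is an illustrative smoothing bandwidth)."""
    m = np.zeros(shape, dtype=np.float32)
    for r, c in fixations:
        m[r, c] += 1.0
    m = gaussian_filter(m, sigma)
    return m / (m.max() + 1e-8)

def early_integration_features(gray_img, fixations, cell=8):
    """Concatenate per-cell HOG features with cell-pooled gaze density,
    so gaze enters the model before detection scoring (early fusion),
    rather than re-ranking detections afterwards (late fusion)."""
    h = hog(gray_img, pixels_per_cell=(cell, cell),
            cells_per_block=(1, 1), feature_vector=False)
    rows, cols = h.shape[:2]
    g = gaze_density(gray_img.shape, fixations)
    pooled = g[:rows * cell, :cols * cell].reshape(
        rows, cell, cols, cell).mean(axis=(1, 3))
    return np.concatenate(
        [h.reshape(rows, cols, -1), pooled[..., None]], axis=-1)
```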