64 research outputs found

    Salience-based object prioritization during active viewing of naturalistic scenes in young and older adults

    Whether fixation selection in real-world scenes is guided by image salience or by objects has been a matter of scientific debate. To contrast the two views, we compared effects of location-based and object-based visual salience in young and older (65+ years) adults. Generalized linear mixed models were used to assess the unique contribution of salience to fixation selection in scenes. When analysing fixation guidance without recourse to objects, visual salience predicted whether image patches were fixated or not. This effect was reduced for the elderly, replicating an earlier finding. When using objects as the unit of analysis, we found that highly salient objects were more frequently selected for fixation than objects with low visual salience. Interestingly, this effect was larger for older adults. We also analysed where viewers fixate within objects once they are selected. A preferred viewing location close to the centre of the object was found for both age groups. The results support the view that objects are important units of saccadic selection. Reconciling the salience view with the object view, we suggest that visual salience contributes to prioritization among objects. Moreover, the data point towards an increasing relevance of object-bound information with increasing age.
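
    The patch-level analysis described above amounts to a logistic GLMM: does salience predict whether a patch was fixated, with an age-group interaction and subject-level variability? Below is a minimal sketch on simulated data; the variable names, effect sizes, and the statsmodels Bayesian mixed-GLM backend are illustrative assumptions, not the study's materials or code.

```python
# Minimal sketch of a logistic GLMM for patch-level fixation selection.
# Simulated data; predictors and effect sizes are illustrative only.
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(0)
n_subj, n_patch = 20, 200
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_patch),
    "salience": rng.uniform(0, 1, n_subj * n_patch),
    # 0 = young, 1 = older (65+); assigned per subject
    "older": np.repeat(rng.integers(0, 2, n_subj), n_patch),
})
# Hypothetical ground truth: salience raises fixation odds,
# with a reduced salience effect for the older group
logit = -1.0 + 2.0 * df["salience"] - 0.8 * df["older"] * df["salience"]
df["fixated"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

model = BinomialBayesMixedGLM.from_formula(
    "fixated ~ salience * older",   # fixed effects incl. interaction
    {"subject": "0 + C(subject)"},  # random intercept per subject
    df,
)
print(model.fit_vb().summary())
```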

    PersonRank: Detecting Important People in Images

    In events such as presentations, basketball games, or speeches, some individuals in an image are more important or attention-drawing than others. However, it is challenging to identify important people among all individuals in an image directly from their spatial or appearance information, owing to the wide variation in pose, action, and appearance of persons and across occasions. We overcome this difficulty by constructing a multiple Hybrid-Interaction Graph that treats each individual in an image as a node and infers the most active node from interactions estimated by various types of cues. We model pairwise interactions between persons as edge messages communicated between nodes, resulting in a bidirectional pairwise-interaction graph. To enrich the person-person interaction estimation, we further introduce a unidirectional hyper-interaction graph that models the consensus of interaction between a focal person and every person in a local region around them. Finally, we modify the PageRank algorithm to infer the activeness of persons on the multiple Hybrid-Interaction Graph (HIG), the union of the pairwise-interaction and hyper-interaction graphs; we call our algorithm PersonRank. To provide publicly available datasets for evaluation, we have contributed a new dataset called the Multi-scene Important People Image Dataset and gathered an NCAA Basketball Image Dataset from sports game sequences. We demonstrate that the proposed PersonRank outperforms related methods clearly and substantially.
    Comment: 8 pages, conference
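
    As a rough illustration of the final inference step, the sketch below runs PageRank-style power iteration over a small directed interaction graph; the edge weights are hypothetical stand-ins for the learned pairwise and hyper-interaction scores described in the abstract.

```python
# Hypothetical sketch: rank people by power iteration over a directed
# interaction graph. Edge weights are made up for illustration; the
# paper derives them from pairwise and hyper-interaction estimates.
import numpy as np

def person_rank(W: np.ndarray, damping: float = 0.85, tol: float = 1e-8) -> np.ndarray:
    """W[i, j] = strength of interaction directed from person i to person j."""
    n = W.shape[0]
    # Row-normalize so each person distributes one unit of 'importance vote'
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    r = np.full(n, 1.0 / n)
    while True:
        r_new = (1.0 - damping) / n + damping * (P.T @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# Three detected people; person 1 receives the strongest interactions
W = np.array([[0.0, 0.9, 0.1],
              [0.3, 0.0, 0.2],
              [0.1, 0.8, 0.0]])
print(person_rank(W))  # highest score = most active/important person
```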

    Influence of Image Classification Accuracy on Saliency Map Estimation

    Saliency map estimation in computer vision aims to estimate the locations where people gaze in images. Since people tend to look at objects in images, the parameters of a model pretrained on ImageNet for image classification are useful for saliency map estimation. However, there has been no research on the relationship between image classification accuracy and the performance of saliency map estimation. In this paper, we show that there is a strong correlation between image classification accuracy and saliency map estimation accuracy. We also investigated an effective architecture based on multi-scale images and upsampling layers to refine the saliency-map resolution. Our model achieved state-of-the-art accuracy on the PASCAL-S, OSIE, and MIT1003 datasets. On the MIT Saliency Benchmark, our model achieved the best performance on some metrics and competitive results on the others.
    Comment: CAAI Transactions on Intelligence Technology, accepted in 201
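
    The core idea of reusing an ImageNet-pretrained classifier for saliency estimation can be sketched as below. This is an assumed minimal architecture (a torchvision ResNet-50 trunk, a 1x1 prediction head, and bilinear upsampling at a single scale), not the paper's exact multi-scale model.

```python
# Sketch: saliency estimation on top of an ImageNet-pretrained backbone.
# Single-scale for brevity; the paper also feeds multi-scale inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

class SaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
        # Keep the convolutional trunk; drop pooling and classifier head
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.head = nn.Conv2d(2048, 1, kernel_size=1)  # per-location score

    def forward(self, x):
        feats = self.features(x)  # (B, 2048, H/32, W/32)
        sal = self.head(feats)
        # Upsampling layer refines the coarse map back to input resolution
        sal = F.interpolate(sal, size=x.shape[-2:], mode="bilinear",
                            align_corners=False)
        return torch.sigmoid(sal)

img = torch.randn(1, 3, 224, 224)
print(SaliencyNet()(img).shape)  # torch.Size([1, 1, 224, 224])
```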

    Predicting rhesus monkey eye movements during natural-image search

    There are three prominent factors that can predict human visual-search behavior in natural scenes: the distinctiveness of a location (salience), similarity to the target (relevance), and features of the environment that predict where the object might be (context). We do not currently know how well these factors predict macaque visual search, which matters because the macaque is arguably the most popular animal model for asking how the brain controls eye movements. Here we trained monkeys to perform the pedestrian search task previously used with human subjects. Salience, relevance, and context models were all predictive of monkey eye fixations and were jointly about as accurate as for humans. We attempted to disrupt the influence of scene context on search by testing the monkeys with an inverted set of the same images. Surprisingly, the monkeys were able to locate the pedestrian at a rate similar to that for upright images. The best predictions of monkey fixations when searching inverted images were obtained by rotating the model predictions for the original image. The fact that the same models can predict human and monkey search behavior suggests that the monkey is a good model for understanding how the human brain enables natural-scene search.
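
    One simple way to picture how the three factors jointly predict fixations is as a weighted combination of normalized priority maps; the function below is an illustrative sketch with hypothetical equal weights, not the fitted models from the study.

```python
# Sketch: combine salience, relevance, and context maps into a single
# priority map. Weights are hypothetical; the study fits each model.
import numpy as np

def combined_priority(salience, relevance, context, w=(1.0, 1.0, 1.0)):
    """Each input is a 2D array over image locations; returns a
    normalized priority map whose peak is the predicted fixation."""
    maps = [m / m.sum() for m in (salience, relevance, context)]
    prio = sum(wi * m for wi, m in zip(w, maps))
    return prio / prio.sum()

rng = np.random.default_rng(1)
s, r, c = (rng.random((48, 64)) for _ in range(3))
pred = combined_priority(s, r, c)
print(np.unravel_index(pred.argmax(), pred.shape))  # (row, col) of peak
```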

    Knowledge-driven perceptual organization reshapes information sampling via eye movements

    Humans constantly move their eyes to explore the environment. However, how image-computable features and object representations contribute to eye-movement control is an ongoing debate. Recent developments in object perception indicate a complex relationship between features and object representations, where image-independent object knowledge generates objecthood by reconfiguring how feature space is carved up. Here, we adopt this emerging perspective, asking whether object-oriented eye movements result from gaze being guided by image-computable features, or by the fact that these features are bound into an object representation. We recorded eye movements in response to stimuli that initially appear as meaningless patches but are experienced as coherent objects once relevant object knowledge has been acquired. We demonstrate that fixations on identical images are more object-centered, less dispersed, and more consistent across observers once these images are organized into objects. Gaze guidance also showed a shift from exploratory information sampling to exploitation of object-related image areas. These effects were evident from the first fixations onwards. Importantly, eye movements were not fully determined by knowledge-dependent object representations but were best explained by the integration of these representations with image-computable features. Overall, the results show how information sampling via eye movements is guided by a dynamic interaction between image-computable features and knowledge-driven perceptual organization.
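
    The dispersion and inter-observer consistency measures mentioned above can be approximated with simple summary statistics. The sketch below assumes mean distance from the fixation centroid for dispersion and mean pairwise correlation of per-observer fixation density maps for consistency; the authors' exact metrics may differ.

```python
# Sketch of two eye-movement summary measures: fixation dispersion and
# across-observer consistency. Metric choices here are assumptions.
import numpy as np

def dispersion(fixations: np.ndarray) -> float:
    """Mean Euclidean distance of fixations (N x 2, pixels) from their centroid."""
    return float(np.linalg.norm(fixations - fixations.mean(axis=0), axis=1).mean())

def consistency(density_maps: np.ndarray) -> float:
    """Mean pairwise correlation between observers' fixation density
    maps (shape: observers x H x W); higher = more consistent gaze."""
    flat = density_maps.reshape(density_maps.shape[0], -1)
    corr = np.corrcoef(flat)
    upper = corr[np.triu_indices_from(corr, k=1)]
    return float(upper.mean())

rng = np.random.default_rng(2)
print(dispersion(rng.normal(loc=100.0, scale=20.0, size=(30, 2))))
print(consistency(rng.random((5, 48, 64))))
```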