Salience-based object prioritization during active viewing of naturalistic scenes in young and older adults
Whether fixation selection in real-world scenes is guided by image salience or by objects has been a matter of scientific debate. To contrast the two views, we compared effects of location-based and object-based visual salience in young and older (65+ years) adults. Generalized linear mixed models were used to assess the unique contribution of salience to fixation selection in scenes. When analysing fixation guidance without recourse to objects, visual salience predicted whether image patches were fixated or not. This effect was reduced for the elderly, replicating an earlier finding. When using objects as the unit of analysis, we found that highly salient objects were more frequently selected for fixation than objects with low visual salience. Interestingly, this effect was larger for older adults. We also analysed where viewers fixate within objects, once they are selected. A preferred viewing location close to the centre of the object was found for both age groups. The results support the view that objects are important units of saccadic selection. Reconciling the salience view with the object view, we suggest that visual salience contributes to prioritization among objects. Moreover, the data point towards an increasing relevance of object-bound information with increasing age.
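The abstract's central analysis, testing whether salience predicts if a patch or object is fixated, can be sketched as a plain logistic regression on synthetic data. This is a deliberate simplification: the paper fits generalized linear mixed models with random effects for subjects and items, which are omitted here, and all numbers below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one salience value per image patch (or object),
# and a binary flag for whether it was fixated.
salience = rng.uniform(0, 1, 500)
# Simulate a positive salience effect on fixation probability.
p_fix = 1 / (1 + np.exp(-(3.0 * salience - 1.5)))
fixated = rng.binomial(1, p_fix)

# Fit a plain logistic regression by gradient ascent on the log-likelihood
# (the paper's GLMM additionally includes random effects; none here).
w, b = 0.0, 0.0
lr = 1.0
for _ in range(5000):
    z = w * salience + b
    pred = 1 / (1 + np.exp(-z))
    w += lr * np.mean((fixated - pred) * salience)
    b += lr * np.mean(fixated - pred)

print(f"estimated salience slope: {w:.2f}")  # positive: salient patches fixated more
```

A positive slope recovers the qualitative result reported for fixation guidance; the age-group comparison would correspond to an interaction term, not modelled in this sketch.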
PersonRank: Detecting Important People in Images
In events such as presentations, basketball games, or speeches, some individuals in an image are more important or attractive than others. However, it is challenging to identify important people among all individuals in an image directly from their spatial or appearance information, owing to the diverse variations in pose, action, and appearance of persons and the variety of occasions. We overcome this difficulty by constructing a multiple Hybrid-Interaction Graph, treating each individual in an image as a node and inferring the most active node from interactions estimated by various types of cues. We model pairwise interactions between persons as edge messages communicated between nodes, resulting in a bidirectional pairwise-interaction graph. To enrich the person-person interaction estimation, we further introduce a unidirectional hyper-interaction graph that models the consensus of interaction between a focal person and any person in a local region around them. Finally, we modify the PageRank algorithm to infer the activeness of persons on the multiple Hybrid-Interaction Graph (HIG), the union of the pairwise-interaction and hyper-interaction graphs, and we call our algorithm PersonRank. To provide publicly available datasets for evaluation, we have contributed a new Multi-scene Important People Image Dataset and gathered an NCAA Basketball Image Dataset from sports game sequences. We have demonstrated that the proposed PersonRank clearly and substantially outperforms related methods. Comment: 8 pages, conference
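The core machinery the abstract builds on, PageRank-style power iteration over an interaction graph, can be sketched on a toy example. The edge weights and graph below are invented for illustration; PersonRank's actual weights come from estimated interaction cues, and its modifications to the update rule are not specified here.

```python
import numpy as np

# Toy "pairwise-interaction" graph for 4 people: M[i, j] is the
# (hypothetical) strength of the interaction message person j sends
# to person i. Person 0 receives messages from everyone.
M = np.array([
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])
M = M / M.sum(axis=0, keepdims=True)  # column-normalize outgoing messages

def pagerank(M, d=0.85, iters=100):
    """Standard PageRank power iteration; PersonRank adapts this scheme
    to score person 'activeness' on the Hybrid-Interaction Graph."""
    n = M.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (M @ r)
    return r

scores = pagerank(M)
print(scores.argmax())  # person 0, who receives the most interaction mass
```

The node that accumulates the most incoming interaction mass ends up with the highest score, which is the intuition behind ranking the "most active" person.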
Influence of Image Classification Accuracy on Saliency Map Estimation
Saliency map estimation in computer vision aims to predict the locations where people gaze in images. Since people tend to look at objects in images, the parameters of a model pretrained on ImageNet for image classification are useful for saliency map estimation. However, there has been no research on the relationship between image classification accuracy and the performance of saliency map estimation. In this paper, we show that there is a strong correlation between image classification accuracy and saliency map estimation accuracy. We also investigated an effective architecture based on multi-scale images and upsampling layers to refine the saliency-map resolution. Our model achieved state-of-the-art accuracy on the PASCAL-S, OSIE, and MIT1003 datasets. In the MIT Saliency Benchmark, our model achieved the best performance on some metrics and competitive results on the others. Comment: CAAI Transactions on Intelligence Technology, accepted in 201
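The headline claim, a strong correlation between backbone classification accuracy and saliency estimation accuracy, is just a Pearson correlation over paired model scores. The numbers below are made up for illustration; the paper's actual backbones and metric values are not reproduced here.

```python
import numpy as np

# Hypothetical paired scores for several backbone networks:
# top-1 ImageNet accuracy vs. a saliency metric (e.g. AUC) of the
# saliency model built on that backbone. Values are invented.
cls_acc = np.array([0.57, 0.68, 0.72, 0.76, 0.79])
sal_auc = np.array([0.82, 0.85, 0.87, 0.88, 0.90])

r = np.corrcoef(cls_acc, sal_auc)[0, 1]
print(f"Pearson r = {r:.3f}")  # close to 1: strong positive correlation
```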
Predicting rhesus monkey eye movements during natural-image search
There are three prominent factors that can predict human visual-search behavior in natural scenes: the distinctiveness of a location (salience), similarity to the target (relevance), and features of the environment that predict where the object might be (context). We do not currently know how well these factors predict macaque visual search, which matters because the macaque is arguably the most popular model for asking how the brain controls eye movements. Here we trained monkeys to perform the pedestrian search task previously used for human subjects. Salience, relevance, and context models were all predictive of monkey eye fixations and jointly about as precise as for humans. We attempted to disrupt the influence of scene context on search by testing the monkeys with an inverted set of the same images. Surprisingly, the monkeys were able to locate the pedestrian at a rate similar to that for upright images. The best predictions of monkey fixations in searching inverted images were obtained by rotating the results of the model predictions for the original image. The fact that the same models can predict human and monkey search behavior suggests that the monkey can serve as a good model for understanding how the human brain enables natural-scene search.
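The inverted-image result, where fixations were best predicted by rotating the original model output, amounts to a 180-degree rotation of the priority map. A toy sketch with invented values (`np.rot90` with `k=2` performs the rotation):

```python
import numpy as np

# Toy priority map over a scene (higher = more likely to be fixated);
# the hotspot sits near the top. Values are made up for illustration.
priority = np.array([
    [0.1, 0.9, 0.1],
    [0.1, 0.2, 0.1],
    [0.1, 0.1, 0.1],
])

# Inverting the image is a 180-degree rotation, so the prediction for the
# inverted image is the original map rotated the same way.
priority_inverted = np.rot90(priority, k=2)

peak = np.unravel_index(priority.argmax(), priority.shape)
peak_inv = np.unravel_index(priority_inverted.argmax(), priority_inverted.shape)
print(peak, peak_inv)  # (0, 1) (2, 1): the predicted hotspot rotates with the scene
```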
Knowledge-driven perceptual organization reshapes information sampling via eye movements
Humans constantly move their eyes to explore the environment. However, how image-computable features and object representations contribute to eye-movement control is an ongoing debate. Recent developments in object perception indicate a complex relationship between features and object representations, where image-independent object knowledge generates objecthood by reconfiguring how feature space is carved up. Here, we adopt this emerging perspective, asking whether object-oriented eye movements result from gaze being guided by image-computable features, or by the fact that these features are bound into an object representation. We recorded eye movements in response to stimuli that initially appear as meaningless patches but are experienced as coherent objects once relevant object knowledge has been acquired. We demonstrate that fixations on identical images are more object-centered, less dispersed, and more consistent across observers once these images are organized into objects. Gaze guidance also showed a shift from exploratory information sampling to exploitation of object-related image areas. These effects were evident from the first fixations onwards. Importantly, eye movements were not fully determined by knowledge-dependent object representations but were best explained by the integration of these representations with image-computable features. Overall, the results show how information sampling via eye movements is guided by a dynamic interaction between image-computable features and knowledge-driven perceptual organization.
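The "less dispersed" finding can be quantified with a simple dispersion measure over fixation coordinates, such as the mean pairwise distance used here. The data and the choice of metric are illustrative assumptions, not the paper's actual analysis.

```python
import numpy as np

def dispersion(fixations):
    """Mean pairwise Euclidean distance between fixation points:
    one simple way to quantify how spread out gaze is."""
    diffs = fixations[:, None, :] - fixations[None, :, :]
    d = np.sqrt((diffs ** 2).sum(-1))
    n = len(fixations)
    return d.sum() / (n * (n - 1))

rng = np.random.default_rng(1)
# Hypothetical fixations on the same image before and after observers
# acquire the relevant object knowledge (pixel coordinates, made up).
before = rng.uniform(0, 100, (20, 2))   # exploratory, spread over the image
after = rng.normal(50, 5, (20, 2))      # clustered on the now-perceived object

print(dispersion(before) > dispersion(after))  # True: gaze tightens on the object
```

Inter-observer consistency could be probed the same way by pooling fixations across viewers and comparing dispersion before versus after the knowledge manipulation.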