Hierarchical Object-Based Visual Attention for Machine Vision
Institute of Perception, Action and Behaviour
Human vision uses mechanisms of covert attention to selectively process interesting information, and overt eye movements to extend this selective ability; visual tasks can thus be dealt with effectively by limited processing resources. Modelling visual attention for machine vision systems is not only critical but also challenging. Many conventional attention models have been developed in the machine vision literature, but they are all purely space-based and cannot perform object-based selection. Consequently, they fail in real-world visual environments because of the intrinsic limitations of the space-based attention theory upon which they are built.
The aim of the work presented in this thesis is to provide a novel human-like visual selection framework based on the object-based attention theory recently developed in psychophysics. The proposed solution, a Hierarchical Object-based Attention Framework (HOAF) based on grouping competition, consists of two closely coupled
visual selection models of (1) hierarchical object-based visual (covert) attention and
(2) object-based attention-driven (overt) saccadic eye movements. The Hierarchical
Object-based Attention Model (HOAM) is the primary selection mechanism and the
Object-based Attention-Driven Saccading model (OADS) has a supporting role, both
of which are combined in the integrated visual selection framework HOAF.
This thesis first describes the proposed object-based attention model HOAM, which is the primary component of the selection framework HOAF. The model is based on recent psychophysical results on object-based visual attention and adopts grouping-based competition to integrate object-based and space-based attention, so as to achieve object-based hierarchical selectivity. The behaviour of the model is demonstrated
on a number of synthetic images simulating psychophysical experiments and
real-world natural scenes. The experimental results show that the behaviour of our object-based attention model HOAM concurs with the main findings in the psychophysical literature on object-based and space-based visual attention. Moreover, HOAM exhibits outstanding hierarchical selectivity, from far to near and from coarse to fine, over features, objects, spatial regions, and their groupings in complex natural scenes.
This successful performance arises from three original mechanisms in the model:
grouping-based saliency evaluation, integrated competition between groupings, and
hierarchical selectivity. The model is the first implemented machine vision model of
integrated object-based and space-based visual attention.
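The three mechanisms named above combine into hierarchical selectivity: attention descends a hierarchy of groupings, at each level selecting the winner of a saliency competition. A toy sketch of that descent (the tree structure, the names, and the saliency values are all hypothetical; the thesis's grouping-based saliency evaluation and integrated competition are far richer than a single stored number per grouping):

```python
def hierarchical_select(grouping):
    """Walk a grouping hierarchy coarse-to-fine, at each level attending
    the most salient sub-grouping. `grouping` is a dict of the form
    {"name": str, "saliency": float, "children": [...]}  (illustrative
    stand-in for HOAM's hierarchical selectivity)."""
    path = [grouping["name"]]
    while grouping["children"]:
        # Winner-take-all competition among the current sub-groupings.
        grouping = max(grouping["children"], key=lambda g: g["saliency"])
        path.append(grouping["name"])
    return path

# Hypothetical scene: a coarse grouping is attended first, then its parts.
scene = {
    "name": "scene", "saliency": 1.0, "children": [
        {"name": "sky region", "saliency": 0.2, "children": []},
        {"name": "red car", "saliency": 0.9, "children": [
            {"name": "wheel", "saliency": 0.3, "children": []},
            {"name": "licence plate", "saliency": 0.7, "children": []},
        ]},
    ],
}
print(hierarchical_select(scene))  # ['scene', 'red car', 'licence plate']
```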
The thesis then presents the second proposed model, Object-based Attention-Driven Saccadic eye movements (OADS), built upon the object-based attention model HOAM as the overt saccading component of the object-based selection framework HOAF. Like HOAM, OADS is the first implemented machine vision saccading model that makes a clear distinction between (covert) visual attention and overt saccadic movements in a two-level selection system, an important feature of human vision not yet explored in conventional machine vision saccading systems. In the saccading model OADS, a log-polar retina-like sensor
is employed to simulate human-like foveated imaging for space-variant sensing. Through a novel mechanism for attention-driven orienting, the sensor fixates on new destinations determined by object-based attention. It thereby helps attention to selectively process interesting objects located at the periphery of the field of view, accomplishing large-scale visual selection tasks. Through another proposed novel mechanism, temporary inhibition of return, OADS can simulate human saccading and attention behaviour, refixating and reattending interesting objects for further detailed inspection.
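A log-polar retina-like sensor can be sketched as a resampling of the image around the current fixation point, dense at the fovea and coarse toward the periphery. This is an illustrative reimplementation under assumed geometry (ring and wedge counts, exponential ring spacing, nearest-neighbour sampling), not the thesis's actual sensor:

```python
import numpy as np

def log_polar_sample(image, cx, cy, n_rings=32, n_wedges=64):
    """Sample a grayscale image on a log-polar grid centred on the
    fixation point (cx, cy): many samples per unit area near the centre
    (fovea), few in the periphery. Returns an (n_rings, n_wedges) array."""
    h, w = image.shape
    # Largest radius that can still reach an image corner from (cx, cy).
    r_max = np.hypot(max(cx, w - cx), max(cy, h - cy))
    # Ring radii grow exponentially: the log-polar spacing.
    radii = np.exp(np.linspace(0.0, np.log(r_max), n_rings))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_wedges, endpoint=False)
    out = np.zeros((n_rings, n_wedges), dtype=image.dtype)
    for i, r in enumerate(radii):
        for j, t in enumerate(thetas):
            # Nearest-neighbour lookup; rays leaving the image read as 0.
            x = int(round(cx + r * np.cos(t)))
            y = int(round(cy + r * np.sin(t)))
            if 0 <= x < w and 0 <= y < h:
                out[i, j] = image[y, x]
    return out
```

Refixation then amounts to calling the sampler again with a new (cx, cy) chosen by the attention model, which moves the high-resolution fovea onto the attended object.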
This thesis concludes that the proposed human-like visual selection solution, HOAF, inspired by psychophysical object-based attention theory and grouping-based competition, is particularly useful for machine vision. HOAF is a general and effective visual selection framework integrating object-based attention and attention-driven saccadic eye movements with biological plausibility and object-based hierarchical selectivity from coarse to fine in a space-time context.
The relation of phase noise and luminance contrast to overt attention in complex visual stimuli
Models of attention are typically based on difference maps in low-level features but neglect higher order stimulus structure. To what extent do higher order statistics affect human attention in natural stimuli? We recorded eye movements while observers viewed unmodified and modified images of natural scenes. Modifications included contrast modulations (resulting in changes to first- and second-order statistics), as well as the addition of noise to the Fourier phase (resulting in changes to higher order statistics). We have the following findings: (1) Subjects' interpretation of a stimulus as a “natural” depiction of an outdoor scene depends on higher order statistics in a highly nonlinear, categorical fashion. (2) Confirming previous findings, contrast is elevated at fixated locations for a variety of stimulus categories. In addition, we find that the size of this elevation depends on higher order statistics and reduces with increasing phase noise. (3) Global modulations of contrast bias eye position toward high contrasts, consistent with a linear effect of contrast on fixation probability. This bias is independent of phase noise. (4) Small patches of locally decreased contrast repel eye position less than large patches of the same aggregate area, irrespective of phase noise. Our findings provide evidence that deviations from surrounding statistics, rather than contrast per se, underlie the well-established relation of contrast to fixation.
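The phase-noise manipulation in this study can be sketched by perturbing the Fourier phase of an image while keeping its amplitude spectrum fixed. The function name, the uniform noise distribution, and the linear noise scaling are assumptions; only the amplitude-preserving principle is taken from the abstract:

```python
import numpy as np

def add_phase_noise(image, noise_level, rng=None):
    """Add random noise to the Fourier phase of a grayscale image while
    leaving the amplitude spectrum untouched, so first- and second-order
    statistics are preserved but higher order structure degrades.
    noise_level in [0, 1]: 0 returns the original image, 1 scrambles
    the phase completely."""
    rng = np.random.default_rng(rng)
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    noise = rng.uniform(-np.pi, np.pi, size=phase.shape)
    noisy_phase = phase + noise_level * noise
    # The perturbed phase is no longer exactly conjugate-symmetric, so a
    # small imaginary residue appears; taking the real part discards it.
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * noisy_phase)))
```

At noise_level 0 the spectrum is reconstructed exactly and the original image comes back, which gives a convenient sanity check on the implementation.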
A computer vision model for visual-object-based attention and eye movements
This is the post-print version of the final paper published in Computer Vision and Image Understanding. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2008 Elsevier B.V.
This paper presents a new computational framework for modelling visual-object-based attention and attention-driven eye movements within an integrated system in a biologically inspired approach. Attention operates at multiple levels of visual selection, by space, feature, object, and group, depending on the nature of targets and visual tasks. Attentional shifts and gaze shifts are built upon common processing circuits and control mechanisms but are also separated by their different functional roles, working together to fulfil flexible visual selection tasks in complicated visual environments. The framework integrates the important aspects of human visual attention and eye movements, resulting in sophisticated performance in complicated natural scenes. The proposed approach aims at exploring a useful visual selection system for computer vision, especially for use in cluttered natural visual environments.
National Natural Science Foundation of China
Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli
In natural vision both stimulus features and task-demands affect an observer's attention. However, the relationship between sensory-driven (“bottom-up”) and task-dependent (“top-down”) factors remains controversial: Can task-demands counteract strong sensory signals fully, quickly, and irrespective of bottom-up features? To measure attention under naturalistic conditions, we recorded eye-movements in human observers, while they viewed photographs of outdoor scenes. In the first experiment, smooth modulations of contrast biased the stimuli's sensory-driven saliency towards one side. In free-viewing, observers' eye-positions were immediately biased toward the high-contrast, i.e., high-saliency, side. However, this sensory-driven bias disappeared entirely when observers searched for a bull's-eye target embedded with equal probability to either side of the stimulus. When the target always occurred in the low-contrast side, observers' eye-positions were immediately biased towards this low-saliency side, i.e., the sensory-driven bias reversed. Hence, task-demands do not only override sensory-driven saliency but also actively countermand it. In a second experiment, a 5-Hz flicker replaced the contrast gradient. Whereas the bias was less persistent in free viewing, the overriding and reversal took longer to deploy. Hence, insufficient sensory-driven saliency cannot account for the bias reversal. In a third experiment, subjects searched for a spot of locally increased contrast (“oddity”) instead of the bull's-eye (“template”). In contrast to the other conditions, a slight sensory-driven free-viewing bias prevails in this condition. In a fourth experiment, we demonstrate that at known locations template targets are detected faster than oddity targets, suggesting that the former induce a stronger top-down drive when used as search targets. 
Taken together, task-demands can override sensory-driven saliency in complex visual stimuli almost immediately, and the extent of overriding depends on the search target and the overridden feature, but not on the latter's free-viewing saliency.
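The smooth contrast modulation used in the first experiment can be approximated by scaling local contrast around the mean luminance with a spatial ramp. The linear ramp and the strength parameter here are illustrative assumptions, not the paper's exact stimulus recipe:

```python
import numpy as np

def contrast_gradient(image, strength=0.8):
    """Apply a smooth left-to-right contrast modulation to a grayscale
    image: deviations from the mean luminance are attenuated on the left
    and left intact on the right, biasing sensory-driven saliency toward
    the high-contrast (right) side."""
    h, w = image.shape
    # Contrast factor ramps linearly from (1 - strength) to 1 across columns.
    ramp = np.linspace(1.0 - strength, 1.0, w)[None, :]
    mean = image.mean()
    return mean + (image - mean) * ramp
```

Because only deviations from the mean are scaled, mean luminance stays roughly constant across the image while local contrast, and hence low-level saliency, rises from left to right.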