276 research outputs found
Hierarchical Object-Based Visual Attention for Machine Vision
Institute of Perception, Action and Behaviour

Human vision uses covert attention to selectively process interesting information and overt eye movements to extend this selective ability, so visual tasks can be handled effectively with limited processing resources. Modelling visual attention for machine vision systems is both critical and challenging. Many conventional attention models have been developed in the machine vision literature, but they are all purely space-based and cannot perform object-based selection. Consequently, they fail in real-world visual environments because of the intrinsic limitations of the space-based attention theory on which they are built.
The aim of the work presented in this thesis is to provide a novel human-like visual selection framework based on the object-based attention theory recently developed in psychophysics. The proposed solution, a Hierarchical Object-based Attention Framework (HOAF) based on grouping competition, consists of two closely-coupled visual selection models: (1) hierarchical object-based visual (covert) attention and (2) object-based attention-driven (overt) saccadic eye movements. The Hierarchical Object-based Attention Model (HOAM) is the primary selection mechanism and the Object-based Attention-Driven Saccading model (OADS) plays a supporting role; both are combined in the integrated visual selection framework HOAF.
This thesis first describes the proposed object-based attention model HOAM, the primary component of the selection framework HOAF. The model is based on recent psychophysical results on object-based visual attention and adopts grouping-based competition to integrate object-based and space-based attention, thereby achieving object-based hierarchical selectivity. The behaviour of the model is demonstrated on a number of synthetic images simulating psychophysical experiments and on real-world natural scenes. The experimental results show that the performance of the object-based attention model HOAM concurs with the main findings of the psychophysical literature on object-based and space-based visual attention. Moreover, HOAM exhibits hierarchical selectivity from far to near and from coarse to fine over features, objects, spatial regions, and their groupings in complex natural scenes. This performance arises from three original mechanisms in the model: grouping-based saliency evaluation, integrated competition between groupings, and hierarchical selectivity. The model is the first implemented machine vision model of integrated object-based and space-based visual attention.
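The abstract does not give the formula for grouping-based saliency evaluation; the following is a minimal illustrative sketch of the general idea, scoring a grouping by the contrast of its mean feature value against its neighbouring groupings. The data and the contrast formula are assumptions for illustration, not the thesis's own definitions.

```python
# Hypothetical sketch of grouping-based saliency evaluation. A "grouping"
# is a set of pixels sharing a feature; its saliency is scored by the
# contrast of its mean feature value against its neighbouring groupings.
# The specific formula is an illustrative assumption, not the thesis's own.

def grouping_saliency(groupings, neighbours):
    """groupings: dict name -> list of feature values (e.g. intensities).
    neighbours: dict name -> list of neighbouring grouping names.
    Returns dict name -> saliency score."""
    means = {g: sum(v) / len(v) for g, v in groupings.items()}
    saliency = {}
    for g, nbrs in neighbours.items():
        # contrast of this grouping's mean feature against its neighbours
        contrasts = [abs(means[g] - means[n]) for n in nbrs]
        saliency[g] = sum(contrasts) / len(contrasts) if contrasts else 0.0
    return saliency

# a bright grouping surrounded by dark groupings wins the competition
groups = {"a": [0.9, 0.8], "b": [0.1, 0.2], "c": [0.15]}
nbrs = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
scores = grouping_saliency(groups, nbrs)
winner = max(scores, key=scores.get)
```

Under this toy contrast measure the odd-one-out grouping receives the highest score, which is the behaviour a grouping-based competition needs before hierarchical selection can proceed.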
The thesis then addresses a second proposed model, Object-based Attention-Driven Saccadic eye movements (OADS), built upon the object-based attention model HOAM as the overt saccading component within the object-based selection framework HOAF. This model, like HOAM, is the first implemented machine vision saccading model that makes a clear distinction between (covert) visual attention and overt saccading movements in a two-level selection system, an important feature of human vision not yet explored in conventional machine vision saccading systems. In the saccading model OADS, a log-polar retina-like sensor is employed to simulate human-like foveated imaging for space-variant sensing.
Through a novel mechanism for attention-driven orienting, the sensor fixates on new destinations determined by object-based attention. This helps attention selectively process interesting objects located at the periphery of the field of view, accomplishing large-scale visual selection tasks. Through another proposed novel mechanism, temporary inhibition of return, OADS can simulate human saccading/attention behaviour, refixating and reattending interesting objects for further detailed inspection.
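The log-polar, retina-like sampling the abstract mentions can be sketched as a grid whose rings grow geometrically in radius, so resolution is dense at the fixation point and sparse at the periphery. The ring and wedge counts below are assumptions for illustration, not the parameters of the sensor used in OADS.

```python
import math

# Illustrative sketch of a log-polar, retina-like sampling grid: sample
# density is highest at the fixation point (fovea) and falls off
# logarithmically toward the periphery. Ring/wedge counts are assumed
# values for illustration only.

def log_polar_samples(cx, cy, rings=8, wedges=16, r_min=1.0, r_max=64.0):
    """Return (x, y) sample positions around fixation point (cx, cy)."""
    samples = []
    growth = (r_max / r_min) ** (1.0 / (rings - 1))  # geometric ring spacing
    for i in range(rings):
        r = r_min * growth ** i
        for j in range(wedges):
            theta = 2.0 * math.pi * j / wedges
            samples.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return samples

pts = log_polar_samples(0.0, 0.0)
# every ring contributes the same number of samples, so samples per unit
# area are far denser near the fovea than at the periphery
```

Refixating on a peripheral object, as OADS does, amounts to recomputing this grid around the new fixation point so the object falls inside the high-resolution fovea.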
This thesis concludes that the proposed human-like visual selection solution HOAF, inspired by psychophysical object-based attention theory and grouping-based competition, is particularly useful for machine vision. HOAF is a general and effective visual selection framework that integrates object-based attention and attention-driven saccadic eye movements with biological plausibility and object-based hierarchical selectivity from coarse to fine in a space-time context.
Visual location awareness for mobile robots using feature-based vision
Department Head: L. Darrell Whitley. 2010 Spring. Includes bibliographical references (pages 48-50).

This thesis presents an evaluation of the feature-based visual recognition paradigm for the task of mobile robot localization. Although many works describe feature-based visual robot localization, they often do so using complex methods for map-building and position estimation which obscure the underlying vision system's performance. One of the main contributions of this work is the development of an evaluation algorithm that employs simple models of location awareness and focuses on evaluating the underlying vision system. While SeeAsYou is used as a prototypical vision system for evaluation, the algorithm is designed to be usable with other feature-based vision systems as well. The main result is that feature-based recognition with SeeAsYou provides some information but is not strong enough to reliably achieve location awareness without temporal context. Adding a simple temporal model, however, yields more reliable localization performance.
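The abstract does not specify its temporal model, but the kind of simple temporal smoothing it alludes to can be sketched as a discrete Bayes filter over known locations: noisy per-frame recognition scores are folded into a belief that also encodes a transition prior (the robot tends to stay where it is). The location names, transition probability, and likelihood values below are illustrative assumptions.

```python
# Hedged sketch of a simple temporal model for location awareness: a
# discrete Bayes filter over a small set of known locations. All numeric
# values are illustrative assumptions, not figures from the thesis.

def temporal_update(belief, likelihood, stay_prob=0.7):
    """belief: dict location -> prior probability (sums to 1).
    likelihood: dict location -> per-frame recognition score.
    Returns the normalized posterior belief."""
    locs = list(belief)
    move_prob = (1.0 - stay_prob) / (len(locs) - 1)
    # prediction step: the robot probably stayed, possibly moved
    predicted = {
        l: sum(belief[k] * (stay_prob if k == l else move_prob) for k in locs)
        for l in locs
    }
    # correction step: weight by the current frame's recognition scores
    posterior = {l: predicted[l] * likelihood[l] for l in locs}
    z = sum(posterior.values())
    return {l: p / z for l, p in posterior.items()}

belief = {"hall": 1 / 3, "lab": 1 / 3, "office": 1 / 3}
# two consecutive frames, each only weakly favouring "lab"
for frame_likelihood in ({"hall": 0.2, "lab": 0.5, "office": 0.3},
                         {"hall": 0.3, "lab": 0.5, "office": 0.2}):
    belief = temporal_update(belief, frame_likelihood)
best = max(belief, key=belief.get)
```

Two weak single-frame votes combine into a confident belief, which is exactly the effect the abstract reports: recognition alone is ambiguous, but a simple temporal model makes localization reliable.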
Temporal unpredictability detection of real-time video sequence
Modeling Bottom-Up and Top-Down Attention with a Neurodynamic Model of V1
Previous studies in that line suggested that lateral interactions of V1 cells
are responsible for, among other visual effects, bottom-up visual attention
(alternatively named visual salience or saliency). Our objective is to mimic
these connections in the visual system with a neurodynamic network of
firing-rate neurons. Early subcortical processes (i.e. retinal and thalamic)
are functionally simulated. An implementation of the cortical magnification
function is included to define the retinotopical projections towards V1,
processing neuronal activity for each distinct view during scene observation.
Novel computational definitions of top-down inhibition (in terms of inhibition
of return and selection mechanisms) are also proposed to predict attention in
Free-Viewing and Visual Search conditions. Results show that our model
outperforms other biologically-inspired models of saliency prediction and also
predicts visual saccade sequences during free viewing. We also show how the
temporal and spatial characteristics of inhibition of return can improve
prediction of saccades, and how distinct search strategies (in terms of
feature-selective or category-specific inhibition) predict attention in
distinct image contexts.

Comment: 32 pages, 19 figures
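The inhibition-of-return mechanism that both this abstract and the saccading work above rely on can be sketched minimally: repeatedly pick the most salient location in a saliency map, then suppress a neighbourhood around it so the next fixation moves elsewhere. The map values and suppression radius below are illustrative assumptions, not the model's parameters.

```python
# Minimal sketch of inhibition of return (IoR) over a saliency map:
# fixate the maximum, suppress its neighbourhood, repeat. Map values
# and radius are illustrative assumptions.

def scanpath(saliency, fixations=3, radius=1):
    """saliency: 2D list of floats. Returns a list of (row, col) fixations."""
    s = [row[:] for row in saliency]  # work on a copy
    path = []
    for _ in range(fixations):
        r, c = max(((i, j) for i in range(len(s)) for j in range(len(s[0]))),
                   key=lambda rc: s[rc[0]][rc[1]])
        path.append((r, c))
        # inhibition of return: zero out the attended neighbourhood
        for i in range(max(0, r - radius), min(len(s), r + radius + 1)):
            for j in range(max(0, c - radius), min(len(s[0]), c + radius + 1)):
                s[i][j] = 0.0
    return path

sal = [[0.1, 0.2, 0.1, 0.0],
       [0.2, 0.9, 0.1, 0.0],
       [0.0, 0.1, 0.0, 0.8],
       [0.6, 0.0, 0.0, 0.7]]
path = scanpath(sal)
```

Without the suppression step the model would fixate the same peak forever; making the inhibition temporary (decaying instead of zeroed) is what allows the refixation behaviour described earlier.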
Toward a more biologically plausible model of object recognition
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Physics, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (leaves 105-113).

Rapidly and reliably recognizing an object (is that a cat or a tiger?) is obviously an important skill for survival. However, it is a difficult computational problem, because the same object may appear differently under various conditions, while different objects may share similar features. A robust recognition system must have the capacity to distinguish between similar-looking objects while being invariant to appearance-altering transformations of an object. The fundamental challenge for any recognition system lies in this simultaneous requirement for both specificity and invariance. An emerging picture from decades of neuroscience research is that the cortex overcomes this challenge by gradually building up specificity and invariance with a hierarchical architecture. In this thesis, I present a computational model of object recognition with a feedforward and hierarchical architecture. The model quantitatively describes the anatomy, physiology, and the first few hundred milliseconds of visual information processing in the ventral pathway of the primate visual cortex. There are three major contributions. First, the two main operations in the model (Gaussian and maximum) have been cast into a more biologically plausible form, using monotonic nonlinearities and divisive normalization, and a possible canonical neural circuitry has been proposed. Second, the shape tuning properties of visual area V4 have been explored using the corresponding layers in the model. It is demonstrated that the observed V4 selectivity for shapes of intermediate complexity (gratings and contour features) can be explained by combinations of orientation-selective inputs.
Third, shape tuning properties in the higher visual area, inferior temporal (IT) cortex, have also been explored. It is demonstrated that the selectivity and invariance properties of IT neurons can be generated by feedforward, hierarchical combinations of Gaussian-like and max-like operations, and that their responses can support robust object recognition. Furthermore, experimentally observed clutter effects and the trade-off between selectivity and invariance in IT can also be observed and understood in this computational framework. These studies show that the model is in good agreement with a range of physiological data and provides insights, at multiple levels, for understanding the object recognition process in the cortex.

by Minjoon Kouh. Ph.D.
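The two canonical operations the abstract names, Gaussian tuning (selectivity) and max pooling (invariance), can be sketched in a few lines. The template and sigma values below are illustrative assumptions; the thesis's biologically plausible versions use monotonic nonlinearities and divisive normalization rather than these textbook forms.

```python
import math

# Sketch of the two canonical operations: Gaussian-like tuning
# (selectivity: respond when the input matches a stored template) and
# max-like pooling (invariance: keep the best afferent response).
# Template and sigma are illustrative assumptions.

def gaussian_tuning(inputs, template, sigma=0.5):
    """Selectivity: peaks at 1.0 when inputs exactly match the template."""
    d2 = sum((x - t) ** 2 for x, t in zip(inputs, template))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def max_pooling(responses):
    """Invariance: keep only the strongest afferent response."""
    return max(responses)

template = [1.0, 0.0, 1.0]
# the same feature presented at three "positions"; tuning fires only
# where the input matches, pooling discards which position that was
afferents = [gaussian_tuning(x, template) for x in
             ([0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.5, 0.5, 0.5])]
pooled = max_pooling(afferents)
```

Alternating these two operations up a hierarchy is what lets the model build selectivity for complex shapes while staying invariant to position and scale.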
Detecting Biological Motion for Human-Robot Interaction: A Link between Perception and Action
One of the fundamental skills supporting safe and comfortable interaction between humans is their capability to intuitively understand each other's actions and intentions. At the basis of this ability is a special-purpose visual processing that the human brain has developed to comprehend human motion. Among the first "building blocks" enabling the bootstrapping of such visual processing is the ability to detect movements performed by biological agents in the scene, a skill mastered by human babies in the first days of their life. In this paper, we present a computational model based on the assumption that such a visual ability must rest on local low-level visual motion features that are independent of shape factors such as body configuration and perspective. Moreover, we implement it on the humanoid robot iCub, embedding it in a software architecture that leverages the regularities of biological motion also to control robot attention and oculomotor behaviors. In essence, we put forth a model in which the regularities of biological motion link perception and action, enabling a robotic agent to follow a human-inspired sensory-motor behavior. We posit that this choice facilitates mutual understanding and goal prediction during collaboration, increasing the pleasantness and safety of the interaction.
- …