Image Extraction by Wide Angle Foveated Lens for Overt-Attention
This paper defines Wide Angle Foveated (WAF) imaging. The proposed model combines a Cartesian coordinate system, a log-polar coordinate system, and a unique camera model composed of planar projection and spherical projection, so that a single imaging device can serve all-purpose use. For pattern recognition, the central field of view (FOV) is given translation-invariance, and the intermediate FOV rotation- and scale-invariance. Further, the peripheral FOV is well suited to controlling the camera's view direction, because its image height is linear in the incident angle to the camera model's optical center. This imaging model is therefore especially useful when the camera is moved dynamically, that is, for overt attention. Moreover, simulation results of image extraction show the advantages of the proposed model in terms of the magnification factor of its central FOV, the accuracy of its scale-invariance, and its flexibility in describing other WAF vision sensors.
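The kind of piecewise image-height curve described here can be sketched in a few lines. This is an illustrative sketch only: the boundary angles, the focal scale `F`, and the peripheral slope `K` are assumptions for demonstration, not values from the paper.

```python
import math

F = 1.0                      # scale of the central planar projection (assumed)
THETA1 = math.radians(10)    # hypothetical end of the central FOV
THETA2 = math.radians(30)    # hypothetical end of the intermediate FOV
K = 1.5                      # hypothetical slope of the peripheral f-theta part

def image_height(theta):
    """Piecewise image height r(theta) of a WAF-style lens model:
    a planar (translation-invariant) centre, a logarithmic (scale-invariant)
    intermediate band, and an f-theta periphery where r is linear in the
    incident angle -- convenient for view-direction control."""
    if theta <= THETA1:
        return F * math.tan(theta)                       # planar projection
    r1 = F * math.tan(THETA1)
    if theta <= THETA2:
        # logarithmic band: a change of scale becomes a radial shift
        return r1 * (1.0 + math.log(math.tan(theta) / math.tan(THETA1)))
    r2 = r1 * (1.0 + math.log(math.tan(THETA2) / math.tan(THETA1)))
    return r2 + K * (theta - THETA2)                     # f-theta periphery
```

Each branch is continuous with the next at the boundary angles, so the image height grows monotonically from the fovea out to the periphery.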
Wide-Angle Foveation for All-Purpose Use
This paper proposes a model of a wide-angle space-variant image that provides a guide for designing a fovea sensor. First, an advanced wide-angle foveated (AdWAF) model is formulated, taking all-purpose use into account. The proposed model uses both Cartesian (linear) coordinates and logarithmic coordinates, in both planar projection and spherical projection. The model thus divides its wide-angle field of view into four areas, so that it can flexibly represent images formed by various types of lenses. The first simulation compares the model with other lens models in terms of image height and resolution. The result shows that the AdWAF model can reduce image data by 13.5% compared to a log-polar lens model when both have the same resolution in the central field of view. The AdWAF image is remapped from an actual input image taken by the prototype fovea lens, a wide-angle foveated (WAF) lens, using the proposed model. The second simulation compares the model with other foveation models used for the existing log-polar chip and vision system. The third simulation estimates the scale-invariant property by comparison with the existing fovea lens and the log-polar lens. The AdWAF model gives its planar logarithmic part a complete scale-invariant property, while the fovea lens has up to 7.6% error in its spherical logarithmic part. The fourth simulation computes optical flow in order to examine the unidirectional property when a fovea sensor based on the AdWAF model moves, compared to a pinhole camera. The result, obtained by using the concept of a virtual cylindrical screen, indicates that the proposed model has advantages in computation and application of the optical flow when the fovea sensor moves forward.
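The scale- and rotation-invariance that the logarithmic parts of such models provide can be seen in a minimal log-polar mapping. This is a generic sketch, not the AdWAF formulation itself; `rho0` is an arbitrary reference radius.

```python
import math

def to_logpolar(x, y, rho0=1.0):
    """Map Cartesian (x, y) to log-polar (u, v), with u = log(r / rho0)
    and v the polar angle. Scaling about the origin becomes a pure shift
    in u, and rotation a pure shift in v, which is why logarithmic bands
    give scale- and rotation-invariance for pattern recognition."""
    r = math.hypot(x, y)
    return math.log(r / rho0), math.atan2(y, x)
```

For example, doubling a point's distance from the origin changes its `u` coordinate by exactly `log(2)` while leaving `v` untouched.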
Hierarchical Object-Based Visual Attention for Machine Vision
Institute of Perception, Action and Behaviour
Human vision uses mechanisms of covert attention to selectively process interesting
information and overt eye movements to extend this selectivity ability. Thus, visual
tasks can be effectively dealt with by limited processing resources. Modelling visual
attention for machine vision systems is not only critical but also challenging. In the
machine vision literature there have been many conventional attention models developed
but they are all space-based only and cannot perform object-based selection. In
consequence, they fail to work in real-world visual environments due to the intrinsic
limitations of the space-based attention theory upon which these models are built.
The aim of the work presented in this thesis is to provide a novel human-like visual
selection framework based on the object-based attention theory recently being developed
in psychophysics. The proposed solution, a Hierarchical Object-based Attention
Framework (HOAF) based on grouping competition, consists of two closely-coupled
visual selection models of (1) hierarchical object-based visual (covert) attention and
(2) object-based attention-driven (overt) saccadic eye movements. The Hierarchical
Object-based Attention Model (HOAM) is the primary selection mechanism and the
Object-based Attention-Driven Saccading model (OADS) has a supporting role, both
of which are combined in the integrated visual selection framework HOAF.
This thesis first describes the proposed object-based attention model HOAM which
is the primary component of the selection framework HOAF. The model is based on
recent psychophysical results on object-based visual attention and adopts grouping-based
competition to integrate object-based and space-based attention, so as
to achieve object-based hierarchical selectivity. The behaviour of the model is demonstrated
on a number of synthetic images simulating psychophysical experiments and
real-world natural scenes. The experimental results showed that the performance of
our object-based attention model HOAM concurs with the main findings in the psychophysical
literature on object-based and space-based visual attention. Moreover,
HOAM has outstanding hierarchical selectivity from far to near and from coarse to fine
by features, objects, spatial regions, and their groupings in complex natural scenes.
This successful performance arises from three original mechanisms in the model:
grouping-based saliency evaluation, integrated competition between groupings, and
hierarchical selectivity. The model is the first implemented machine vision model of
integrated object-based and space-based visual attention.
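As a toy illustration of hierarchical selectivity (not the thesis's HOAM implementation), attention can be modelled as descending from the most salient grouping to its most salient subgrouping, coarse to fine:

```python
def select_hierarchically(grouping):
    """Follow the most salient child at each level of a nested grouping.
    grouping: dict with 'name', 'saliency', and an optional 'children' list.
    Returns the coarse-to-fine path of attended groupings."""
    path = [grouping["name"]]
    node = grouping
    while node.get("children"):
        node = max(node["children"], key=lambda g: g["saliency"])
        path.append(node["name"])
    return path
```

With a scene containing a salient red grouping that itself contains a salient red circle, the returned path walks scene, then grouping, then object, mirroring selection by groupings, spatial regions, and features in turn.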
The thesis then addresses another proposed model of Object-based Attention-Driven
Saccadic eye movements (OADS) built upon the object-based attention model HOAM,
as an overt saccading component within the object-based selection framework HOAF.
This model, like our object-based attention model HOAM, is also the first implemented
machine vision saccading model which makes a clear distinction between (covert) visual
attention and overt saccading movements in a two-level selection system – an
important feature of human vision but not yet explored in conventional machine vision
saccading systems. In the saccading model OADS, a log-polar retina-like sensor
is employed to simulate the human-like foveation imaging for space variant sensing.
Through a novel mechanism for attention-driven orienting, the sensor fixates on
new destinations determined by object-based attention. Hence it helps attention to
selectively process interesting objects located at the periphery of the whole field of
view to accomplish the large-scale visual selection tasks. By another proposed novel
mechanism for temporary inhibition of return, OADS can simulate the human saccading/
attention behaviour to refixate/reattend interesting objects for further detailed
inspection.
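A minimal sketch of temporary inhibition of return as described for OADS follows; the decay constant and the update rule are assumptions for illustration, not the thesis's exact mechanism.

```python
def next_fixation(saliency, ior, decay=0.5):
    """Pick the most salient target after discounting by inhibition of
    return, then decay all existing inhibitions and fully inhibit the winner.
    saliency: {target: value}; ior: {target: inhibition in [0, 1]}."""
    scores = {t: s * (1.0 - ior.get(t, 0.0)) for t, s in saliency.items()}
    winner = max(scores, key=scores.get)
    new_ior = {t: v * decay for t, v in ior.items()}
    new_ior[winner] = 1.0   # just-fixated target is fully suppressed
    return winner, new_ior
```

Because inhibition decays rather than being permanent, a previously attended target can win again a few saccades later, modelling refixation of interesting objects for further detailed inspection.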
This thesis concludes that the proposed human-like visual selection solution –
HOAF, which is inspired by psychophysical object-based attention theory and grouping-based
competition, is particularly useful for machine vision. HOAF is a general and
effective visual selection framework integrating object-based attention and attention-driven
saccadic eye movements with biological plausibility and object-based hierarchical
selectivity from coarse to fine in a space-time context.
Perception-driven approaches to real-time remote immersive visualization
In remote immersive visualization systems, real-time 3D perception through RGB-D cameras, combined with modern Virtual Reality (VR) interfaces, enhances the user's sense of presence in a remote scene through 3D reconstruction rendered in an immersive display. This is particularly valuable when there is a need to visualize, explore, and perform tasks in environments that are inaccessible because they are too hazardous or too distant. However, a remote visualization system requires that the entire pipeline, from 3D data acquisition to VR rendering, satisfy demanding speed, throughput, and visual-realism constraints. In particular, when using point clouds, there is a fundamental quality difference between the data acquired from the physical world and the data displayed, because network latency and throughput limitations negatively impact the sense of presence and can provoke cybersickness. This thesis presents research that addresses these problems by taking the human visual system as inspiration, from sensor data acquisition to VR rendering. The human visual system does not have uniform vision across the field of view: visual acuity is sharpest at the center and falls off towards the periphery. Peripheral vision provides lower resolution and guides eye movements so that central vision visits all the crucial, interesting parts of a scene. As a first contribution, the thesis develops remote visualization strategies that exploit this acuity fall-off to facilitate the processing, transmission, buffering, and VR rendering of 3D reconstructed scenes while simultaneously reducing throughput requirements and latency. As a second contribution, the thesis investigates attentional mechanisms to select and draw user engagement to specific information in the dynamic spatio-temporal environment.
It proposes a strategy that analyzes the remote scene with respect to its 3D structure, its layout, and the spatial, functional, and semantic relationships between objects in the scene. The strategy focuses on analyzing the scene with models of human visual perception, devoting a greater proportion of computational resources to objects of interest and creating a more realistic visualization. As a supplementary contribution, a new volumetric point-cloud density-based Peak Signal-to-Noise Ratio (PSNR) metric is proposed to evaluate the introduced techniques. An in-depth evaluation of the presented systems, a comparative examination of the proposed point-cloud metric, user studies, and experiments demonstrate that the methods introduced in this thesis are visually superior while significantly reducing latency and throughput.
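The acuity-fall-off idea for point-cloud transmission can be sketched as eccentricity-dependent random subsampling. The fall-off constant `e2` (borrowed from common foveation models) and the probabilistic keep rule are assumptions for illustration, not the thesis's exact scheme.

```python
import math
import random

def keep_probability(ecc_deg, e2=2.3):
    """Acuity-style fall-off: 1 at the gaze direction, decreasing with
    eccentricity in degrees. e2 is a hypothetical half-resolution constant."""
    return e2 / (e2 + ecc_deg)

def foveate(points, gaze, rng):
    """Keep each unit-vector point with a probability set by its angular
    distance from the unit gaze vector -- dense fovea, sparse periphery."""
    kept = []
    for p in points:
        cos_a = sum(a * b for a, b in zip(p, gaze))
        ecc = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if rng.random() < keep_probability(ecc):
            kept.append(p)
    return kept
```

Points on the gaze direction are always kept, while peripheral points are thinned out, which is how throughput can be reduced without degrading the region the user is looking at.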
Space-variant picture coding
Space-variant picture coding techniques exploit the strong spatial non-uniformity of
the human visual system in order to increase coding efficiency in terms of perceived quality
per bit. This thesis extends space-variant coding research in two directions. The first of
these directions is in foveated coding. Past foveated coding research has been dominated
by the single-viewer, gaze-contingent scenario. However, for research into the multi-viewer
and probability-based scenarios, this thesis presents a missing piece: an algorithm for computing
an additive multi-viewer sensitivity function based on an established eye resolution
model, and, from this, a blur map that is optimal in the sense of discarding frequencies in
least-noticeable-first order. Furthermore, for the application of a blur map, a novel algorithm
is presented for the efficient computation of high-accuracy smoothly space-variant
Gaussian blurring, using a specialised filter bank which approximates perfect space-variant
Gaussian blurring to arbitrarily high accuracy and at greatly reduced cost compared to
the brute force approach of employing a separate low-pass filter at each image location.
The second direction is that of artificially increasing the depth-of-field of an image, an
idea borrowed from photography with the advantage of allowing an image to be reduced
in bitrate while retaining or increasing overall aesthetic quality. Two synthetic depth-of-field algorithms are presented herein, with the desirable properties of aiming to mimic
occlusion effects as occur in natural blurring, and of handling any number of blurring
and occlusion levels with the same level of computational complexity. The merits of this
coding approach have been investigated by subjective experiments to compare it with
single-viewer foveated image coding. The results found the depth-based preblurring to
generally be significantly preferable to the same level of foveation blurring.
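The filter-bank idea behind efficient space-variant Gaussian blurring can be sketched in 1-D. This is a simplification with assumed scales and linear blending, not the thesis's exact construction: the signal is blurred at a few fixed Gaussian scales, and each sample then blends the two nearest scales according to its locally desired sigma, avoiding a separate kernel per sample.

```python
import math

def gauss_kernel(sigma):
    """Normalized 1-D Gaussian kernel; a delta when sigma is zero."""
    if sigma <= 0.0:
        return [1.0]
    radius = max(1, int(3 * sigma))
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur(signal, sigma):
    """Uniform Gaussian blur with edge clamping."""
    k = gauss_kernel(sigma)
    r = len(k) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(k):
            idx = min(max(i + j - r, 0), len(signal) - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def space_variant_blur(signal, sigma_map, scales=(0.0, 1.0, 2.0, 4.0)):
    """Approximate a per-sample Gaussian blur by blending a small bank of
    fixed-scale blurs -- far cheaper than one kernel per sample."""
    bank = [blur(signal, s) for s in scales]
    out = []
    for i, sig in enumerate(sigma_map):
        sig = min(max(sig, scales[0]), scales[-1])
        for a in range(len(scales) - 1):
            if scales[a] <= sig <= scales[a + 1]:
                t = (sig - scales[a]) / (scales[a + 1] - scales[a])
                out.append((1 - t) * bank[a][i] + t * bank[a + 1][i])
                break
    return out
```

With four scales, the cost is four uniform blurs plus a per-sample blend, regardless of how smoothly the blur map varies across the image.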
Foveated Vision Models for Search and Recognition
Computer vision has made significant progress in recent years thanks to advances in neural network architectures and computing power. At the sensory level, current machine vision systems sample visual data uniformly to make predictions about the scene. This is in contrast with the human visual system, which has high visual acuity only in a small central region, the fovea, and much coarser sampling away from the center. There has been renewed interest, particularly in the context of active vision for robotic navigation and scene exploration, in developing biologically motivated methods that can leverage such foveated computation. While foveated vision offers computational savings at or near the region of interest, it requires eye movements to scan the scene for effective image understanding. The hypothesis is that methods combining non-uniform sampling of the field of view with eye movements will lead to a new class of active vision systems that are computationally optimized for specific tasks of interest. Inspired by these observations, this research provides, for the first time, a comprehensive study of human visual search in the constrained setting of person identification in the wild. A novel video database is created that systematically tests how different parts of a person contribute to eye movements and person identification. Our study shows that search errors can dominate the overall recognition accuracy in human subject experiments. This calls for new strategies for integrating eye tracking with foveated image representations, and two specific approaches are investigated. In the first approach, a deep neural network method is developed to model eye movements, using a long short-term memory (LSTM) to model successive fixations. The proposed method outperforms the state of the art while simplifying the feature extraction procedure.
The second approach focuses on a foveated image model that leverages multiple fixations. A convolutional neural network method is proposed that works directly on foveated input images and achieves competitive recognition rates compared to standard neural networks operating on the same number of input pixels. Overall, the thesis investigates the requirements and implementations that could support active foveated vision, and lays the groundwork for future studies in this area.
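A retina-like, non-uniform sampling pattern of the kind such foveated inputs rely on can be sketched as rings whose radii grow geometrically with eccentricity. This is a generic illustration; the thesis's actual sampling scheme may differ.

```python
import math

def foveated_ring_samples(cx, cy, n_rings=4, per_ring=8, r0=1.0, growth=2.0):
    """Sample locations around a fixation point (cx, cy) whose radial
    spacing grows geometrically with eccentricity -- dense at the fovea,
    coarse in the periphery, with a constant pixel budget per ring."""
    pts = [(cx, cy)]
    r = r0
    for _ in range(n_rings):
        for k in range(per_ring):
            a = 2 * math.pi * k / per_ring
            pts.append((cx + r * math.cos(a), cy + r * math.sin(a)))
        r *= growth
    return pts
```

Because each ring holds the same number of samples while covering a larger area, the total pixel count stays small even as the covered field of view grows, which is the source of the computational savings.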
Active Vision for Scene Understanding
Visual perception is one of the most important sources of information for both humans and robots. A particular challenge is the acquisition and interpretation of complex unstructured scenes. This work contributes to active vision for humanoid robots. A semantic model of the scene is created, which is extended by successively changing the robot's view in order to explore interaction possibilities of the scene.
Perceptive agents with attentive interfaces : learning and vision for man-machine systems
Thesis (Ph.D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1996. Includes bibliographical references (leaves 107-116). By Trevor Jackson Darrell.