75 research outputs found
Wide-Angle Foveation for All-Purpose Use
This paper proposes a model of a wide-angle space-variant image that provides a guide for designing a fovea sensor. First, an advanced wide-angle foveated (AdWAF) model is formulated, taking all-purpose use into account. The proposed model uses both Cartesian (linear) coordinates and logarithmic coordinates in both planar projection and spherical projection. The model thus divides its wide-angle field of view into four areas, so that it can flexibly represent images formed by various types of lenses. The first simulation compares the model with other lens models in terms of image height and resolution. The result shows that the AdWAF model can reduce image data by 13.5% compared to a log-polar lens model when both have the same resolution in the central field of view. The AdWAF image is remapped, using the proposed model, from an actual input image taken by the prototype fovea lens, a wide-angle foveated (WAF) lens. The second simulation compares the model with other foveation models used for the existing log-polar chip and vision system. The third simulation estimates the scale-invariant property by comparison with the existing fovea lens and the log-polar lens. The AdWAF model gives its planar logarithmic part a complete scale-invariant property, while the fovea lens has an error of at most 7.6% in its spherical logarithmic part. The fourth simulation computes optical flow in order to examine the unidirectional property when a fovea sensor based on the AdWAF model moves, compared to the pinhole camera. The result, obtained using the concept of a virtual cylindrical screen, indicates that the proposed model has advantages in terms of computation and application of the optical flow when the fovea sensor moves forward.
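The scale-invariant property attributed to the logarithmic part can be illustrated with a generic log-polar mapping. This is a minimal sketch, not the AdWAF model itself, whose equations are not given in the abstract. The function names `to_log_polar` and `scale_shift` are hypothetical. Scaling a point about the image centre by a positive factor s shifts its log-radius by the constant log(s) and leaves its angle unchanged.

```python
import math

def to_log_polar(x, y):
    """Map Cartesian (x, y) to (log radius, angle).

    Generic log-polar map used only for illustration; it is an assumption,
    not the AdWAF formulation.
    """
    r = math.hypot(x, y)
    if r == 0:
        raise ValueError("log-polar mapping is undefined at the origin")
    return math.log(r), math.atan2(y, x)

def scale_shift(x, y, s):
    """Return the change in (log radius, angle) when (x, y) is scaled by s > 0."""
    rho0, theta0 = to_log_polar(x, y)
    rho1, theta1 = to_log_polar(s * x, s * y)
    return rho1 - rho0, theta1 - theta0

# Scaling by 2 about the centre becomes a pure shift of log(2) along the
# log-radius axis, independent of where the point lies.
d_rho, d_theta = scale_shift(3.0, 4.0, 2.0)
```

Because the shift does not depend on the point, an object that grows or shrinks about the centre only translates in the log-polar image, which is the sense in which such a mapping is scale invariant.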
A distributed camera system for multi-resolution surveillance
We describe an architecture for a multi-camera, multi-resolution surveillance system. The aim is to support a set of distributed static and pan-tilt-zoom (PTZ) cameras and visual tracking algorithms, together with a central supervisor unit. Each camera (and possibly pan-tilt device) has a dedicated process and processor.
Asynchronous interprocess communications and archiving of data are achieved in a simple and effective way via a central repository, implemented using an SQL database.
Visual tracking data from static views are stored dynamically into tables in the database via client calls to the SQL server. A supervisor process running on the SQL server determines if active zoom cameras should be dispatched to observe a particular target, and this message is effected via writing demands into another database table.
We show results from a real implementation of the system comprising one static camera overviewing the environment under consideration and a PTZ camera operating under closed-loop velocity control, which uses a fast and robust level-set-based region tracker. Experiments demonstrate the effectiveness of our approach and its feasibility for multi-camera systems for intelligent surveillance.
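The database-as-message-bus idea described above can be sketched as follows. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the SQL server, the table and column names (`observations`, `demands`, `ptz_id`) are hypothetical, and the dispatch rule (issue one demand per target that does not yet have one) is an assumption, since the abstract does not state how the supervisor decides.

```python
import sqlite3

def make_repository():
    """Create the central repository (in-memory SQLite as a stand-in)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE observations "
        "(camera_id TEXT, target_id INTEGER, x REAL, y REAL, t REAL)"
    )
    db.execute(
        "CREATE TABLE demands (ptz_id TEXT, target_id INTEGER, x REAL, y REAL)"
    )
    return db

def report_observation(db, camera_id, target_id, x, y, t):
    """Called by a static-camera tracking process to store tracking data."""
    db.execute(
        "INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
        (camera_id, target_id, x, y, t),
    )
    db.commit()

def supervise(db, ptz_id):
    """Supervisor step: write a demand for each target that has none yet.

    Uses the most recent observation of each target. The dispatch rule is
    an assumption made for this sketch.
    """
    rows = db.execute(
        "SELECT o.target_id, o.x, o.y FROM observations o "
        "WHERE o.t = (SELECT MAX(t) FROM observations "
        "             WHERE target_id = o.target_id) "
        "AND o.target_id NOT IN (SELECT target_id FROM demands) "
        "ORDER BY o.target_id"
    ).fetchall()
    db.executemany(
        "INSERT INTO demands VALUES (?, ?, ?, ?)",
        [(ptz_id, tid, x, y) for tid, x, y in rows],
    )
    db.commit()
    return len(rows)

def pending_demands(db, ptz_id):
    """Called by the PTZ camera process to read the demands addressed to it."""
    return db.execute(
        "SELECT target_id, x, y FROM demands WHERE ptz_id = ? ORDER BY target_id",
        (ptz_id,),
    ).fetchall()
```

In this arrangement the processes never talk to each other directly: trackers only insert rows, the supervisor only reads observations and writes demands, and the PTZ process only polls its demands, so the same tables double as an archive.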
Content-prioritised video coding for British Sign Language communication.
Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. 
The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.
Space-variant picture coding
PhD
Space-variant picture coding techniques exploit the strong spatial non-uniformity of the human visual system in order to increase coding efficiency in terms of perceived quality per bit. This thesis extends space-variant coding research in two directions. The first of these directions is in foveated coding. Past foveated coding research has been dominated by the single-viewer, gaze-contingent scenario. However, for research into the multi-viewer and probability-based scenarios, this thesis presents a missing piece: an algorithm for computing an additive multi-viewer sensitivity function based on an established eye resolution model, and, from this, a blur map that is optimal in the sense of discarding frequencies in least-noticeable-first order. Furthermore, for the application of a blur map, a novel algorithm is presented for the efficient computation of high-accuracy smoothly space-variant Gaussian blurring, using a specialised filter bank which approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy and at greatly reduced cost compared to the brute force approach of employing a separate low-pass filter at each image location. The second direction is that of artificially increasing the depth-of-field of an image, an idea borrowed from photography with the advantage of allowing an image to be reduced in bitrate while retaining or increasing overall aesthetic quality. Two synthetic depth-of-field algorithms are presented herein, with the desirable properties of aiming to mimic occlusion effects as occur in natural blurring, and of handling any number of blurring and occlusion levels with the same level of computational complexity. The merits of this coding approach have been investigated by subjective experiments to compare it with single-viewer foveated image coding. The results found the depth-based preblurring to generally be significantly preferable to the same level of foveation blurring.
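The general idea of applying a blur map with a filter bank, rather than a separate filter per location, can be sketched as follows. This is a generic stand-in and not the thesis's specialised filter bank, which the abstract does not specify: the signal is blurred once at each of a few fixed sigmas, and each sample is then linearly interpolated between the two nearest bank outputs. The function names and the 1-D setting are assumptions made for brevity, and the interpolation only approximates a true Gaussian at intermediate sigmas.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel; sigma <= 0 gives the identity kernel."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Uniform Gaussian blur with border samples clamped."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

def space_variant_blur(signal, sigma_map, bank_sigmas):
    """Approximate per-sample blurring from a small bank of uniform blurs.

    bank_sigmas must be sorted ascending; sigma_map values outside the bank
    range are clamped to it.
    """
    bank = [blur(signal, s) for s in bank_sigmas]  # one pass per bank level
    out = []
    for i, sigma in enumerate(sigma_map):
        sigma = min(max(sigma, bank_sigmas[0]), bank_sigmas[-1])
        hi = next(k for k, s in enumerate(bank_sigmas) if s >= sigma)
        lo = max(hi - 1, 0)
        if hi == lo:
            out.append(bank[lo][i])
            continue
        t = (sigma - bank_sigmas[lo]) / (bank_sigmas[hi] - bank_sigmas[lo])
        out.append((1 - t) * bank[lo][i] + t * bank[hi][i])
    return out
```

The cost is one uniform blur per bank level plus one interpolation per sample, instead of a differently sized convolution at every location. A denser bank reduces the interpolation error at the price of more passes.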
A perceptual comparison of empirical and predictive region-of-interest video
When viewing multimedia presentations, a user only attends to a relatively small part of the video display at any one point in time. By shifting allocation of bandwidth from peripheral areas to those locations where a user’s gaze is more likely to rest, attentive displays can be produced. Attentive displays aim to reduce resource requirements while minimizing negative user perception, understood in this paper as not only a user’s ability to assimilate and understand information but also his/her subjective satisfaction with the video content. This paper introduces and discusses a perceptual comparison between two region-of-interest display (RoID) adaptation techniques. A RoID is an attentive display where bandwidth has been preallocated around measured or highly probable areas of user gaze. In this paper, video content was manipulated using two sources of data: empirical measured data (captured using eye-tracking technology) and predictive data (calculated from the physical characteristics of the video data). Results show that display adaptation causes significant variation in users’ understanding of specific multimedia content. Interestingly, RoID adaptation and the type of video being presented both affect user perception of video quality. Moreover, the use of frame rates less than 15 frames per second, for any video adaptation technique, caused a significant reduction in user perceived quality, suggesting that although users are aware of video quality reduction, it does impact level of information assimilation and understanding. Results also highlight that user level of enjoyment is significantly affected by the type of video yet is not as affected by the quality or type of video adaptation, an interesting implication in the field of entertainment.
Peripheral Representations: from Perception to Visual Search
The human visual field is composed of a high acuity region at the center of gaze called the fovea, and its complement, the visual periphery. Not much is known about the computations and representations of the visual periphery, as most of the focus in the field of human (and machine) vision is geared towards foveal vision. Thus, the focus of this thesis will be on understanding the computations performed by the human visual system in the visual periphery. In doing so, I will begin by modelling the perception of clutter and how it changes as a function of the behavioural task and point of fixation, developing a collection of foveated clutter models that enhance non-foveated models. I will then propose a new metamer model that renders how the information is distorted in the visual field, and what this tells us about the computations done in the visual periphery. Finally, I will conclude with the design of two hybrid man-machine collaborative visual search systems that try to overcome the limitations in human visual search imposed by the visual periphery and observer inefficiencies in terminating exploration.