Real-Time Restoration of Images Degraded by Uniform Motion Blur in Foveal Active Vision Systems
Foveated, log-polar, or space-variant image architectures provide a high-resolution, wide-field workspace with a small pixel computation load. These characteristics are ideal for mobile robotic and active vision applications. Recently we described a generalization of the Fourier transform (the fast exponential chirp transform) which allows frame-rate computation of full-field 2D frequency transforms on a log-polar image format. In the present work, we use Wiener filtering, performed using the Exponential Chirp Transform on log-polar (foveated) image formats, to de-blur images which have been degraded by uniform camera motion.
Funding: Defense Advanced Research Projects Agency and Office of Naval Research (N00014-96-C-0178); Office of Naval Research Multidisciplinary University Research Initiative (N00014-95-1-0409).
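The Wiener deconvolution step at the heart of this approach can be sketched in a plain Cartesian setting (the paper's actual contribution is performing it on log-polar images via the exponential chirp transform; the horizontal-blur kernel helper and the noise-to-signal constant `nsr` below are illustrative assumptions):

```python
import numpy as np

def motion_blur_kernel(length, size):
    """Horizontal uniform motion-blur PSF embedded in a size x size array."""
    psf = np.zeros((size, size))
    start = (size - length) // 2
    psf[size // 2, start:start + length] = 1.0 / length
    return psf

def wiener_deblur(blurred, psf, nsr=0.01):
    """Frequency-domain Wiener filter: W = H* / (|H|^2 + NSR)."""
    H = np.fft.fft2(psf)
    G = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(W * G))
```

With a known blur kernel and a small `nsr`, the filter approximately inverts the blur while damping frequencies where the kernel's response is weak.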
Cross-Resolution Flow Propagation for Foveated Video Super-Resolution
The demand for high-resolution video content has grown over the years.
However, the delivery of high-resolution video is constrained by either
computational resources required for rendering or network bandwidth for remote
transmission. To remedy this limitation, we leverage the eye trackers found
alongside existing augmented and virtual reality headsets. We propose the
application of video super-resolution (VSR) technique to fuse low-resolution
context with regional high-resolution context for resource-constrained
consumption of high-resolution content without perceivable drop in quality. Eye
trackers provide the gaze direction of a user, aiding the extraction of
the regional high-resolution context. As only pixels that fall within the
gaze region can be resolved by the human eye, a large amount of the delivered
content is redundant as we can't perceive the difference in quality of the
region beyond the observed region. To generate a visually pleasing frame from
the fusion of high-resolution region and low-resolution region, we study the
capability of a deep neural network of transferring the context of the observed
region to other regions (low-resolution) of the current and future frames. We
label this task Foveated Video Super-Resolution (FVSR), as we need to
super-resolve the low-resolution regions of current and future frames through
the fusion of pixels from the gaze region. We propose Cross-Resolution Flow
Propagation (CRFP) for FVSR. We train and evaluate CRFP on REDS dataset on the
task of 8x FVSR, i.e. a combination of 8x VSR and the fusion of foveated
region. Departing from the conventional evaluation of per frame quality using
SSIM or PSNR, we propose the evaluation of the past foveated region, measuring the
capability of a model to leverage the noise present in eye trackers during
FVSR. Code is made available at https://github.com/eugenelet/CRFP.
Comment: 12 pages, 8 figures, to appear in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 202
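The core fusion idea — pasting a gaze-centred high-resolution region onto an upsampled low-resolution frame — can be sketched as follows (the circular mask and hard blend are simplifying assumptions for illustration, not CRFP's learned cross-resolution propagation):

```python
import numpy as np

def fovea_mask(h, w, gaze_y, gaze_x, radius):
    """Binary mask: 1 inside the circular gaze region, 0 elsewhere."""
    yy, xx = np.mgrid[0:h, 0:w]
    return ((yy - gaze_y) ** 2 + (xx - gaze_x) ** 2 <= radius ** 2).astype(np.float32)

def fuse_frame(low_res_up, high_res, gaze_y, gaze_x, radius):
    """Paste the high-resolution foveal region onto the upsampled
    low-resolution frame (grayscale frames assumed for brevity)."""
    m = fovea_mask(*low_res_up.shape[:2], gaze_y, gaze_x, radius)
    return m * high_res + (1.0 - m) * low_res_up
```

A learned model would additionally propagate detail from this foveal region into the periphery of the current and future frames, which is what the paper studies.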
Content-prioritised video coding for British Sign Language communication.
Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. 
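Spatial-domain foveation at the pre-processing stage can be sketched as blending each frame with a blurred copy according to distance from the tracked gaze point (the cheap shift-based blur and linear fall-off here are illustrative stand-ins, not the thesis's calibrated HVS model):

```python
import numpy as np

def cross_blur(img):
    """Cheap 5-point mean blur via circular shifts (stand-in for a proper low-pass filter)."""
    out = img.copy()
    for axis in (0, 1):
        for shift in (-1, 1):
            out = out + np.roll(img, shift, axis)
    return out / 5.0

def foveate(img, gaze_y, gaze_x, radius=24.0):
    """Keep full detail near the gaze point; fade to the blurred copy in the periphery."""
    yy, xx = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    w = np.clip(1.0 - np.hypot(yy - gaze_y, xx - gaze_x) / radius, 0.0, 1.0)
    return w * img + (1.0 - w) * cross_blur(img)
```

Feeding the foveated frame to a standard encoder such as H.264 then spends fewer bits on peripheral detail the viewer cannot resolve.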
The research community benefits from a new approach to video coding optimisation and better understanding of the communication needs of deaf people.
Noise-based Enhancement for Foveated Rendering
Human visual sensitivity to spatial details declines towards the periphery. Novel image synthesis techniques, so-called foveated rendering, exploit this observation and reduce the spatial resolution of synthesized images for the periphery, avoiding the synthesis of high-spatial-frequency details that are costly to generate but not perceived by a viewer. However, contemporary techniques do not make a clear distinction between the range of spatial frequencies that must be reproduced and those that can be omitted. For a given eccentricity, there is a range of frequencies that are detectable but not resolvable. While the accurate reproduction of these frequencies is not required, an observer can detect their absence if completely omitted. We use this observation to improve the performance of existing foveated rendering techniques. We demonstrate that this specific range of frequencies can be efficiently replaced with procedural noise whose parameters are carefully tuned to image content and human perception. Consequently, these frequencies do not have to be synthesized during rendering, allowing more aggressive foveation, and they can be replaced by noise generated in a less expensive post-processing step, leading to improved performance of the rendering system. Our main contribution is a perceptually-inspired technique for deriving the parameters of the noise required for the enhancement and its calibration. The method operates on rendering output and runs at rates exceeding 200 FPS at 4K resolution, making it suitable for integration with real-time foveated rendering systems for VR and AR devices. We validate our results and compare them to the existing contrast enhancement technique in user experiments.
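The post-processing idea — injecting noise whose strength grows with eccentricity so that detectable-but-not-resolvable frequencies are not simply absent — can be sketched as below (the linear amplitude ramp and the `fovea_radius`/`max_amp` parameters are illustrative assumptions; the paper derives the noise parameters perceptually from image content):

```python
import numpy as np

def eccentricity_map(h, w, gaze_y, gaze_x):
    """Per-pixel eccentricity: distance from the gaze point, in pixels."""
    yy, xx = np.mgrid[0:h, 0:w]
    return np.hypot(yy - gaze_y, xx - gaze_x)

def add_peripheral_noise(image, gaze_y, gaze_x, fovea_radius=32.0, max_amp=0.1, seed=0):
    """Add zero-mean noise whose amplitude ramps up outside the fovea
    (an illustrative stand-in for the perceptually tuned procedural noise)."""
    rng = np.random.default_rng(seed)
    ecc = eccentricity_map(*image.shape, gaze_y, gaze_x)
    amp = max_amp * np.clip((ecc - fovea_radius) / fovea_radius, 0.0, 1.0)
    return image + amp * rng.standard_normal(image.shape)
```

The foveal region is left untouched, while the periphery regains high-frequency energy without the renderer having to synthesize it.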
Foveated Encoding for Large High-Resolution Displays
Collaborative exploration of scientific data sets across large high-resolution displays requires both high visual detail as well as low-latency transfer of image data (oftentimes inducing the need to trade one for the other). In this work, we present a system that dynamically adapts the encoding quality in such systems in a way that reduces the required bandwidth without impacting the details perceived by one or more observers. Humans perceive sharp, colourful details in the small foveal region around the centre of the field of view, while information in the periphery is perceived blurred and colourless. We account for this by tracking the gaze of observers and adapting the quantization parameter of each macroblock used by the H.264 encoder accordingly, considering the so-called visual acuity fall-off. This allows us to substantially reduce the required bandwidth with barely noticeable changes in visual quality, which is crucial for collaborative analysis across display walls at different locations. We demonstrate the reduced overall required bandwidth and the high quality inside the foveated regions using particle rendering and parallel coordinates.
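A per-macroblock QP assignment driven by distance from the gaze point can be sketched as below (the linear fall-off, the QP range, and the `slope` constant are illustrative assumptions, not the paper's calibrated acuity model):

```python
import numpy as np

def macroblock_qp(frame_h, frame_w, gaze_y, gaze_x,
                  qp_min=22, qp_max=40, slope=0.05, mb=16):
    """Assign one H.264 quantization parameter per 16x16 macroblock,
    increasing linearly with the block centre's distance from the gaze point."""
    rows, cols = frame_h // mb, frame_w // mb
    cy = (np.arange(rows) + 0.5) * mb  # block-centre y coordinates
    cx = (np.arange(cols) + 0.5) * mb  # block-centre x coordinates
    dist = np.hypot(cy[:, None] - gaze_y, cx[None, :] - gaze_x)
    qp = qp_min + slope * dist
    return np.clip(np.rint(qp), qp_min, qp_max).astype(int)
```

Blocks under the fovea keep the lowest QP (highest quality), while peripheral blocks are quantized more coarsely, saving bandwidth where the observer cannot notice.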