Search CORE

64 research outputs found

Foveation scalable video coding with automatic fixation selection

Author: A.C. Bovik
Ligang Lu
Zhou Wang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Space-variant picture coding

Author: Popkin Timothy John
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2010
Field of study

PhDSpace-variant picture coding techniques exploit the strong spatial non-uniformity of the human visual system in order to increase coding efficiency in terms of perceived quality per bit. This thesis extends space-variant coding research in two directions. The first of these directions is in foveated coding. Past foveated coding research has been dominated by the single-viewer, gaze-contingent scenario. However, for research into the multi-viewer and probability-based scenarios, this thesis presents a missing piece: an algorithm for computing an additive multi-viewer sensitivity function based on an established eye resolution model, and, from this, a blur map that is optimal in the sense of discarding frequencies in least-noticeable- rst order. Furthermore, for the application of a blur map, a novel algorithm is presented for the efficient computation of high-accuracy smoothly space-variant Gaussian blurring, using a specialised filter bank which approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy and at greatly reduced cost compared to the brute force approach of employing a separate low-pass filter at each image location. The second direction is that of artifi cially increasing the depth-of- field of an image, an idea borrowed from photography with the advantage of allowing an image to be reduced in bitrate while retaining or increasing overall aesthetic quality. Two synthetic depth of field algorithms are presented herein, with the desirable properties of aiming to mimic occlusion eff ects as occur in natural blurring, and of handling any number of blurring and occlusion levels with the same level of computational complexity. The merits of this coding approach have been investigated by subjective experiments to compare it with single-viewer foveated image coding. The results found the depth-based preblurring to generally be significantly preferable to the same level of foveation blurring

Queen Mary Research Online

OpenGrey Repository

Content-prioritised video coding for British Sign Language communication.

Author: Muir Laura Joy
Publication venue
Publication date: 31/10/2007
Field of study

Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and better understanding of the communication needs of deaf people

Open Access Institutional Repository at Robert Gordon University

Foveated Streaming of Real-Time Graphics

Author: Illahi Gazi Karam
Kämäräinen Teemu Veli Erik
Siekkinen Matti
Ylä-Jääski Antti
Publication venue: ACM
Publication date: 15/07/2021
Field of study

Remote rendering systems comprise powerful servers that render graphics on behalf of low-end client devices and stream the graphics as compressed video, enabling high end gaming and Virtual Reality on those devices. One key challenge with them is the amount of bandwidth required for streaming high quality video. Humans have spatially non-uniform visual acuity: We have sharp central vision but our ability to discern details rapidly decreases with angular distance from the point of gaze. This phenomenon called foveation can be taken advantage of to reduce the need for bandwidth. In this paper, we study three different methods to produce a foveated video stream of real-time rendered graphics in a remote rendered system: 1) foveated shading as part of the rendering pipeline, 2) foveation as post processing step after rendering and before video encoding, 3) foveated video encoding. We report results from a number of experiments with these methods. They suggest that foveated rendering alone does not help save bandwidth. Instead, the two other methods decrease the resulting video bitrate significantly but they also have different quality per bit and latency profiles, which makes them desirable solutions in slightly different situations.Peer reviewe

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto

Recommended from our members

Foveated object recognition by corner search

Author: Arnow Thomas Louis, 1946-
Publication venue
Publication date: 01/05/2008
Field of study

textHere we describe a gray scale object recognition system based on foveated corner finding, the computation of sequential fixation points, and elements of Lowe’s SIFT transform. The system achieves rotational, transformational, and limited scale invariant object recognition that produces recognition decisions using data extracted from sequential fixation points. It is broken into two logical steps. The first is to develop principles of foveated visual search and automated fixation selection to accomplish corner search. The result is a new algorithm for finding corners which is also a corner-based algorithm for aiming computed foveated visual fixations. In the algorithm, long saccades move the fovea to previously unexplored areas of the image, while short saccades improve the accuracy of putative corner locations. The system is tested on two natural scenes. As an interesting comparison study we compare fixations generated by the algorithm with those of subjects viewing the same images, whose eye movements are being recorded by an eyetracker. The comparison of fixation patterns is made using an information-theoretic measure. Results show that the algorithm is a good locator of corners, but does not correlate particularly well with human visual fixations. The second step is to use the corners located, which meet certain goodness criteria, as keypoints in a modified version of the SIFT algorithm. Two scales are implemented. This implementation creates a database of SIFT features of known objects. To recognize an unknown object, a corner is located and a feature vector created. The feature vector is compared with those in the database of known objects. The process is continued for each corner in the unknown object until enough information has been accumulated to reach a decision. The system was tested on 78 gray scale objects, hand tools and airplanes, and shown to perform well.Electrical and Computer Engineerin

Texas ScholarWorks

Subjective quality evaluation of foveated video coding using audio-visual focus of attention

Author: De Simone Francesca
Ebrahimi Touradj
Lee Jong-Seok
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/08/2011
Field of study

This paper presents a foveated coding method using audio-visual focus of attention and its evaluation through extensive subjective experiments on both standard definition and high definition sequences. Regarding a sound-emitting region as the location drawing the human attention, the method applies varying quality levels in an image frame according to the distance of a pixel to the identified sound source. Two experiments are presented to prove the efficiency of the method. Experiment 1 examines the validity and effectiveness of the method in comparison to the constant quality coding for high quality conditions. In Experiment 2, the method is compared to the fixed bit rate coding for low quality conditions where coding artifacts are noticeable. The results demonstrate that the foveated coding method provides considerable coding gain without significant quality degradation, but uneven distributions of the coding artifacts (blockiness) by the method are often less preferred than the uniform distribution of the artifacts. Additional interesting findings are also discussed, such as content dependence of the performance of the method, the memory effect in multiple viewings, and the difference in the quality perception for frame size variations

Infoscience - École polytechnique fédérale de Lausanne

PIM: Video Coding using Perceptual Importance Maps

Author: Anderson Alexander G.
Bourdev Lubomir
Katti Sachin
Olshausen Bruno
Pergament Evgenya
Rippel Oren
Tandon Pulkit
Tatwawadi Kedar
Weissman Tsachy
Publication venue
Publication date: 09/04/2023
Field of study

Human perception is at the core of lossy video compression, with numerous approaches developed for perceptual quality assessment and improvement over the past two decades. In the determination of perceptual quality, different spatio-temporal regions of the video differ in their relative importance to the human viewer. However, since it is challenging to infer or even collect such fine-grained information, it is often not used during compression beyond low-level heuristics. We present a framework which facilitates research into fine-grained subjective importance in compressed videos, which we then utilize to improve the rate-distortion performance of an existing video codec (x264). The contributions of this work are threefold: (1) we introduce a web-tool which allows scalable collection of fine-grained perceptual importance, by having users interactively paint spatio-temporal maps over encoded videos; (2) we use this tool to collect a dataset with 178 videos with a total of 14443 frames of human annotated spatio-temporal importance maps over the videos; and (3) we use our curated dataset to train a lightweight machine learning model which can predict these spatio-temporal importance regions. We demonstrate via a subjective study that encoding the videos in our dataset while taking into account the importance maps leads to higher perceptual quality at the same bitrate, with the videos encoded with importance maps preferred

1.8 \times

over the baseline videos. Similarly, we show that for the 18 videos in test set, the importance maps predicted by our model lead to higher perceptual quality videos,

2 \times

preferred over the baseline at the same bitrate

arXiv.org e-Print Archive

A quadtree driven image fusion quality assessment

Author: Creighton Douglas
Hossny M.
Nahavandi Saeid
Publication venue: IEEE Xplore
Publication date: 01/01/2007
Field of study

In this paper a new method to compute saliency of source images is presented. This work is an extension to universal quality index founded by Wang and Bovik and improved by Piella. It defines the saliency according to the change of topology of quadratic tree decomposition between source images and the fused image. The saliency function provides higher weight for the tree nodes that differs more in the fused image in terms topology. Quadratic tree decomposition provides an easy and systematic way to add a saliency factor based on the segmented regions in the images. <br /

Deakin Research Online

On the Interplay of Foveated Rendering and Video Encoding

Author: Illahi Gazi Karam
Kämäräinen Teemu
Siekkinen Matti
Ylä-Jääski Antti
Publication venue: Association for Computing Machinery
Publication date: 01/11/2020
Field of study

Publisher Copyright: © 2020 Owner/Author.Humans have sharp central vision but low peripheral visual acuity. Prior work has taken advantage of this phenomenon in two ways: foveated rendering (FR) reduces the computational workload of rendering by producing lower visual quality for peripheral regions and foveated video encoding (FVE) reduces the bitrate of streamed video through heavier compression of peripheral regions. Remote rendering systems require both rendering and video encoding and the two techniques can be combined to reduce both computing and bandwidth consumption. We report early results from such a combination with remote VR rendering. The results highlight that FR causes large bitrate overhead when combined with normal video encoding but combining it with FVE can mitigate it.Peer reviewe

Crossref

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto