179 research outputs found

    Perceptually optimised sign language video coding

    Get PDF

    Off-line Foveated Compression and Scene Perception: An Eye-Tracking Approach

    Get PDF
    With the continued growth of digital services offering storage and communication of pictorial information, the need to efficiently represent this information has become increasingly important, both from an information theoretic and a perceptual point of view. There has been a recent interest to design systems for efficient representation and compression of image and video data that take the features of the human visual system into account. One part of this thesis investigates whether knowledge about viewers' gaze positions as measured by an eye-tracker can be used to improve compression efficiency of digital video; regions not directly looked at by a number of previewers are lowpass filtered. This type of video manipulation is called off-line foveation. The amount of compression due to off-line foveation is assessed along with how it affects new viewers' gazing behavior as well as subjective quality. We found additional bitrate savings up to 50% (average 20%) due to off-line foveation prior to compression, without decreasing the subjective quality. In off-line foveation, it would be of great benefit to algorithmically predict where viewers look without having to perform eye-tracking measurements. In the first part of this thesis, new experimental paradigms combined with eye-tracking are used to understand the mechanisms behind gaze control during scene perception, thus investigating the prerequisites for such algorithms. Eye-movements are recorded from observers viewing contrast manipulated images depicting natural scenes under a neutral task. We report that image semantics, rather than the physical image content itself, largely dictates where people choose to look. Together with recent work on gaze prediction in video, the results in this thesis give only moderate support for successful applicability of algorithmic gaze prediction for off-line foveated video compression

    Content-prioritised video coding for British Sign Language communication.

    Get PDF
    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and better understanding of the communication needs of deaf people

    Foveated Encoding for Large High-Resolution Displays

    Get PDF
    Collaborative exploration of scientific data sets across large high-resolution displays requires both high visual detail as well as low-latency transfer of image data (oftentimes inducing the need to trade one for the other). In this work, we present a system that dynamically adapts the encoding quality in such systems in a way that reduces the required bandwidth without impacting the details perceived by one or more observers. Humans perceive sharp, colourful details, in the small foveal region around the centre of the field of view, while information in the periphery is perceived blurred and colourless. We account for this by tracking the gaze of observers, and respectively adapting the quality parameter of each macroblock used by the H.264 encoder, considering the so-called visual acuity fall-off. This allows to substantially reduce the required bandwidth with barely noticeable changes in visual quality, which is crucial for collaborative analysis across display walls at different locations. We demonstrate the reduced overall required bandwidth and the high quality inside the foveated regions using particle rendering and parallel coordinates

    Noise-based Enhancement for Foveated Rendering

    Get PDF
    Human visual sensitivity to spatial details declines towards the periphery. Novel image synthesis techniques, so-called foveated rendering, exploit this observation and reduce the spatial resolution of synthesized images for the periphery, avoiding the synthesis of high-spatial-frequency details that are costly to generate but not perceived by a viewer. However, contemporary techniques do not make a clear distinction between the range of spatial frequencies that must be reproduced and those that can be omitted. For a given eccentricity, there is a range of frequencies that are detectable but not resolvable. While the accurate reproduction of these frequencies is not required, an observer can detect their absence if completely omitted. We use this observation to improve the performance of existing foveated rendering techniques. We demonstrate that this specific range of frequencies can be efficiently replaced with procedural noise whose parameters are carefully tuned to image content and human perception. Consequently, these fre- quencies do not have to be synthesized during rendering, allowing more aggressive foveation, and they can be replaced by noise generated in a less expensive post-processing step, leading to improved performance of the ren- dering system. Our main contribution is a perceptually-inspired technique for deriving the parameters of the noise required for the enhancement and its calibration. The method operates on rendering output and runs at rates exceeding 200 FPS at 4K resolution, making it suitable for integration with real-time foveated rendering systems for VR and AR devices. We validate our results and compare them to the existing contrast enhancement technique in user experiments

    Foveation scalable video coding with automatic fixation selection

    Full text link

    Perception-driven approaches to real-time remote immersive visualization

    Get PDF
    In remote immersive visualization systems, real-time 3D perception through RGB-D cameras, combined with modern Virtual Reality (VR) interfaces, enhances the user’s sense of presence in a remote scene through 3D reconstruction rendered in a remote immersive visualization system. Particularly, in situations when there is a need to visualize, explore and perform tasks in inaccessible environments, too hazardous or distant. However, a remote visualization system requires the entire pipeline from 3D data acquisition to VR rendering satisfies the speed, throughput, and high visual realism. Mainly when using point-cloud, there is a fundamental quality difference between the acquired data of the physical world and the displayed data because of network latency and throughput limitations that negatively impact the sense of presence and provoke cybersickness. This thesis presents state-of-the-art research to address these problems by taking the human visual system as inspiration, from sensor data acquisition to VR rendering. The human visual system does not have a uniform vision across the field of view; It has the sharpest visual acuity at the center of the field of view. The acuity falls off towards the periphery. The peripheral vision provides lower resolution to guide the eye movements so that the central vision visits all the interesting crucial parts. As a first contribution, the thesis developed remote visualization strategies that utilize the acuity fall-off to facilitate the processing, transmission, buffering, and rendering in VR of 3D reconstructed scenes while simultaneously reducing throughput requirements and latency. As a second contribution, the thesis looked into attentional mechanisms to select and draw user engagement to specific information from the dynamic spatio-temporal environment. It proposed a strategy to analyze the remote scene concerning the 3D structure of the scene, its layout, and the spatial, functional, and semantic relationships between objects in the scene. The strategy primarily focuses on analyzing the scene with models the human visual perception uses. It sets a more significant proportion of computational resources on objects of interest and creates a more realistic visualization. As a supplementary contribution, A new volumetric point-cloud density-based Peak Signal-to-Noise Ratio (PSNR) metric is proposed to evaluate the introduced techniques. An in-depth evaluation of the presented systems, comparative examination of the proposed point cloud metric, user studies, and experiments demonstrated that the methods introduced in this thesis are visually superior while significantly reducing latency and throughput
    • …
    corecore