268 research outputs found

    Foveation scalable video coding with automatic fixation selection


    Foveated Streaming of Real-Time Graphics

    Remote rendering systems comprise powerful servers that render graphics on behalf of low-end client devices and stream the graphics to them as compressed video, enabling high-end gaming and Virtual Reality on those devices. One key challenge with them is the amount of bandwidth required for streaming high-quality video. Humans have spatially non-uniform visual acuity: we have sharp central vision, but our ability to discern details decreases rapidly with angular distance from the point of gaze. This phenomenon, called foveation, can be exploited to reduce the need for bandwidth. In this paper, we study three different methods of producing a foveated video stream of real-time rendered graphics in a remote rendering system: 1) foveated shading as part of the rendering pipeline, 2) foveation as a post-processing step after rendering and before video encoding, and 3) foveated video encoding. We report results from a number of experiments with these methods. They suggest that foveated rendering alone does not help save bandwidth. The two other methods, in contrast, decrease the resulting video bitrate significantly, but they have different quality-per-bit and latency profiles, which makes them desirable solutions in slightly different situations.
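    The idea common to the last two methods can be sketched as deriving a per-pixel blur strength from angular distance to the gaze point. All parameter values below (pixels per degree, foveal radius, slope) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def foveation_sigma(width, height, gaze_xy, px_per_degree=30.0,
                    fovea_deg=2.0, slope=0.5):
    """Per-pixel Gaussian blur strength (sigma) that grows linearly
    with eccentricity, i.e. angular distance from the gaze point.
    Zero inside the fovea; all parameter values are illustrative."""
    ys, xs = np.mgrid[0:height, 0:width]
    dist_px = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1])
    ecc_deg = dist_px / px_per_degree  # small-angle approximation
    return slope * np.maximum(ecc_deg - fovea_deg, 0.0)

sigma = foveation_sigma(640, 480, gaze_xy=(320, 240))
```

    Such a sigma map can drive either a post-rendering blur pass or a per-block quantizer offset in the video encoder.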

    Neural Representations for Sensory-Motor Control, II: Learning a Head-Centered Visuomotor Representation of 3-D Target Position

    A neural network model is described for how an invariant head-centered representation of 3-D target position can be autonomously learned by the brain in real time. Once learned, such a target representation may be used to control both eye and limb movements. The target representation is derived from the positions of both eyes in the head and the locations that the target activates on the retinas of both eyes. A Vector Associative Map, or VAM, learns the many-to-one transformation from multiple combinations of eye and retinal position to invariant 3-D target position. Eye position is derived from outflow movement signals to the eye muscles. Two successive stages of opponent processing convert these corollary discharges into a head-centered representation that closely approximates the azimuth, elevation, and vergence of the eyes' gaze position with respect to a cyclopean origin located between the eyes. VAM learning combines this cyclopean representation of present gaze position with binocular retinal information about target position into an invariant representation of 3-D target position with respect to the head. VAM learning can use a teaching vector that is externally derived from the positions of the eyes when they foveate the target. A VAM can also autonomously discover and learn the invariant representation, without an explicit teacher, by generating internal error signals from environmental fluctuations in which these invariant properties are implicit. VAM error signals are computed by Difference Vectors, or DVs, that are zeroed by the VAM learning process. VAMs may be organized into VAM Cascades for learning and performing both sensory-to-spatial maps and spatial-to-motor maps. These multiple uses clarify why DV-type properties are computed by cells in the parietal, frontal, and motor cortices of many mammals. VAMs are modulated by gating signals that express different aspects of the will-to-act. These signals transform a single invariant representation into movements of different speed (GO signal) and size (GRO signal), thereby enabling VAM controllers to match a planned action sequence to variable environmental conditions. National Science Foundation (IRI-87-16960, IRI-90-24877); Office of Naval Research (N00014-92-J-1309)
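    The core of VAM learning, an error-driven update that zeroes a Difference Vector, can be sketched as follows. The linear map, random input distribution, and learning rate are illustrative simplifications, not the paper's opponent-processing circuit:

```python
import numpy as np

# Sketch of Vector Associative Map (VAM) learning: an adaptive map
# from combined eye-position/retinal inputs to 3-D target position
# is trained by driving a Difference Vector (DV) to zero.
rng = np.random.default_rng(0)
W = np.zeros((3, 6))              # learned map: 6-D input -> 3-D target
true_W = rng.normal(size=(3, 6))  # stand-in for the environment's mapping

lr = 0.1
for _ in range(2000):
    x = rng.normal(size=6)        # combined eye + retinal signals
    target = true_W @ x           # teaching vector (eyes foveate target)
    dv = target - W @ x           # Difference Vector
    W += lr * np.outer(dv, x)     # learning zeroes the DV over time
```

    After training, the DV is near zero for any input, i.e. the map has internalized the eye-to-head coordinate transformation.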

    Space-variant picture coding

    Space-variant picture coding techniques exploit the strong spatial non-uniformity of the human visual system in order to increase coding efficiency in terms of perceived quality per bit. This thesis extends space-variant coding research in two directions. The first of these is foveated coding. Past foveated coding research has been dominated by the single-viewer, gaze-contingent scenario. For research into multi-viewer and probability-based scenarios, this thesis presents a missing piece: an algorithm for computing an additive multi-viewer sensitivity function based on an established eye resolution model, and, from this, a blur map that is optimal in the sense of discarding frequencies in least-noticeable-first order. Furthermore, for the application of a blur map, a novel algorithm is presented for the efficient computation of high-accuracy, smoothly space-variant Gaussian blurring, using a specialised filter bank which approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy and at greatly reduced cost compared to the brute-force approach of employing a separate low-pass filter at each image location. The second direction is that of artificially increasing the depth of field of an image, an idea borrowed from photography, with the advantage of allowing an image to be reduced in bitrate while retaining or increasing overall aesthetic quality. Two synthetic depth-of-field algorithms are presented herein, with the desirable properties of aiming to mimic occlusion effects as they occur in natural blurring, and of handling any number of blurring and occlusion levels at the same level of computational complexity. The merits of this coding approach have been investigated in subjective experiments comparing it with single-viewer foveated image coding. The results found the depth-based pre-blurring to be generally significantly preferable to the same level of foveation blurring.
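    The filter-bank idea can be sketched in one dimension: blur the signal once per bank sigma, then linearly interpolate between the two nearest bank outputs at each sample. The bank sigmas and the simple interpolation rule here are illustrative assumptions; the thesis's specialised filter bank realises the approximation with far better accuracy/cost trade-offs:

```python
import numpy as np

def gaussian_blur_1d(signal, sigma):
    """Plain dense Gaussian blur of a 1-D signal (reference filter)."""
    if sigma == 0:
        return signal.copy()
    radius = int(3 * sigma) + 1
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    return np.convolve(signal, k, mode="same")

def space_variant_blur(signal, sigma_map, bank=(0.0, 1.0, 2.0, 4.0)):
    """Approximate smoothly space-variant Gaussian blurring with a
    small filter bank: one global blur per bank sigma, then per-sample
    linear interpolation between the two nearest bank outputs."""
    bank = np.asarray(bank)
    blurred = np.stack([gaussian_blur_1d(signal, s) for s in bank])
    idx = np.clip(np.searchsorted(bank, sigma_map), 1, len(bank) - 1)
    lo, hi = bank[idx - 1], bank[idx]
    w = (np.clip(sigma_map, bank[0], bank[-1]) - lo) / (hi - lo)
    n = np.arange(signal.size)
    return (1 - w) * blurred[idx - 1, n] + w * blurred[idx, n]

x = np.zeros(64)
x[::8] = 1.0                          # spiky test signal
sigma_map = np.linspace(0.0, 4.0, 64) # blur grows left to right
y = space_variant_blur(x, sigma_map)
```

    The cost is a handful of global convolutions plus a per-sample blend, instead of one distinct low-pass filter per location.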

    A quadtree driven image fusion quality assessment

    In this paper a new method to compute the saliency of source images is presented. This work is an extension of the universal quality index introduced by Wang and Bovik and improved by Piella. It defines saliency according to the change in topology of the quadtree decomposition between the source images and the fused image. The saliency function assigns higher weight to tree nodes that differ more from the fused image in terms of topology. Quadtree decomposition provides an easy and systematic way to add a saliency factor based on the segmented regions in the images.
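    The underlying decomposition can be sketched as a recursive split of blocks that are not yet homogeneous; comparing the resulting leaf structure between a source image and the fused image then gives a topology-change signal. The variance threshold and stopping rule below are assumptions for illustration:

```python
import numpy as np

def quadtree_leaves(img, thresh=0.01, min_size=2):
    """Recursive quadtree decomposition: split a square block into four
    quadrants while its intensity variance exceeds `thresh`. Returns
    the leaf blocks as (row, col, size) tuples."""
    leaves = []
    def split(r, c, size):
        block = img[r:r+size, c:c+size]
        if size <= min_size or block.var() <= thresh:
            leaves.append((r, c, size))
            return
        h = size // 2
        for dr, dc in ((0, 0), (0, h), (h, 0), (h, h)):
            split(r + dr, c + dc, h)
    split(0, 0, img.shape[0])
    return leaves

flat = np.zeros((16, 16))                          # 1 leaf: homogeneous
noisy = np.random.default_rng(1).random((16, 16))  # many leaves
```

    A region whose leaf structure changes substantially between source and fused image would receive a correspondingly higher saliency weight in the quality index.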

    Mapping Information Flow in Sensorimotor Networks

    Biological organisms continuously select and sample information used by their neural structures for perception and action, and for creating coherent cognitive states guiding their autonomous behavior. Information processing, however, is not solely an internal function of the nervous system. Here we show, instead, how sensorimotor interaction and body morphology can induce statistical regularities and information structure in sensory inputs and within the neural control architecture, and how the flow of information between sensors, neural units, and effectors is actively shaped by the interaction with the environment. We analyze sensory and motor data collected from real and simulated robots and reveal the presence of information structure and directed information flow induced by dynamically coupled sensorimotor activity, including effects of motor outputs on sensory inputs. We find that information structure and information flow in sensorimotor networks (a) are spatially and temporally specific; (b) can be affected by learning; and (c) can be affected by changes in body morphology. Our results suggest a fundamental link between physical embeddedness and information, highlighting the effects of embodied interactions on internal (neural) information processing and illuminating the role of various system components in the generation of behavior.
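    A toy illustration of directed, time-specific information flow: when a sensor echoes the motor command one step later, the lagged mutual information between motor and sensor far exceeds the instantaneous one. The coupling rule and the plug-in MI estimator are illustrative assumptions, not the paper's analysis pipeline:

```python
import math
from collections import Counter
import numpy as np

def mutual_info(x, y):
    """Plug-in mutual information (bits) between two equal-length
    discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    return sum(c / n * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

# Toy sensorimotor loop: sensor[t] reflects motor[t-1], occasionally
# corrupted by noise, so the motor->sensor effect appears at lag 1.
rng = np.random.default_rng(0)
motor = rng.integers(0, 2, size=5000)
noise = (rng.random(5000) < 0.1).astype(int)
sensor = np.empty_like(motor)
sensor[0] = 0
sensor[1:] = motor[:-1] ^ noise[1:]

mi_lag0 = mutual_info(motor.tolist(), sensor.tolist())
mi_lag1 = mutual_info(motor[:-1].tolist(), sensor[1:].tolist())
```

    The asymmetry across lags is what makes the flow *directed*: shifting the sensor sequence the other way would not recover the dependence.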

    Off-line Foveated Compression and Scene Perception: An Eye-Tracking Approach

    With the continued growth of digital services offering storage and communication of pictorial information, the need to represent this information efficiently has become increasingly important, both from an information-theoretic and a perceptual point of view. There has been recent interest in designing systems for efficient representation and compression of image and video data that take the features of the human visual system into account. One part of this thesis investigates whether knowledge about viewers' gaze positions, as measured by an eye-tracker, can be used to improve the compression efficiency of digital video: regions not directly looked at by a number of previewers are low-pass filtered. This type of video manipulation is called off-line foveation. The amount of compression due to off-line foveation is assessed, along with how it affects new viewers' gaze behavior as well as subjective quality. We found additional bitrate savings of up to 50% (on average 20%) due to off-line foveation prior to compression, without decreasing subjective quality. In off-line foveation, it would be of great benefit to predict algorithmically where viewers look, without having to perform eye-tracking measurements. In the first part of this thesis, new experimental paradigms combined with eye-tracking are used to understand the mechanisms behind gaze control during scene perception, thus investigating the prerequisites for such algorithms. Eye movements are recorded from observers viewing contrast-manipulated images depicting natural scenes under a neutral task. We report that image semantics, rather than the physical image content itself, largely dictates where people choose to look. Together with recent work on gaze prediction in video, the results in this thesis give only moderate support for the successful applicability of algorithmic gaze prediction to off-line foveated video compression.
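    The off-line foveation step can be sketched as: aggregate recorded fixations into an attention mask, then low-pass filter everything outside it before compression. The fixation radius, binary mask, and box filter below are simplifying assumptions; the thesis works from measured gaze distributions and proper foveation filtering:

```python
import numpy as np

def attention_mask(shape, fixations, radius=40):
    """Binary map marking pixels within `radius` px of any recorded
    fixation; everything else is a candidate for pre-compression
    low-pass filtering."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    keep = np.zeros(shape, dtype=bool)
    for fx, fy in fixations:
        keep |= np.hypot(xs - fx, ys - fy) <= radius
    return keep

def offline_foveate(frame, fixations, radius=40, kernel=5):
    """Box-blur (a crude stand-in for the low-pass filter) every
    pixel outside the aggregated attention mask."""
    keep = attention_mask(frame.shape, fixations, radius)
    pad = kernel // 2
    padded = np.pad(frame, pad, mode="edge")
    # simple box filter via a sliding-window mean
    blurred = np.mean([padded[i:i+frame.shape[0], j:j+frame.shape[1]]
                       for i in range(kernel) for j in range(kernel)],
                      axis=0)
    return np.where(keep, frame, blurred)

frame = np.random.default_rng(2).random((120, 160))
out = offline_foveate(frame, fixations=[(80, 60)])
```

    Because the blurred regions carry less high-frequency energy, a standard encoder spends fewer bits on them, which is the source of the reported bitrate savings.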