534 research outputs found

    Space-variant picture coding

    PhD thesis. Space-variant picture coding techniques exploit the strong spatial non-uniformity of the human visual system in order to increase coding efficiency in terms of perceived quality per bit. This thesis extends space-variant coding research in two directions. The first is foveated coding. Past foveated coding research has been dominated by the single-viewer, gaze-contingent scenario. For research into the multi-viewer and probability-based scenarios, this thesis presents a missing piece: an algorithm for computing an additive multi-viewer sensitivity function based on an established eye resolution model and, from this, a blur map that is optimal in the sense of discarding frequencies in least-noticeable-first order. Furthermore, for the application of a blur map, a novel algorithm is presented for the efficient computation of high-accuracy, smoothly space-variant Gaussian blurring, using a specialised filter bank that approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy and at greatly reduced cost compared to the brute-force approach of employing a separate low-pass filter at each image location. The second direction is that of artificially increasing the depth-of-field of an image, an idea borrowed from photography with the advantage of allowing an image to be reduced in bitrate while retaining or increasing overall aesthetic quality. Two synthetic depth-of-field algorithms are presented herein, with the desirable properties of aiming to mimic occlusion effects as occur in natural blurring, and of handling any number of blurring and occlusion levels with the same level of computational complexity. The merits of this coding approach have been investigated by subjective experiments to compare it with single-viewer foveated image coding. The results found the depth-based preblurring to generally be significantly preferable to the same level of foveation blurring.
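
    The filter-bank idea in the abstract above can be illustrated with a much simpler stand-in. The sketch below is not the thesis's specialised filter bank: it assumes a plain scheme in which a 1-D signal is blurred at a few fixed sigmas and each sample is then linearly blended between the two bank outputs that bracket its requested sigma. The function names, the 3-sigma kernel radius, and the edge-replication rule are all illustrative assumptions.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel; sigma <= 0 yields the identity kernel."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, int(math.ceil(3 * sigma)))  # assumed 3-sigma support
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_uniform(signal, sigma):
    """Blur a 1-D signal with one fixed sigma, replicating edge samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

def blur_space_variant(signal, sigma_map, bank_sigmas):
    """Approximate per-sample blur by blending neighbouring bank outputs.

    bank_sigmas must be sorted in ascending order (assumption of this sketch).
    """
    bank = [blur_uniform(signal, s) for s in bank_sigmas]
    out = []
    for i, sigma in enumerate(sigma_map):
        # Clamp the requested sigma to the range covered by the bank.
        sigma = min(max(sigma, bank_sigmas[0]), bank_sigmas[-1])
        hi = next(idx for idx, s in enumerate(bank_sigmas) if s >= sigma)
        lo = max(hi - 1, 0)
        if hi == lo:
            out.append(bank[hi][i])
        else:
            t = (sigma - bank_sigmas[lo]) / (bank_sigmas[hi] - bank_sigmas[lo])
            out.append((1 - t) * bank[lo][i] + t * bank[hi][i])
    return out

# Example: a step edge, kept sharp near sample 4 and blurred further away.
step = [0.0] * 8 + [1.0] * 8
sigma_map = [0.25 * abs(i - 4) for i in range(16)]
blurred = blur_space_variant(step, sigma_map, [0.0, 1.0, 2.0, 4.0])
```

    The cost is one uniform blur per bank level plus a per-sample blend, rather than one distinct kernel per sample. Blending two Gaussian-blurred outputs is only an approximation of blurring with the intermediate sigma, which is the accuracy gap a purpose-built filter bank would aim to close.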

    The power of direct context as revealed by eye tracking: A model tracks relative attention to competing editorial and promotional content

    Many previous studies on attention have ignored the eye-catching potential of 'direct context' (the entire promotional and editorial content an observer can view at the same time) in print media. In the current study, characteristics of 183 magazine advertisements and their direct context were coded systematically and linked to eye-tracking data, producing more than 19,000 observations. Expanding on earlier research, the authors focused on fixations within an advertisement during the first five seconds and attention paid to the combined main elements of an advertisement. Results showed that direct context diverted visual attention, especially when featuring multiple colors and large amounts of text.

    Quality of Experience in Immersive Video Technologies

    Over the last decades, several technological revolutions have impacted the television industry, such as the shifts from black & white to color and from standard to high-definition. Nevertheless, further considerable improvements can still be achieved to provide a better multimedia experience, for example with ultra-high-definition, high dynamic range & wide color gamut, or 3D. These so-called immersive technologies aim at providing better, more realistic, and emotionally stronger experiences. To measure quality of experience (QoE), subjective evaluation is the ultimate means since it relies on a pool of human subjects. However, reliable and meaningful results can only be obtained if experiments are properly designed and conducted following a strict methodology. In this thesis, we build a rigorous framework for subjective evaluation of new types of image and video content. We propose different procedures and analysis tools for measuring QoE in immersive technologies. As immersive technologies capture more information than conventional technologies, they have the ability to provide more details, enhanced depth perception, as well as better color, contrast, and brightness. To measure the impact of immersive technologies on the viewers' QoE, we apply the proposed framework for designing experiments and analyzing collected subjects' ratings. We also analyze eye movements to study human visual attention during immersive content playback. Since immersive content carries more information than conventional content, efficient compression algorithms are needed for storage and transmission using existing infrastructures. To determine the required bandwidth for high-quality transmission of immersive content, we use the proposed framework to conduct meticulous evaluations of recent image and video codecs in the context of immersive technologies. Subjective evaluation is time consuming, expensive, and not always feasible. Consequently, researchers have developed objective metrics to automatically predict quality. To measure the performance of objective metrics in assessing immersive content quality, we perform several in-depth benchmarks of state-of-the-art and commonly used objective metrics. For this aim, we use ground truth quality scores, which are collected under our subjective evaluation framework. To improve QoE, we propose different systems for stereoscopic and autostereoscopic 3D displays in particular. The proposed systems can help reduce the artifacts generated at the visualization stage, which impact picture quality, depth quality, and visual comfort. To demonstrate the effectiveness of these systems, we use the proposed framework to measure viewers' preference between these systems and standard 2D & 3D modes. In summary, this thesis tackles the problems of measuring, predicting, and improving QoE in immersive technologies. To address these problems, we build a rigorous framework and we apply it through several in-depth investigations. We put essential concepts of multimedia QoE under this framework. These concepts are not only of a fundamental nature, but have also shown their impact in very practical applications. In particular, the JPEG, MPEG, and VCEG standardization bodies have adopted these concepts to select technologies that were proposed for standardization and to validate the resulting standards in terms of compression efficiency.

    A Comparative Study of Fixation Density Maps

    Fixation density maps (FDM) created from eye tracking experiments are widely used in image processing applications. The FDM are assumed to be reliable ground truths of human visual attention and as such one expects high similarity between FDM created in different laboratories. So far, no studies have analysed the degree of similarity between FDM from independent laboratories and the related impact on the applications. In this paper, we perform a thorough comparison of FDM from three independently conducted eye tracking experiments. We focus on the effect of presentation time and image content and evaluate the impact of the FDM differences on three applications: visual saliency modelling, image quality assessment, and image retargeting. It is shown that the FDM are very similar and that their impact on the applications is low. The individual experiment comparisons, however, are found to be significantly different, showing that inter-laboratory differences strongly depend on the experimental conditions of the laboratories. The FDM are publicly available to the research community.
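
    As a rough illustration of what comparing FDM involves, the sketch below builds a density map by summing an isotropic Gaussian at each fixation and scores two maps with the Pearson correlation coefficient. The abstract does not state which construction or similarity measures the paper uses, so the Gaussian width, the normalisation to unit mass, the choice of Pearson correlation, and the fixation coordinates are all assumptions made for illustration.

```python
import math

def fixation_density_map(fixations, width, height, sigma):
    """Sum an isotropic Gaussian at every (x, y) fixation, then normalise to unit mass."""
    fdm = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                fdm[y][x] += math.exp(-d2 / (2 * sigma * sigma))
    total = sum(sum(row) for row in fdm)
    if total == 0:
        return fdm  # no fixations: leave the all-zero map unchanged
    return [[v / total for v in row] for row in fdm]

def pearson_similarity(map_a, map_b):
    """Pearson correlation between two equally sized maps (1.0 means identical shape)."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant map
    return cov / math.sqrt(var_a * var_b)

# Hypothetical fixations from three labs on a 20x20 image.
lab_a = fixation_density_map([(5, 5), (6, 5)], 20, 20, sigma=2.0)
lab_b = fixation_density_map([(5, 6), (6, 6)], 20, 20, sigma=2.0)
lab_c = fixation_density_map([(14, 14)], 20, 20, sigma=2.0)
similar = pearson_similarity(lab_a, lab_b)
dissimilar = pearson_similarity(lab_a, lab_c)
```

    With these made-up fixations, the maps from the first two labs overlap heavily and score much higher than the pair whose fixations fall in different image regions.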

    Portable Eyetracking: A Study of Natural Eye Movements

    Visual perception, operating below conscious awareness, effortlessly provides the experience of a rich representation of the environment, continuous in space and time. Conscious visual perception is made possible by the 'foveal compromise,' the combination of the high-acuity fovea and a sophisticated suite of eye movements. Our illusory visual experience cannot be understood by introspection, but monitoring eye movements lets us probe the processes of visual perception. Four tasks representing a wide range of complexity were used to explore visual perception: image quality judgments, map reading, model building, and hand-washing. Very short fixation durations were observed in all tasks, some as short as 33 msec. While some tasks showed little variation in eye movement metrics, differences in eye movement patterns and high-level strategies were observed in the model building and hand-washing tasks. Performance in the hand-washing task revealed a new type of eye movement. 'Planful' eye movements were made to objects well in advance of a subject's interaction with the object. Often occurring in the middle of another task, they provide 'overlapping' temporal information about the environment, providing a mechanism to produce our conscious visual experience.

    JPEG backward compatible coding of omnidirectional images

    Omnidirectional image and video, also known as 360 image and 360 video, are gaining in popularity with the recent growth in availability of cameras and displays that can cope with this type of content. As omnidirectional visual content represents a larger set of information about the scene, it typically requires a much larger volume of data. Efficient compression of such content is therefore important. In this paper, we review the state of the art in compression of omnidirectional visual content, and propose a novel approach to encode omnidirectional images in such a way that they are still viewable on legacy JPEG decoders.

    VIDEO PREPROCESSING BASED ON HUMAN PERCEPTION FOR TELESURGERY

    Video transmission plays a critical role in robotic telesurgery because of its high bandwidth and high quality requirements. The goal of this dissertation is to find a preprocessing method based on human visual perception for telesurgical video, so that when preprocessed image sequences are passed to the video encoder, the bandwidth can be reallocated from non-essential surrounding regions to the region of interest, ensuring excellent image quality of critical regions (e.g. the surgical region). It can also be considered as a quality control scheme that will gracefully degrade the video quality in the presence of network congestion. The proposed preprocessing method can be separated into two major parts. First, we propose a time-varying attention map whose value is highest at the gazing point and falls off progressively towards the periphery. Second, we propose adaptive spatial filtering, the parameters of which are adjusted according to the attention map. By adding visual adaptation to the spatial filtering, telesurgical video data can be compressed efficiently because of the high degree of visual redundancy removal by our algorithm. Our experimental results have shown that with the proposed preprocessing method, the bandwidth can be reduced by more than half with no significant visual effect for the observer. We have also developed an optimal parameter selection algorithm, so that when the network bandwidth is limited, the overall visual distortion after preprocessing is minimized.
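
    The two parts described above (an attention map that peaks at the gaze point, and filter parameters driven by that map) can be sketched as follows. The abstract gives neither the falloff function nor the mapping to filter parameters, so the Gaussian falloff, the linear mapping to a blur sigma, and every name and number below are assumptions made purely for illustration.

```python
import math

def attention_map(width, height, gaze, falloff):
    """Attention in (0, 1]: exactly 1.0 at the gaze point, decaying with distance.

    A Gaussian falloff is assumed here; 'falloff' is its width in pixels.
    """
    gx, gy = gaze
    return [[math.exp(-((x - gx) ** 2 + (y - gy) ** 2) / (2 * falloff * falloff))
             for x in range(width)]
            for y in range(height)]

def filter_strength_map(attention, sigma_max):
    """Map attention to a per-pixel blur sigma: no blur at the gaze, sigma_max far away.

    The linear mapping is an assumption of this sketch.
    """
    return [[sigma_max * (1.0 - a) for a in row] for row in attention]

# One frame of a hypothetical 32x24 video with the gaze at pixel (10, 8).
attention = attention_map(32, 24, gaze=(10, 8), falloff=6.0)
sigmas = filter_strength_map(attention, sigma_max=3.0)
```

    To make the map time-varying, it would be recomputed for every frame from the latest gaze sample. A network-aware variant could raise `sigma_max` or lower `falloff` when bandwidth drops, which is one plausible reading of the graceful degradation the abstract mentions.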

    A computational model of visual attention.

    Visual attention is a process by which the Human Visual System (HVS) selects the most important information from a scene. Visual attention models are computational or mathematical models developed to predict this information. The performance of state-of-the-art visual attention models is limited in terms of prediction accuracy and computational complexity. Despite a significant amount of active research in this area, modelling visual attention is still an open research challenge. This thesis proposes a novel computational model of visual attention that achieves higher prediction accuracy with low computational complexity. A new bottom-up visual attention model based on in-focus regions is proposed. To develop the model, an image dataset is created by capturing images with in-focus and out-of-focus regions. The Discrete Cosine Transform (DCT) spectrum of these images is investigated qualitatively and quantitatively to discover the key frequency coefficients that correspond to the in-focus regions. The model detects these key coefficients by formulating a novel relation between the in-focus and out-of-focus regions in the frequency domain. These frequency coefficients are used to detect the salient in-focus regions. The simulation results show that this attention model achieves good prediction accuracy with low complexity. The prediction accuracy of the proposed in-focus visual attention model is further improved by incorporating the sensitivity of the HVS towards the image centre and human faces. Moreover, the computational complexity is further reduced by using the Integer Cosine Transform (ICT). The model's parameters are tuned using a hill-climbing approach to optimise the accuracy. The performance has been analysed qualitatively and quantitatively using two large image datasets with eye tracking fixation ground truth. The results show that the model achieves higher prediction accuracy with a lower computational complexity compared to state-of-the-art visual attention models. The proposed model is useful in predicting human fixations in computationally constrained environments. It is mainly useful in applications such as perceptual video coding, image quality assessment, object recognition and image segmentation.
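
    The abstract does not say which DCT coefficients the model treats as key, or what relation it formulates between in-focus and out-of-focus regions. As a generic stand-in, the sketch below uses a common focus cue: the share of a block's AC energy that lies in its higher-frequency DCT coefficients. The cutoff rule `u + v >= cutoff`, the 8x8 block size, and the function names are assumptions, and this is not the thesis's method.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (direct, unoptimised evaluation)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for y in range(n):
                for x in range(n):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * acc
    return coeffs

def high_frequency_ratio(block, cutoff):
    """Fraction of AC energy in coefficients with u + v >= cutoff (assumed focus cue)."""
    coeffs = dct_2d(block)
    n = len(block)
    ac_energy = sum(coeffs[u][v] ** 2
                    for u in range(n) for v in range(n) if (u, v) != (0, 0))
    if ac_energy < 1e-12:
        return 0.0  # (near-)flat block: no detail to measure
    high_energy = sum(coeffs[u][v] ** 2
                      for u in range(n) for v in range(n) if u + v >= cutoff)
    return high_energy / ac_energy

# A checkerboard stands in for a sharp, in-focus block; a ramp for a blurred one.
sharp = [[255.0 if (x + y) % 2 == 0 else 0.0 for x in range(8)] for y in range(8)]
smooth = [[10.0 * x for x in range(8)] for _ in range(8)]
sharp_score = high_frequency_ratio(sharp, cutoff=4)
smooth_score = high_frequency_ratio(smooth, cutoff=4)
```

    Blocks with a high ratio would be candidates for in-focus regions. Replacing the floating-point DCT with an integer transform, as the abstract describes for the ICT, would mainly change `dct_2d`.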