
    Space-variant picture coding

    Space-variant picture coding techniques exploit the strong spatial non-uniformity of the human visual system to increase coding efficiency in terms of perceived quality per bit. This thesis extends space-variant coding research in two directions. The first is foveated coding. Past foveated coding research has been dominated by the single-viewer, gaze-contingent scenario. For research into multi-viewer and probability-based scenarios, this thesis presents a missing piece: an algorithm for computing an additive multi-viewer sensitivity function based on an established eye resolution model and, from this, a blur map that is optimal in the sense of discarding frequencies in least-noticeable-first order. Furthermore, for the application of a blur map, a novel algorithm is presented for the efficient computation of high-accuracy, smoothly space-variant Gaussian blurring, using a specialised filter bank that approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy at greatly reduced cost compared to the brute-force approach of employing a separate low-pass filter at each image location. The second direction is artificially increasing the depth of field of an image, an idea borrowed from photography, with the advantage of allowing an image to be reduced in bitrate while retaining or increasing overall aesthetic quality. Two synthetic depth-of-field algorithms are presented, with the desirable properties of mimicking occlusion effects as they occur in natural blurring and of handling any number of blurring and occlusion levels at the same computational complexity. The merits of this coding approach were investigated through subjective experiments comparing it with single-viewer foveated image coding; the results showed the depth-based pre-blurring to be, in general, significantly preferable to the same level of foveation blurring.
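    The space-variant blurring idea lends itself to a compact illustration. The sketch below is not the thesis' specialised filter bank, but a common approximation in the same spirit: blur the image at a few fixed Gaussian scales and blend, at each pixel, between the two scales that bracket the blur map's target sigma. All function and parameter names (space_variant_blur, sigmas) are illustrative assumptions.

        # Hedged sketch: filter-bank approximation to space-variant Gaussian blur.
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def space_variant_blur(image, blur_map, sigmas=(0.5, 1.0, 2.0, 4.0, 8.0)):
            """image: 2-D grayscale array; blur_map: per-pixel target sigma."""
            sigmas = np.asarray(sigmas, dtype=float)
            # Filter bank: one whole-image Gaussian blur per discrete sigma.
            bank = np.stack([gaussian_filter(image, s) for s in sigmas])
            # Bracket each pixel's target sigma between two bank levels.
            idx = np.clip(np.searchsorted(sigmas, blur_map), 1, len(sigmas) - 1)
            lo, hi = sigmas[idx - 1], sigmas[idx]
            w = np.clip((blur_map - lo) / (hi - lo), 0.0, 1.0)
            rows, cols = np.indices(image.shape)
            # Linear blend of the two bracketing blurred copies.
            return (1 - w) * bank[idx - 1, rows, cols] + w * bank[idx, rows, cols]

    The cost is a handful of fixed separable convolutions plus a per-pixel blend, rather than a distinct low-pass filter at every image location.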

    Modeling Eye Tracking Data with Application to Object Detection

    This research focuses on enhancing computer vision algorithms using eye tracking and visual saliency. Recent advances in eye tracking device technology have enabled large-scale collection of eye tracking data without affecting viewer experience. As eye tracking data is biased towards high-level image and video semantics, it provides a valuable prior for object detection in images and object extraction in videos. We specifically explore the following problems in the thesis: 1) eye tracking and saliency enhanced object detection, 2) eye tracking assisted object extraction in videos, and 3) the role of object co-occurrence and camera focus in visual attention modeling. Since human attention is biased towards faces and text, in the first work we propose an approach to isolate face and text regions in images by analyzing eye tracking data from multiple subjects. Eye tracking data is clustered and region labels are predicted using a Markov random field model. In the second work, we study object extraction in videos using an eye tracking prior. We propose an algorithm to extract dominant visual tracks in eye tracking data from multiple subjects by solving a linear assignment problem. Visual tracks localize the object search, and we propose a novel mixed graph association framework inferred by binary integer linear programming. In the final work, we address the problem of predicting where people look in images, specifically exploring the importance of scene context in the form of object co-occurrence and camera focus. The proposed model extracts low-, mid- and high-level features together with scene-context features and uses a regression framework to predict a visual attention map. In all of the above cases, extensive experimental results show that the proposed methods outperform the current state of the art.
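    The dominant-visual-track step is amenable to a small sketch. The snippet below shows only the linear-assignment core hinted at in the abstract, matching fixation-cluster centers across consecutive frames; the clustering itself, the gating threshold max_dist, and the function names are illustrative assumptions, and the mixed graph association stage is omitted entirely.

        # Hedged sketch: frame-to-frame cluster matching via linear assignment.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def link_clusters(prev_centers, curr_centers, max_dist=50.0):
            """prev_centers: (N, 2) and curr_centers: (M, 2) arrays of (x, y)."""
            # Pairwise Euclidean distances serve as the assignment costs.
            cost = np.linalg.norm(prev_centers[:, None, :] - curr_centers[None, :, :], axis=-1)
            rows, cols = linear_sum_assignment(cost)
            # Discard implausibly long links; unmatched clusters start or end tracks.
            return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]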

    Bio-inspired foveal and peripheral visual sensing for saliency-based decision making in robotics

    Computer vision is an area of research that has grown at immense speed in the last few decades, tackling scene-understanding problems on very diverse fronts, such as image classification, object detection, localization, mapping and tracking. It has also long been understood that biology offers very valuable lessons for this research field, with the human visual system very likely the most studied brain mechanism. The eye foveation system is a good example of such a lesson, since machines and animals often face a similar dilemma: prioritizing visual areas of interest so as to process information faster, given limited computing power and a field of view too wide to be attended to simultaneously. While extensive models of artificial foveation have been presented, the re-emerging area of machine learning with deep neural networks has opened the question of how these two approaches can contribute to each other. Novel deep learning models often rely on the availability of substantial computing power, yet many areas of application face strict constraints; unmanned aerial vehicles are a good example, since to be autonomous they must lift and power all of their computing equipment. This work studies how applying a foveation principle when down-scaling images can reduce the number of operations required for object detection, and compares its effect to uniformly down-sampled images, given that Convolutional Neural Network (CNN) layers account for the prevalent share of those operations. Foveation requires prior knowledge of a region of interest on which to center the fovea; this is addressed by merging bottom-up saliency with top-down feedback about the objects the CNN has been trained to detect. Although saliency models have also been studied extensively in the last couple of decades, most often by comparing their performance against human-observer datasets, the question remains open as to how they fit into wider information-processing paradigms and functional representations of the human brain. An information-flow scheme that encompasses these principles is proposed here. Finally, to give the model the capacity to operate coherently in the time domain, it adapts a representation of a well-established theory of the decision-making process that takes place in the basal ganglia region of the brain. The behaviour of this representation is then tested against human observers' data in an omnidirectional field of view, highlighting the importance of selecting the most contextually relevant region of interest at each time step.
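    To make the foveated down-scaling idea concrete, the sketch below keeps full detail near a fixation point and low-pass filters the periphery before ordinary down-sampling, so that a CNN spends its operations where they matter. It is a plain single-blur blend under assumed parameters (fovea_radius, max_sigma), not the thesis' model, and assumes a 2-D grayscale image.

        # Hedged sketch: eccentricity-weighted blend of sharp and blurred copies.
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def foveate(image, fovea_xy, fovea_radius=64.0, max_sigma=8.0):
            h, w = image.shape
            ys, xs = np.indices((h, w))
            # Eccentricity: pixel distance from the fixation point.
            ecc = np.hypot(xs - fovea_xy[0], ys - fovea_xy[1])
            # Blur weight ramps up smoothly outside the foveal radius.
            weight = np.clip((ecc - fovea_radius) / fovea_radius, 0.0, 1.0)
            blurred = gaussian_filter(image, max_sigma)
            return (1 - weight) * image + weight * blurred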

    Blickpunktabhängige Computergraphik (Gaze-Contingent Computer Graphics)

    Contemporary digital displays feature multi-million pixels at ever-increasing refresh rates. Reality, on the other hand, provides us with a view of the world that is continuous in space and time. The discrepancy between viewing the physical world and its sampled depiction on digital displays gives rise to perceptual quality degradations. By measuring or estimating where we look, gaze-contingent algorithms aim to exploit the way we visually perceive in order to remedy visible artifacts. This dissertation presents a variety of novel gaze-contingent algorithms and corresponding perceptual studies. Chapters 4 and 5 present methods to boost the perceived visual quality of conventional video footage when viewed on commodity monitors or projectors. Chapter 6 describes a novel head-mounted display with real-time gaze tracking, a combination that enables a large variety of applications in the context of Virtual Reality (VR) and Augmented Reality (AR). Building on this gaze-tracking VR headset, Chapter 7 describes a novel gaze-contingent rendering method: the shading quality is analyzed and adjusted in real time for every image pixel on the basis of a perceptual model, an approach with the potential to reduce the computational effort of shading a virtual scene to a fraction. The described methods and studies show that gaze-contingent algorithms can improve the quality of displayed images and videos, or substantially reduce the computational effort of image generation, while the display quality perceived by the user remains unchanged.
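    The per-pixel adaptation in Chapter 7 can be roughly illustrated on the CPU, though the dissertation's method evaluates a perceptual model per pixel in real time during rendering. The sketch below assigns a coarser shading level of detail the farther a pixel lies from the gaze point; the parameter names and the eccentricity-doubling rule are assumptions, not the dissertation's model.

        # Hedged sketch: per-pixel shading level of detail from gaze eccentricity.
        import numpy as np

        def shading_lod(gaze_px, shape, px_per_degree=40.0, fovea_deg=2.0, max_lod=3):
            ys, xs = np.indices(shape)
            # Angular eccentricity of each pixel, in visual degrees.
            ecc_deg = np.hypot(xs - gaze_px[0], ys - gaze_px[1]) / px_per_degree
            # One coarser LOD per doubling of eccentricity beyond the fovea.
            lod = np.floor(np.log2(np.maximum(ecc_deg / fovea_deg, 1.0)))
            return np.clip(lod, 0, max_lod).astype(int)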