Space-variant spatio-temporal filtering of video for gaze visualization and perceptual learning
Dorr, M., Jarodzka, H., & Barth, E. (2010). Space-variant spatio-temporal filtering of video for gaze visualization and perceptual learning. In C. Morimoto & H. Istance (Eds.), Proceedings of the 2010 Symposium on Eye Tracking Research & Applications ETRA '10 (pp. 307-314). New York, NY: ACM.
We introduce an algorithm for space-variant filtering of video based on a spatio-temporal Laplacian pyramid and use this algorithm to render videos that visualize prerecorded eye movements. Spatio-temporal contrast and colour saturation are reduced as a function of distance to the nearest gaze point of regard, i.e. non-fixated, distracting regions are filtered out, whereas fixated image regions remain unchanged. We report an experiment in which the eye movements of an expert watching instructional videos were visualized with this algorithm, so that the gaze of novices was guided to relevant image locations. Results show that this visualization technique facilitates the novices' perceptual learning.
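As a rough illustration of the distance-based attenuation described above, a minimal sketch (assuming plain per-pixel processing rather than the authors' spatio-temporal Laplacian pyramid, and with an invented falloff constant and attenuation factor) might look like this:

```python
import numpy as np

def attenuation_map(h, w, gaze_points, sigma=80.0):
    """Weight in [0, 1] per pixel: 1 at a gaze point, decaying with
    distance to the nearest recorded point of regard. sigma (pixels)
    is an illustrative falloff constant, not a value from the paper."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = np.full((h, w), np.inf)
    for gx, gy in gaze_points:
        d2 = np.minimum(d2, (xs - gx) ** 2 + (ys - gy) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def filter_frame(frame, gaze_points, strength=0.3):
    """Reduce contrast and colour saturation away from fixated regions.
    frame is float RGB in [0, 1]; strength is an illustrative residual
    contrast/saturation factor."""
    h, w, _ = frame.shape
    a = attenuation_map(h, w, gaze_points)[..., None]
    mean = frame.mean(axis=(0, 1), keepdims=True)   # crude low-pass stand-in
    low_contrast = mean + strength * (frame - mean)
    gray = frame.mean(axis=2, keepdims=True)
    desaturated = gray + strength * (frame - gray)
    target = 0.5 * (low_contrast + desaturated)     # fully filtered look
    return a * frame + (1.0 - a) * target           # blend by gaze distance
```

In the actual method, the attenuation is applied to the levels of a spatio-temporal Laplacian pyramid, so the falloff acts band-wise in space and time rather than on raw pixel values.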
Applied eye tracking research
Jarodzka, H. (2010, 12 November). Applied eye tracking research. Presentation and lab tour for Vereniging Gewone Leden in oprichting (VGL i.o.), Heerlen, The Netherlands: Open University of the Netherlands.
This presentation was part of a lab tour.
Role and Development of Perceptual Skills in Medical Education
Jarodzka, H., Balslev, T., Holmqvist, K., Nyström, M., Scheiter, K., Gerjets, P., & Eika, B. (2010, May). Role and Development of Perceptual Skills in Medical Education. The Scandinavian Workshop on Applied Eye-Tracking (SWAET), Lund, Sweden.
This talk presents a study in which we tried to convey perceptual skills in clinical reasoning on patient video cases.
Space-variant picture coding
PhD thesis.
Space-variant picture coding techniques exploit the strong spatial non-uniformity of the human visual system in order to increase coding efficiency in terms of perceived quality per bit. This thesis extends space-variant coding research in two directions. The first of these directions is foveated coding. Past foveated coding research has been dominated by the single-viewer, gaze-contingent scenario. However, for the multi-viewer and probability-based scenarios, this thesis presents a missing piece: an algorithm for computing an additive multi-viewer sensitivity function based on an established eye resolution model and, from this, a blur map that is optimal in the sense of discarding frequencies in least-noticeable-first order. Furthermore, for the application of a blur map, a novel algorithm is presented for the efficient computation of high-accuracy, smoothly space-variant Gaussian blurring, using a specialised filter bank which approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy and at greatly reduced cost compared to the brute-force approach of employing a separate low-pass filter at each image location.
The second direction is that of artificially increasing the depth of field of an image, an idea borrowed from photography, with the advantage of allowing an image to be reduced in bitrate while retaining or increasing overall aesthetic quality. Two synthetic depth-of-field algorithms are presented herein, with the desirable properties of aiming to mimic occlusion effects as they occur in natural blurring, and of handling any number of blurring and occlusion levels with the same computational complexity. The merits of this coding approach have been investigated in subjective experiments comparing it with single-viewer foveated image coding. The results found the depth-based preblurring to generally be significantly preferable to the same level of foveation blurring.
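As a rough sketch of the filter-bank idea described above (not the thesis' actual bank design): blur the image once for each of a few fixed sigmas, then blend the two bracketing results per pixel according to the blur map. The bank sigmas and the linear blending rule here are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def space_variant_blur(image, blur_map, sigmas=(0.5, 1.0, 2.0, 4.0, 8.0)):
    """Approximate smoothly space-variant Gaussian blur with a filter bank.

    image:    2D float array.
    blur_map: desired Gaussian sigma at each pixel (same shape as image).
    sigmas:   the bank's fixed sigmas (illustrative values, ascending).
    """
    sigmas = np.asarray(sigmas, dtype=float)
    # One full-image blur per bank level -- far cheaper than running a
    # separate low-pass filter at every pixel location.
    bank = np.stack([gaussian_filter(image, s) for s in sigmas])
    # For each pixel, locate the two bank levels bracketing its target sigma.
    idx = np.clip(np.searchsorted(sigmas, blur_map), 1, len(sigmas) - 1)
    lo, hi = sigmas[idx - 1], sigmas[idx]
    t = np.clip((blur_map - lo) / (hi - lo), 0.0, 1.0)
    rows, cols = np.indices(image.shape)
    # Linear blend between the bracketing levels (a simple stand-in for
    # the thesis' high-accuracy interpolation scheme).
    return (1.0 - t) * bank[idx - 1, rows, cols] + t * bank[idx, rows, cols]
```

Pixels whose target sigma falls below the smallest bank sigma simply receive the mildest bank blur in this sketch.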
Blickpunktabhängige Computergraphik (Gaze-Contingent Computer Graphics)
Contemporary digital displays feature multi-million pixels at ever-increasing refresh rates. Reality, on the other hand, provides us with a view of the world that is continuous in space and time. The discrepancy between viewing the physical world and its sampled depiction on digital displays gives rise to perceptual quality degradations. By measuring or estimating where we look, gaze-contingent algorithms aim at exploiting the way we visually perceive to remedy visible artifacts. This dissertation presents a variety of novel gaze-contingent algorithms and respective perceptual studies. Chapters 4 and 5 present methods to boost the perceived visual quality of conventional video footage when viewed on commodity monitors or projectors. In Chapter 6, a novel head-mounted display with real-time gaze tracking is described. The device enables a large variety of applications in the context of Virtual Reality and Augmented Reality. Using the gaze-tracking VR headset, a novel gaze-contingent render method is described in Chapter 7. The gaze-aware approach greatly reduces computational efforts for shading virtual worlds. The described methods and studies show that gaze-contingent algorithms are able to improve the quality of displayed images and videos or reduce the computational effort for image generation, while the display quality perceived by the user does not change.
Modern digital displays offer ever higher resolutions at likewise increasing refresh rates. Reality, by contrast, is continuous in space and time. This fundamental difference leads to perceptual discrepancies for the viewer. Tracking the direction of gaze enables gaze-contingent display methods that can prevent visible artifacts. This dissertation contributes to four areas of gaze-contingent and perceptually faithful display methods. The methods in Chapters 4 and 5 aim to increase the perceived visual quality of videos for the viewer, where the videos are shown on ordinary output hardware such as a television or projector. Chapter 6 describes the development of a novel head-mounted display with support for capturing the gaze direction in real time. The combination of these capabilities enables a range of interesting applications in the context of Virtual Reality (VR) and Augmented Reality (AR). The fourth and final method, in Chapter 7 of this dissertation, describes a new algorithm that uses the developed eye-tracking head-mounted display for gaze-contingent rendering. The quality of the shading is analyzed and adapted in real time for each image pixel on the basis of a perceptual model. The method has the potential to reduce the computational cost of shading a virtual scene to a fraction. The methods and studies described in this dissertation show that gaze-contingent algorithms can effectively improve the display quality of images and videos, or, at unchanged perceived quality, considerably reduce the computational cost of image generation.
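As a toy illustration of that last point (not the dissertation's perceptual model): a gaze-contingent renderer can choose a coarser shading rate as angular distance from the gaze point grows. The eccentricity thresholds and pixels-per-degree figure below are assumptions made for the sketch.

```python
import math

def shading_rate(pixel, gaze, pixels_per_degree=40.0):
    """Pick a shading rate from the pixel's angular eccentricity
    relative to the tracked gaze point. The thresholds below are
    illustrative, not taken from the dissertation."""
    ecc_deg = math.hypot(pixel[0] - gaze[0], pixel[1] - gaze[1]) / pixels_per_degree
    if ecc_deg < 5.0:
        return 1   # fovea: shade every pixel
    if ecc_deg < 15.0:
        return 2   # near periphery: shade every 2x2 block
    return 4       # far periphery: shade every 4x4 block
```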
Spatial Interaction for Immersive Mixed-Reality Visualizations
Growing amounts of data, both in personal and professional settings, have caused an increased interest in data visualization and visual analytics.
Especially for inherently three-dimensional data, immersive technologies such as virtual and augmented reality and advanced, natural interaction techniques have been shown to facilitate data analysis.
Furthermore, in such use cases, the physical environment often plays an important role, both by directly influencing the data and by serving as context for the analysis.
Therefore, there has been a trend to bring data visualization into new, immersive environments and to make use of the physical surroundings, leading to a surge in mixed-reality visualization research.
One of the resulting challenges, however, is the design of user interaction for these often complex systems.
In my thesis, I address this challenge by investigating interaction for immersive mixed-reality visualizations regarding three core research questions:
1) What are promising types of immersive mixed-reality visualizations, and how can advanced interaction concepts be applied to them?
2) How does spatial interaction benefit these visualizations and how should such interactions be designed?
3) How can spatial interaction in these immersive environments be analyzed and evaluated?
To address the first question, I examine how various visualizations such as 3D node-link diagrams and volume visualizations can be adapted for immersive mixed-reality settings and how they stand to benefit from advanced interaction concepts.
For the second question, I study how spatial interaction in particular can help to explore data in mixed reality.
There, I look into spatial device interaction in comparison to touch input, the use of additional mobile devices as input controllers, and the potential of transparent interaction panels.
Finally, to address the third question, I present my research on how user interaction in immersive mixed-reality environments can be analyzed directly in the original, real-world locations, and how this can provide new insights.
Overall, with my research, I contribute interaction and visualization concepts, software prototypes, and findings from several user studies on how spatial interaction techniques can support the exploration of immersive mixed-reality visualizations.
Foveated Rendering Techniques in Modern Computer Graphics
Foveated rendering coupled with eye-tracking has the potential to dramatically accelerate interactive 3D graphics with minimal loss of perceptual detail.
I have developed a new foveated rendering technique: Kernel Foveated Rendering (KFR), which parameterizes foveated rendering by embedding polynomial kernel functions in log-polar mapping. This GPU-driven technique uses parameterized foveation that mimics the distribution of photoreceptors in the human retina. I present a two-pass kernel foveated rendering pipeline that maps well onto modern GPUs. In the first pass, I compute the kernel log-polar transformation and render to a reduced-resolution buffer. In the second pass, I carry out the inverse log-polar transformation with anti-aliasing to map the reduced-resolution rendering to the full-resolution screen. I carry out user studies to empirically identify the KFR parameters and observe a 2.8X-3.2X speedup in rendering on 4K displays. Eye-tracking-guided kernel foveated rendering can resolve the mutually conflicting goals of interactive rendering and perceptual realism.
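To make the two-pass mapping concrete, here is a sketch of the pass-2 lookup: each full-resolution screen pixel is converted to a sample coordinate in the reduced log-polar buffer. The power kernel below stands in for the paper's polynomial kernel family, and alpha = 4.0 is an illustrative setting rather than the value identified in the user studies.

```python
import math

def kfr_buffer_coord(px, py, gaze, corner_dist, buf_w, buf_h, alpha=4.0):
    """Map a screen pixel to its sample position in the reduced-resolution
    log-polar buffer. corner_dist is the largest gaze-to-screen-corner
    distance, which bounds the log-polar radius."""
    dx, dy = px - gaze[0], py - gaze[1]
    r = math.hypot(dx, dy)
    u = math.log1p(r) / math.log1p(corner_dist)  # normalized log radius in [0, 1]
    u = u ** (1.0 / alpha)   # kernel: stretch the fovea over more buffer pixels
    theta = (math.atan2(dy, dx) + math.pi) / (2.0 * math.pi)  # angle in [0, 1]
    return u * (buf_w - 1), theta * (buf_h - 1)
```

Pass 1 renders into the buffer using the inverse of this mapping, so foveal screen regions receive proportionally more buffer samples than the periphery.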
Learning human activities and poses with interconnected data sources
Understanding human actions and poses in images or videos is a challenging problem in computer vision. There are different topics related to this problem, such as action recognition, pose estimation, human-object interaction, and activity detection. Knowledge of actions and poses could benefit many applications, including video search, surveillance, auto-tagging, event detection, and human-computer interfaces.
To understand human actions and poses, we need to address several challenges. First, humans are able to perform an enormous number of poses. For example, simply to move forward, we can crawl, walk, run, or sprint. These poses all look different and require examples to cover the variations. Second, the appearance of a person's pose changes when viewed from different angles, so the learned action model needs to cover the variations across views. Third, many actions involve interactions between people and other objects, so we need to consider the appearance change corresponding to the object as well. Fourth, collecting such data for learning is difficult and expensive. Last, even if we can learn a good model for an action, localizing when and where the action happens in a long video remains difficult due to the large search space.
My key idea for alleviating these obstacles is to discover the underlying patterns that connect the information from different data sources. Why should there be underlying patterns? The intuition is that all people share the same articulated physical structure. Though we can change our pose, there are common constraints that limit what our pose can be and how it can move over time. Therefore, all types of human data follow these rules, and the rules can serve as prior knowledge or regularization in our learning framework. If we can exploit these tendencies, we are able to extract additional information from data and use it to improve the learning of human actions and poses. In particular, we can find patterns for how our pose varies over time, how our appearance looks in a specific view, what our pose is when we interact with objects with certain properties, and how part of our body configuration is shared across different poses. If we can learn these patterns, they can be used to interconnect and extrapolate knowledge between different data sources.
To this end, I propose several new ways to connect human activity data. First, I show how to connect snapshot images and videos by exploring the patterns of how our pose changes over time. Building on this idea, I explore how to connect human poses across multiple views by discovering the correlations between different poses and the latent factors that affect viewpoint variations. In addition, I consider whether there are also patterns connecting our poses and nearby objects when we interact with them, and I explore how the predicted interaction can serve as a cue to better address existing recognition problems, including image re-targeting and image description generation. Finally, after learning models that effectively incorporate these patterns, I propose a robust approach to efficiently localize when and where a complex action happens in a video sequence. The variants of my proposed approaches offer a good trade-off between computational cost and detection accuracy.
My thesis exploits various types of underlying patterns in human data, and the discovered structure is used to enhance the understanding of human actions and poses. With the proposed methods, we are able to 1) learn an action from very few snapshots by connecting them to a pool of label-free videos, 2) infer the pose for some views even without any examples by connecting the latent factors between different views, 3) predict the location of an object that a person is interacting with, independent of the type and appearance of that object, and then use the inferred interaction as a cue to improve recognition, and 4) localize an action in a complex long video. These approaches improve existing frameworks for understanding human actions and poses without extra data collection cost and broaden the problems that we can tackle.
Eye-tracking - Research in Learning and Instruction
Jarodzka, H. (2010, 15 November). Eye-tracking - Research in Learning and Instruction. Workshop given at the Open University of the Netherlands, Heerlen, The Netherlands.
This presentation was given during a workshop on eye tracking in learning and instruction.