
    Towards Understanding and Expanding Locomotion in Physical and Virtual Realities

    Among the many virtual reality interactions, the locomotion dilemma remains a significant impediment to achieving an ideal immersive experience. The physical limitations of tracked space make it impossible to naturally explore theoretically boundless virtual environments with a one-to-one mapping, and synthetic techniques like teleportation and flying often induce simulator sickness and break the sense of presence. Natural walking is therefore the most favored form of locomotion. Redirected walking offers a natural and intuitive way for users to navigate vast virtual spaces efficiently. However, existing techniques either lead to simulator sickness due to visual-vestibular mismatch or detract from the immersive experience that virtual reality aims to provide. This research presents novel techniques and applications that enhance the user experience by expanding the walkable physical space in virtual reality. The thesis makes three main contributions. The first is a mobile application that uses markerless Augmented Reality to let users explore a life-sized virtual library through a divide-and-rule approach. The second is a subtle redirected walking technique based on inattentional blindness, using dynamic foveated rendering and natural visual suppressions such as blinks and saccades. The third is a novel redirected walking solution that uses a deep neural network to predict saccades in real time, eliminating the hardware requirement for eye tracking. Overall, this thesis contributes to human-computer interaction by investigating novel approaches to the locomotion dilemma. The proposed solutions were evaluated through extensive user studies, demonstrating their effectiveness and applicability in real-world scenarios such as training simulations and entertainment.
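
    As a hedged illustration of the suppression-gated redirection idea described above, here is a minimal sketch of a controller that injects a small extra rotation only while vision is naturally suppressed by a blink or saccade. The function name and both gain constants are assumptions for illustration, not parameters from the thesis; real systems derive such gains from perceptual detection-threshold studies.

```python
import math

# Illustrative gains in degrees per event; assumed values, not the
# thesis's calibrated detection thresholds.
GAIN_SACCADE_DEG = 0.5   # yaw injected during a saccade
GAIN_BLINK_DEG = 2.0     # yaw injected during a blink

def redirected_yaw(user_yaw_deg, event, remaining_offset_deg):
    """Return (rendered_yaw, remaining_offset): while a blink or saccade
    suppresses vision, rotate the virtual camera slightly toward the
    desired total offset so the physical walking path curves unnoticed."""
    if event == "saccade":
        gain = GAIN_SACCADE_DEG
    elif event == "blink":
        gain = GAIN_BLINK_DEG
    else:
        return user_yaw_deg, remaining_offset_deg   # eyes open: no redirection
    step = math.copysign(min(gain, abs(remaining_offset_deg)),
                         remaining_offset_deg)
    return user_yaw_deg + step, remaining_offset_deg - step

# Accumulate 10 degrees of redirection across a stream of eye events.
yaw, offset = 0.0, 10.0
for ev in ["saccade", "fixation", "blink", "saccade"]:
    yaw, offset = redirected_yaw(yaw, ev, offset)
    print(f"{ev:8s} yaw={yaw:4.1f} remaining={offset:4.1f}")
```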

    Real-time interactive visualization of large networks on a tiled display system

    This paper introduces a methodology for visualizing large real-world (social) network data on a high-resolution tiled display system. Advances in network drawing algorithms have enabled real-time visualization and interactive exploration of large real-world networks. However, visualization on a typical desktop monitor remains challenging due to the limited amount of screen space and the ever-increasing size of real-world datasets. To solve this problem, we propose an integrated approach that employs state-of-the-art network visualization algorithms on a tiled display system consisting of multiple screens. Key to our approach is using the machine's graphics processing units (GPUs) to their fullest extent in order to ensure an interactive setting with real-time visualization. To realize this, we extended a recent GPU-based implementation of a force-directed graph layout algorithm to multiple GPUs and combined it with a distributed rendering approach in which each graphics card in the tiled display system renders precisely the part of the network to be displayed on the monitors attached to it. Our evaluation of the approach on a 12-screen, 25-megapixel tiled display system with three GPUs demonstrates interactive performance at 60 frames per second for real-world networks with tens of thousands of nodes and edges. This constitutes a performance improvement of approximately 4 times over a single-GPU implementation. All the software developed to implement our tiled visualization approach, including the multi-GPU network layout, rendering, display, and interaction components, is made available as open-source software.
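
    To make the layout component concrete, the following is a minimal single-machine sketch of one Fruchterman-Reingold force-directed iteration in NumPy; the paper's contribution lies in distributing exactly this kind of all-pairs computation across multiple GPUs, which this O(n^2) CPU sketch does not attempt. The function fr_step and its parameter values are assumptions for illustration.

```python
import numpy as np

def fr_step(pos, edges, k=1.0, step=0.05):
    """One Fruchterman-Reingold iteration: all-pairs repulsion (k^2/d)
    plus spring attraction (d^2/k) along edges, then a capped move."""
    delta = pos[:, None, :] - pos[None, :, :]        # pairwise offset vectors
    dist = np.linalg.norm(delta, axis=-1) + 1e-9     # avoid divide-by-zero
    disp = ((k * k / dist**2)[:, :, None] * delta).sum(axis=1)
    for i, j in edges:                               # attraction along edges
        d = pos[i] - pos[j]
        pull = (np.linalg.norm(d) / k) * d
        disp[i] -= pull
        disp[j] += pull
    length = np.linalg.norm(disp, axis=1, keepdims=True) + 1e-9
    return pos + step * disp / length                # move a fixed step

rng = np.random.default_rng(0)
pos = rng.standard_normal((5, 2))                    # random initial layout
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]     # a 5-cycle
for _ in range(200):
    pos = fr_step(pos, edges)
print(pos.round(2))                                  # settles near a pentagon
```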

    Shader optimization and specialization

    In the field of real-time graphics for computer games, performance has a significant effect on the player's enjoyment and immersion. Graphics processing units (GPUs) are hardware accelerators that run small parallelized shader programs to speed up computationally expensive rendering calculations. This thesis examines optimizing shader programs and explores ways in which data patterns on both the CPU and GPU can be analyzed to automatically speed up rendering in games. Initially, the effect of traditional compiler optimizations on shader source code was explored. Techniques such as loop unrolling or arithmetic reassociation provided speed-ups on several devices, but different GPU hardware responded differently to each set of optimizations. Analyzing execution traces from numerous popular PC games revealed that much of the data passed from CPU-based API calls to GPU-based shaders is either unused or remains constant. A system was developed to capture this constant data and fold it into the shaders' source code. Re-running the game's rendering code using these specialized shader variants resulted in performance improvements in several commercial games without impacting their visual quality.
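
    A minimal sketch of the constant-data folding idea, assuming uniform values have been captured per draw call as name-to-value dictionaries. Production shader specialization would operate on parsed or compiled shader representations rather than the naive textual substitution shown here; the helper names below are assumptions for illustration.

```python
def find_constant_uniforms(draw_calls):
    """Return the uniforms whose captured value never changed across
    all recorded draw calls of a shader."""
    constant = dict(draw_calls[0])
    for call in draw_calls[1:]:
        for name in list(constant):
            if call.get(name) != constant[name]:
                del constant[name]        # varied at least once: keep dynamic
    return constant

def specialize(shader_src, constants):
    """Fold each constant uniform into the source as a compile-time
    constant so the shader compiler can propagate and simplify it."""
    for name, value in constants.items():
        shader_src = shader_src.replace(
            f"uniform float {name};", f"const float {name} = {value};")
    return shader_src

src = """uniform float fog_density;
uniform float exposure;
void main() { /* ... uses fog_density and exposure ... */ }"""
trace = [{"fog_density": 0.02, "exposure": 1.5},
         {"fog_density": 0.02, "exposure": 1.1}]
print(specialize(src, find_constant_uniforms(trace)))  # folds fog_density only
```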

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, and decision fusion, among others. The signals processed are commonly one-, two-, or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research in the field of pattern recognition.

    Low power CMOS vision sensor for foreground segmentation

    This thesis focuses on mapping a top-ranked background subtraction algorithm, the Pixel-Based Adaptive Segmenter (PBAS), onto a CMOS vision sensor with focal-plane processing. The redesign of PBAS into a hardware-oriented version, HO-PBAS, requires fewer memories per pixel and yields a simpler overall model, at the cost of an acceptable loss of accuracy with respect to the CPU implementation. The thesis features two CMOS vision sensors. The first, HOPBAS1K, lays out a 24 x 56 pixel array on a mini-ASIC chip in standard 180 nm CMOS technology. The second, HOPBAS10K, features a 98 x 98 pixel array, also in standard 180 nm CMOS technology; it fixes issues found in the first chip and provides good hardware and background-subtraction performance metrics.
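
    A greatly simplified, software-only sketch of the segmentation principle behind PBAS: each pixel keeps a small history of background samples and is classified as foreground when too few samples are close to the incoming value. The real algorithm adapts the decision threshold and update rate per pixel (and HO-PBAS simplifies it further for the focal plane); the fixed parameters and names below are illustrative assumptions.

```python
import numpy as np

def segment(frame, samples, R=20.0, K=2, p_update=1/16, rng=None):
    """PBAS-like step on a grayscale frame. samples is an (N, H, W)
    per-pixel background history; returns a boolean foreground mask."""
    rng = rng or np.random.default_rng()
    close = np.abs(samples - frame[None]) < R           # per-sample match
    background = close.sum(axis=0) >= K                 # consensus vote
    # Conservative update: only background pixels may refresh a sample.
    refresh = background & (rng.random(frame.shape) < p_update)
    samples[rng.integers(samples.shape[0])][refresh] = frame[refresh]
    return ~background

rng = np.random.default_rng(1)
H, W, N = 4, 4, 8
samples = 100.0 + rng.normal(0.0, 3.0, (N, H, W))       # learned background
frame = np.full((H, W), 100.0)
frame[1, 1] = 200.0                                     # a bright moving object
print(segment(frame, samples, rng=rng).astype(int))     # 1 only at (1, 1)
```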

    Immersive Automotive Stereo Vision

    Recently, the first in-car augmented reality (AR) system was introduced to the market. It features various virtual 3D objects drawn on top of a 2D live video feed, which is displayed on a central display inside the vehicle. Our goal with this thesis is to develop an approach that allows us not only to augment a 2D video but also to reconstruct a 3D scene of the vehicle's surrounding driving environment. This opens up various possibilities, including displaying this 3D scan on a head-mounted display (HMD) as part of a Mixed Reality (MR) application, which requires a convincing and immersive visualization of the surroundings at high rendering speed. To accomplish this task, we limit ourselves to a single front-mounted stereo camera on a vehicle and fuse stereo measurements temporally. First, we analyze the effects of temporal stereo fusion thoroughly. We estimate the theoretically achievable accuracy and highlight the limitations of temporal fusion and of our assumptions. We also derive a 1D extended information filter and a 3D extended Kalman filter to fuse measurements temporally, which substantially reduces the depth error in our simulations. We integrate these results in a novel dense 3D reconstruction framework that models each point as a probabilistic filter. Projecting 3D points into the newest image allows us to fuse measurements temporally after a clustering stage, which also gives us the ability to handle multiple depth layers per pixel. The 3D reconstruction framework is point-based, but it also has a mesh-based extension. For that, we leverage a novel depth-image triangulation method to render the scene on the GPU using only RGB and depth images as input. We exploit the nature of urban scenery and the vehicle's movement by first identifying and then rendering pixels of the previous stereo camera frame that are no longer seen in the current frame. These pixels at the previous image border form a tube over multiple frames, which we call a tube mesh, and have the highest observable resolution. We are also able to offload intensive filter propagation computations completely to the GPU. Furthermore, we demonstrate a way to create a dense, dynamic virtual sky background from the same camera to accompany our reconstructed 3D scene. We evaluate our method against other approaches in an extensive benchmark on the popular KITTI visual odometry dataset and on the synthetic SYNTHIA dataset. Besides stereo error metrics in image space, we also compare how the approaches perform with respect to the available depth structure in the reference depth image and in their ability to predict the appearance of the scene from different viewing angles on SYNTHIA. Our method shows significant improvements in terms of disparity and view prediction errors. We also achieve a rendering speed high enough to fulfill the frame-rate requirements of modern HMDs. Finally, we highlight challenges in the evaluation, perform ablation studies of our framework, and conclude with a qualitative showcase on different datasets, including a discussion of failure cases.
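
    As a hedged illustration of why temporal fusion reduces depth error, here is a scalar information-filter fusion of repeated stereo observations of a static point, carried out in inverse depth, where disparity noise is approximately Gaussian. This is a simplification, not the thesis's 1D extended information filter (which additionally handles camera motion); the function name and the noise value are assumptions.

```python
def fuse_inverse_depth(depths, meas_var):
    """Fuse repeated depth measurements of a static point in inverse
    depth: information (1/variance) adds, so N equal-quality
    measurements shrink the variance by a factor of N."""
    info, weighted = 0.0, 0.0
    for z in depths:
        rho = 1.0 / z                 # inverse depth ~ disparity
        info += 1.0 / meas_var
        weighted += rho / meas_var
    rho_hat = weighted / info
    return 1.0 / rho_hat, 1.0 / info  # fused depth and fused variance

# Five noisy observations of a point about 20 m away (illustrative numbers;
# meas_var is the assumed inverse-depth measurement variance).
depth, var = fuse_inverse_depth([19.2, 20.5, 20.1, 19.7, 20.9], meas_var=1e-4)
print(f"fused depth {depth:.2f} m, variance {var:.1e} (was 1.0e-04)")
```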

    Youth and Modern Information Technologies: Proceedings of the XVIII International Scientific and Practical Conference of Students, Postgraduates, and Young Scientists, March 22-26, 2021, Tomsk

    This volume contains the papers presented at the XVIII International Scientific and Practical Conference of Students, Postgraduates, and Young Scientists "Youth and Modern Information Technologies", held at Tomsk Polytechnic University and hosted by the School of Information Technology and Robotics. The materials reflect papers by students, postgraduates, and young scientists accepted for discussion in the following sections: "Artificial Intelligence and Machine Learning", "Digitalization, IT, and the Digital Economy", "Design and Computer Graphics", "Virtual and Augmented Reality", "Big Data Technology in Industry", "Mechatronics and Robotics", and "Automation of Technological Processes and Production". The volume is intended for specialists in information technology as well as students and postgraduates in the corresponding fields.

    Patterns and Pattern Languages for Mobile Augmented Reality

    Mixed Reality is a relatively new field in computer science which uses technology as a medium to provide modified or enhanced views of reality, or to virtually generate a new reality. Augmented Reality is a branch of Mixed Reality which blends the real world, as viewed through a computer interface, with virtual objects generated by a computer. The 21st-century commodification of mobile devices with multi-core Central Processing Units, Graphics Processing Units, high-definition displays, and multiple sensors controlled by capable operating systems such as Android and iOS means that Mobile Augmented Reality applications have become increasingly feasible. Mobile Augmented Reality is a multi-disciplinary field requiring a synthesis of many technologies, such as computer graphics, computer vision, machine learning, and mobile device programming, while also requiring theoretical knowledge of diverse fields such as linear algebra, projective and differential geometry, probability, and optimisation. This multi-disciplinary nature has led to a fragmentation of knowledge into various specialisations, making it difficult to integrate different solution components into a coherent architecture. Software design patterns provide a solution space of tried and tested best practices for a specified problem within a given context. The solution space is non-prescriptive and is described in terms of relationships between roles that can be assigned to software components. Architectural patterns are used to specify high-level designs of complete systems, as opposed to domain- or tactical-level patterns that address specific lower-level problem areas. Pattern Languages comprise multiple software patterns combined in multiple possible sequences to form a language: the individual patterns form the vocabulary, while the valid sequences through the patterns define the grammar. Pattern Languages provide flexible, generalised solutions within a particular domain that can be customised to solve problems of differing characteristics and levels of complexity within the domain. The specification of one or more Pattern Languages tailored to the Mobile Augmented Reality domain can therefore provide a generalised guide for the design and architecture of Mobile Augmented Reality applications from the architectural level down to the "nuts-and-bolts" implementation level. While there is a large body of research into the technical specialisations pertaining to Mobile Augmented Reality, there is a dearth of up-to-date literature covering Mobile Augmented Reality design. This thesis fills this vacuum by: 1. providing architectural patterns that form the spine on which the design of Mobile Augmented Reality artefacts can be based; 2. documenting existing patterns within the context of Mobile Augmented Reality; 3. identifying new patterns specific to Mobile Augmented Reality; and 4. combining the patterns into Pattern Languages for Detection & Tracking, Rendering & Interaction, and Data Access for Mobile Augmented Reality. The resulting Pattern Languages support design at multiple levels of complexity, from an object-oriented framework down to specific one-off Augmented Reality applications. The practical contribution of this thesis is the specification of architectural patterns and Pattern Languages that provide a unified design approach for both the overall architecture and the detailed design of Mobile Augmented Reality artefacts. The theoretical contribution is a design theory for Mobile Augmented Reality, gleaned from the extraction of patterns and the creation of Pattern Languages.
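
    To make the role-based, non-prescriptive character of such patterns concrete, here is a minimal sketch in the spirit of a Detection & Tracking pipeline pattern, where roles are assigned to interchangeable components rather than hard-wired. The class and role names are hypothetical and do not reproduce the thesis's actual pattern vocabulary.

```python
from abc import ABC, abstractmethod

class Detector(ABC):
    """Role: find candidate features or markers in a camera frame."""
    @abstractmethod
    def detect(self, frame): ...

class Tracker(ABC):
    """Role: turn detections into an updated pose estimate."""
    @abstractmethod
    def track(self, detections): ...

class TrackingPipeline:
    """Architectural glue: any Detector can be paired with any Tracker,
    so the pattern describes relationships between roles, not classes."""
    def __init__(self, detector: Detector, tracker: Tracker):
        self.detector, self.tracker = detector, tracker

    def process(self, frame):
        return self.tracker.track(self.detector.detect(frame))

class ThresholdDetector(Detector):
    def detect(self, frame):
        return [v for v in frame if v > 0.5]   # stand-in for real vision code

class CentroidTracker(Tracker):
    def track(self, detections):
        return {"pose": sum(detections) / max(len(detections), 1)}

pipeline = TrackingPipeline(ThresholdDetector(), CentroidTracker())
print(pipeline.process([0.2, 0.9, 0.7]))       # -> {'pose': 0.8...}
```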

    Exploiting frame coherence in real-time rendering for energy-efficient GPUs

    The computation capabilities of mobile GPUs have greatly evolved over the last generations, allowing real-time rendering of realistic scenes. However, the desire to process complex environments clashes with the battery-operated nature of smartphones, whose users expect long operating times per charge and a temperature low enough to hold them comfortably. Consequently, improving the energy efficiency of mobile GPUs is paramount to fulfilling both performance and low-power goals. The processors within the GPU and their accesses to off-chip memory are the main sources of energy consumption in graphics workloads, yet most of this energy is spent on redundant computations, as the frame rate required to produce animations results in a sequence of extremely similar images. The goal of this thesis is to improve the energy efficiency of mobile GPUs by designing micro-architectural mechanisms that leverage frame coherence to reduce the redundant computations and memory accesses inherent in graphics applications. First, we focus on reducing redundant color computations. Mobile GPUs typically employ an architecture called Tile-Based Rendering, in which the screen is divided into tiles that are independently rendered in on-chip buffers. It is common for more than 80% of the tiles to produce exactly the same output between consecutive frames. We propose Rendering Elimination (RE), a mechanism that accurately detects such occurrences by computing and storing signatures of the inputs of all the tiles in a frame. If the signatures of a tile across consecutive frames are the same, the colors computed in the preceding frame are reused, saving all computations and memory accesses associated with rendering the tile. We show that RE vastly outperforms related schemes found in the literature, achieving a reduction in energy consumption of 37% and in execution time of 33% with minimal overheads. Next, we focus on reducing redundant computations for fragments that will eventually not be visible. In real-time rendering, objects are processed in the order they are submitted to the GPU, which often causes the results of previously computed objects to be overwritten by new objects that turn out to occlude them. Consequently, whether or not a particular object will be occluded is not known until the entire scene has been processed. Based on the fact that visibility tends to remain constant across consecutive frames, we propose Early Visibility Resolution (EVR), a mechanism that predicts visibility from information obtained in the preceding frame. EVR first computes and stores the depth of the farthest visible point after rendering each tile. Whenever a tile is rendered in the following frame, primitives that are farther from the observer than the stored depth are predicted to be occluded and are processed after the ones predicted to be visible. Additionally, this visibility prediction scheme is used to improve Rendering Elimination's equal-tile detection by excluding primitives predicted to be occluded from the signature. With minor hardware costs, EVR is shown to provide a reduction in energy consumption of 43% and in execution time of 39%. Finally, we focus on reducing computations in tiles with low spatial frequencies. GPUs produce pixel colors by sampling triangles once per pixel and performing computations on each sampling location. However, most screen regions do not include sufficient detail to require high sampling rates, leading to a significant amount of energy wasted computing the same color for neighboring pixels. Given that spatial frequencies are maintained across frames, we propose Dynamic Sampling Rate (DSR), a mechanism that analyzes the spatial frequencies of tiles and determines the best sampling rate for them, which is applied in the following frame. Results show that DSR significantly reduces processor activity, yielding energy savings of 40% and execution time reductions of 35%.
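
    A minimal software sketch of the signature idea behind Rendering Elimination, assuming each tile's rendering inputs (geometry, uniforms, texture identifiers) can be serialized and hashed. In the thesis this is a micro-architectural hardware unit; the hash choice, data layout, and function names below are illustrative assumptions.

```python
import hashlib

def tile_signature(inputs):
    """Hash everything that determines a tile's output colors."""
    h = hashlib.sha1()
    for item in inputs:
        h.update(repr(item).encode())
    return h.digest()

def render_frame(tiles, prev_sigs, render_tile):
    """Render only tiles whose input signature changed since the
    previous frame; identical tiles reuse last frame's colors."""
    sigs, reused = {}, 0
    for tile_id, inputs in tiles.items():
        sig = tile_signature(inputs)
        sigs[tile_id] = sig
        if prev_sigs.get(tile_id) == sig:
            reused += 1                      # skip all work for this tile
        else:
            render_tile(tile_id, inputs)
    return sigs, reused

frame1 = {0: ["tri_a", ("u_time", 1)], 1: ["tri_b"]}
frame2 = {0: ["tri_a", ("u_time", 2)], 1: ["tri_b"]}  # only tile 0 changed
sigs, _ = render_frame(frame1, {}, lambda t, i: None)
_, reused = render_frame(frame2, sigs, lambda t, i: None)
print("tiles reused:", reused)                        # -> 1
```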

    n-Dimensional Display Interface - A Replacement For The Venerable Framebuffer

    The n-Dimensional Display Interface (nDDI) is the result of an exploration for a new abstraction for display interfaces. Modern and future display use cases are pushing the boundaries of what is possible with even the highest-speed data links connecting computing devices and displays. The n-Dimensional Display Interface was designed to meet a set of guiding principles. The result is an abstraction just above the framebuffer that provides a great deal of backward compatibility and the scalability to deliver transmission savings for the most challenging use cases, such as 8K displays, display walls, remotely connected displays, and low-bandwidth mobile displays. The first two phases of research demonstrated the concept of backward compatibility and then extended the architecture for blending, while leveraging video semantics to support full-screen video. The third phase finalized the architecture and tackled a larger, more complicated use of nDDI: very large display walls.