22 research outputs found
Foveated Path Tracing with Fast Reconstruction and Efficient Sample Distribution
Photo-realistic offline rendering is currently done with path tracing, because it naturally produces many real-life light effects such as reflections, refractions and caustics. These effects are hard to achieve with other rendering techniques. However, path tracing in real time is complicated due to its high computational demand. Therefore, current real-time path tracing systems can only generate a very noisy estimate of the final frame, which is then denoised with a post-processing reconstruction filter.
A path tracing-based rendering system capable of meeting the high-resolution and low-latency requirements of mixed reality devices would generate a very immersive user experience. One possible solution for fulfilling these requirements could be foveated path tracing, wherein the rendering resolution is reduced in the periphery of the human visual system. The key challenge is that the foveated path tracing result in the periphery is both sparse and noisy, placing high demands on the reconstruction filter.
This thesis proposes the first regression-based reconstruction filter for path tracing that runs in real time. The filter is designed for highly noisy one-sample-per-pixel inputs. The fast execution is accomplished with blockwise processing and a fast implementation of the regression. In addition, a novel Visual-Polar coordinate space, which distributes the samples according to the contrast sensitivity model of the human visual system, is proposed. The advantage of Visual-Polar space is that it reduces both path tracing and reconstruction work, because both can be done at a lower resolution. These techniques enable a working prototype of a foveated path tracing system and may work as a stepping stone towards wider commercial adoption of photo-realistic real-time path tracing.
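The idea of blockwise regression can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a single noise-free guide feature per pixel (for example a depth or normal component), a plain first-order fit `a + b * feature` per tile, and a hypothetical function name `blockwise_linear_fit`. A real-time filter would use several features and run on the GPU.

```python
def blockwise_linear_fit(noisy, feature, block=8):
    """Fit noisy ~ a + b * feature independently in each block x block tile.

    noisy, feature: equally sized 2D lists (rows of floats).
    Returns the fitted image as a 2D list of floats.
    Assumption: one guide feature and ordinary least squares per tile.
    """
    h, w = len(noisy), len(noisy[0])
    out = [[0.0] * w for _ in range(h)]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            # Pixel coordinates covered by this tile (clipped at the borders).
            coords = [(y, x)
                      for y in range(y0, min(y0 + block, h))
                      for x in range(x0, min(x0 + block, w))]
            n = len(coords)
            mean_f = sum(feature[y][x] for y, x in coords) / n
            mean_c = sum(noisy[y][x] for y, x in coords) / n
            var_f = sum((feature[y][x] - mean_f) ** 2 for y, x in coords)
            cov = sum((feature[y][x] - mean_f) * (noisy[y][x] - mean_c)
                      for y, x in coords)
            # Closed-form least squares; a flat feature falls back to the tile mean.
            b = cov / var_f if var_f > 1e-12 else 0.0
            a = mean_c - b * mean_f
            for y, x in coords:
                out[y][x] = a + b * feature[y][x]
    return out
```

Because each tile is fitted independently, the tiles can be processed in parallel, which is one plausible reason blockwise processing suits real-time use.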
A space-variant visual pathway model for data efficient deep learning
We present an investigation into adopting a model of the retino-cortical mapping, found in biological visual systems, to improve the efficiency of image analysis using Deep Convolutional Neural Nets (DCNNs) in the context of robot vision and egocentric perception systems. This work has now enabled DCNNs to process input images approaching one million pixels in size, in real time, using only consumer-grade graphics processor (GPU) hardware in a single pass of the DCNN.
Enhancing Visual and Gestural Fidelity for Effective Virtual Environments
A challenge the virtual reality (VR) industry is facing is that VR is not immersive enough to make people feel a genuine sense of presence: the low frame rate leads to dizziness, and the lack of human body visualization limits human-computer interaction. In this dissertation, I present our research on enhancing visual and gestural fidelity in the virtual environment.
First, I present a new foveated rendering technique: Kernel Foveated Rendering (KFR), which parameterizes foveated rendering by embedding polynomial kernel functions in log-polar space. This GPU-driven technique uses parameterized foveation that mimics the distribution of photoreceptors in the human retina. I present a two-pass kernel foveated rendering pipeline that maps well onto modern GPUs. I have carried out user studies to empirically identify the KFR parameters and have observed a 2.8x-3.2x speedup in rendering on 4K displays.
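The mapping between screen space and a kernel log-polar buffer can be sketched as follows. This is a hedged illustration, not the dissertation's code: it assumes a power kernel `K(t) = t ** alpha`, and the function names, parameter names, and default `alpha` value are hypothetical.

```python
import math

def _log_max_radius(gaze, screen):
    """Log of the largest distance from the gaze point to any screen corner."""
    gx, gy = gaze
    width, height = screen
    return math.log(max(math.hypot(cx - gx, cy - gy)
                        for cx in (0, width) for cy in (0, height)))

def to_kernel_log_polar(x, y, gaze, screen, buf, alpha=4.0):
    """Map a screen pixel (x, y) to (u, v) in a reduced log-polar buffer."""
    gx, gy = gaze
    w, h = buf
    dx, dy = x - gx, y - gy
    r = max(math.hypot(dx, dy), 1.0)            # clamp so that log(r) >= 0
    t = math.log(r) / _log_max_radius(gaze, screen)  # normalized log radius
    u = t ** (1.0 / alpha) * w                  # inverse of the assumed kernel
    theta = math.atan2(dy, dx) % (2.0 * math.pi)
    v = theta / (2.0 * math.pi) * h
    return u, v

def from_kernel_log_polar(u, v, gaze, screen, buf, alpha=4.0):
    """Inverse mapping: buffer coordinates (u, v) back to a screen position."""
    gx, gy = gaze
    w, h = buf
    t = (u / w) ** alpha                        # assumed kernel K(t) = t ** alpha
    r = math.exp(t * _log_max_radius(gaze, screen))
    theta = v / h * 2.0 * math.pi
    return gx + r * math.cos(theta), gy + r * math.sin(theta)
```

In a two-pass pipeline of the kind described, the first pass would shade into the small buffer and the second pass would use the inverse mapping to fill the full-resolution frame. Changing `alpha` changes how buffer columns are distributed across eccentricities.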
Second, I explore rendering acceleration through foveation for 4D light fields, which capture both the spatial and angular rays, thus enabling free-viewpoint rendering and custom selection of the focal plane. I optimize the KFR algorithm by adjusting the weight of each slice in the light field, so that it automatically selects the optimal foveation parameters for different images according to the gaze position. I have validated our approach on the rendering of light fields by carrying out both quantitative experiments and user studies. Our method achieves speedups of 3.47x-7.28x for different levels of foveation and different rendering resolutions.
Third, I present a simple yet effective technique for further reducing the cost of foveated rendering by leveraging ocular dominance - the tendency of the human visual system to prefer scene perception from one eye over the other. Our new approach, eye-dominance-guided foveated rendering (EFR), renders the scene at a lower foveation level (with higher detail) for the dominant eye than for the non-dominant eye. Compared with traditional foveated rendering, EFR can be expected to provide superior rendering performance while preserving the same level of perceived visual quality.
Finally, I present an approach that uses an end-to-end convolutional neural network, consisting of a concatenated encoder and decoder, to reconstruct a 3D model of a human hand from a single RGB image. Previous research on hand mesh reconstruction suffers from a lack of training data. To train networks with full supervision, we fit a parametric hand model to 3D annotations and train the networks on RGB images, using the fitted parametric model as supervision. Our approach leads to significantly improved quality compared to state-of-the-art hand mesh reconstruction techniques.
Efficient image-based rendering
Recent advancements in real-time ray tracing and deep learning have significantly enhanced the realism of computer-generated images. However, conventional 3D computer graphics (CG) can still be time-consuming and resource-intensive, particularly when creating photo-realistic simulations of complex or animated scenes. Image-based rendering (IBR) has emerged as an alternative approach that uses pre-captured images from the real world to generate realistic images in real time, eliminating the need for extensive modeling. Although IBR has its advantages, it struggles to provide the same level of control over scene attributes as traditional CG pipelines and to accurately reproduce complex scenes and objects with different materials, such as transparent objects. This thesis endeavors to address these issues by harnessing the power of deep learning and incorporating the fundamental principles of graphics and physically based rendering. It offers an efficient solution that enables interactive manipulation of real-world dynamic scenes captured from sparse views, lighting positions, and times, as well as a physically based approach that facilitates accurate reproduction of the view-dependent effects resulting from the interaction between transparent objects and their surrounding environment. Additionally, this thesis develops a visibility metric that can identify artifacts in the reconstructed IBR images without observing the reference image, thereby contributing to the design of an effective IBR acquisition pipeline. Lastly, a perception-driven rendering technique is developed to provide high-fidelity visual content in virtual reality displays while retaining computational efficiency.
Blickpunktabhängige Computergraphik (Gaze-Contingent Computer Graphics)
Contemporary digital displays feature multi-million pixels at ever-increasing refresh rates. Reality, on the other hand, provides us with a view of the world that is continuous in space and time. The discrepancy between viewing the physical world and its sampled depiction on digital displays gives rise to perceptual quality degradations. By measuring or estimating where we look, gaze-contingent algorithms aim at exploiting the way we visually perceive to remedy visible artifacts. This dissertation presents a variety of novel gaze-contingent algorithms and respective perceptual studies. Chapters 4 and 5 present methods to boost the perceived visual quality of conventional video footage when viewed on commodity monitors or projectors. In Chapter 6, a novel head-mounted display with real-time gaze tracking is described. The device enables a large variety of applications in the context of Virtual Reality and Augmented Reality. Using the gaze-tracking VR headset, a novel gaze-contingent render method is described in Chapter 7. The shading quality is analyzed and adapted for each pixel in real time, based on a perceptual model. The gaze-aware approach greatly reduces the computational effort for shading virtual worlds. The described methods and studies show that gaze-contingent algorithms are able to improve the quality of displayed images and videos or reduce the computational effort for image generation, while the display quality perceived by the user does not change.
How can Extended Reality Help Individuals with Depth Misperception?
Despite recent practical uses of Extended Reality (XR) in the treatment of patients, some areas remain less explored. One gap in research is how XR can improve depth perception for patients. Accordingly, the depth perception process in XR settings and in human vision is explored, and trackers, visual sensors, and displays are scrutinized as assistive tools of XR settings to identify their potential to influence users’ depth perception experience. Depth perception enhancement relies not only on depth perception algorithms, but also on visualization algorithms, new display technologies, increased computational power, and advances in knowledge of the neural mechanisms of the visual apparatus. Finally, it is discussed that XR holds assistive features not only for the improvement of vision impairments but also for their diagnosis. However, each patient requires a specific XR setting, because individuals with the same disease show different neural or cognitive reactions.
The development of optical projection tomography instrumentation and its application to in vivo three dimensional imaging of zebrafish
OPT is a three dimensional (3D) imaging technique that can produce 3D reconstructions of
transparent samples, requiring only a widefield imaging system and sample rotation. OPT can
be readily applied to chemically cleared samples, or to live transparent organisms such as nematodes
or zebrafish. For preclinical imaging, there is a trade-off between optical accessibility and
biological relevance to humans. Adult Danio rerio (zebrafish) represent a promising compromise,
with greater homology to humans than smaller animals, and superior optical accessibility
than mice. However, their size and physiology present a number of imaging challenges including
non-negligible absorption and optical scattering, and limited time for image data acquisition if
the fish are to be recovered for longitudinal studies. A key goal of this PhD thesis research was
to develop OPT to address these challenges and improve in vivo imaging capabilities for this
model organism.
This thesis builds on previous work at Imperial where angularly multiplexed OPT using
compressed sensing was developed and applied to in vivo imaging of a cancer-burdened adult
zebrafish, with a sufficiently short OPT data acquisition time to allow recovery of the fish after
anaesthesia. The previous cross-sectional study of this work was extended to a longitudinal
study of cancer progression that I undertook. The volume and quality of data acquired in
the longitudinal study presented a number of data processing challenges, which I addressed
with improved automation of the data processing pipeline and with the demonstration that
convolutional neural networks (CNN) could replace the iterative compressed sensing algorithm
previously used to suppress artifacts when reconstructing undersampled OPT data sets.
To address the issue of high attenuation through the centre of an adult zebrafish, I developed
conformal-high-dynamic-range (C-HDR) OPT and demonstrated that it could provide sufficient
dynamic range for brightfield imaging of such optically thick samples, noting that transmitted
light images can provide anatomical context for fluorescence image data.
To reduce the impact of optical scattering in OPT, I developed a parallelised quasi-confocal
version of OPT called slice-illuminated OPT (slice-OPT) to reject scattered photons and demonstrated
this with live zebrafish. To enable 3D imaging with short wave infrared (SWIR) light,
without the requirement of an expensive Ge or InGaAs camera, I implemented a single pixel
camera and demonstrated single-pixel OPT (SP-OPT) for the first time.
Towards Energy Efficient Mobile Eye Tracking for AR Glasses through Optical Sensor Technology
After the introduction of smartphones and smartwatches, Augmented Reality (AR) glasses
are considered the next breakthrough in the field of wearables. While the transition from
smartphones to smartwatches was based mainly on established display technologies, the display
technology of AR glasses presents a technological challenge. Many display technologies,
such as retina projectors, rely on continuous adaptive control of the display according to
the user’s pupil position. Furthermore, head-mounted systems require an adaptation and
extension of established interaction concepts to provide the user with an immersive experience.
Eye-tracking is a crucial technology to help AR glasses achieve a breakthrough through
optimized display technology and gaze-based interaction concepts. Available eye-tracking
technologies, such as Video Oculography (VOG), do not meet the requirements of AR glasses,
especially regarding power consumption, robustness, and integrability. To further overcome
these limitations and push mobile eye-tracking for AR glasses forward, novel laser-based
eye-tracking sensor technologies are researched in this thesis. The thesis contributes to a significant
scientific advancement towards energy-efficient mobile eye-tracking for AR glasses.
In the first part of the thesis, novel scanned laser eye-tracking sensor technologies for AR
glasses with retina projectors as display technology are researched. The goal is to solve the
disadvantages of VOG systems and to enable efficient eye-tracking that is robust to ambient light
and slippage through optimized sensing methods and algorithms.
The second part of the thesis researches the use of static Laser Feedback Interferometry (LFI)
sensors as a low-power, always-on sensor modality for detection of user interaction by gaze
gestures and context recognition through Human Activity Recognition (HAR) for AR glasses.
The static LFI sensors can measure the distance to the eye and the eye’s surface velocity with
an outstanding sampling rate. Furthermore, they offer high integrability regardless of the
display technology.
In the third part of the thesis, a model-based eye-tracking approach is researched based on
the static LFI sensor technology. The approach leads to eye-tracking with an extremely high
sampling rate by fusing multiple LFI sensors, which enables methods for display resolution
enhancement such as foveated rendering for AR glasses and Virtual Reality (VR) systems.
The scientific contributions of this work lead to a significant advance in the field of mobile
eye-tracking for AR glasses through the introduction of novel sensor technologies that enable
robust eye tracking in uncontrolled environments in particular. Furthermore, the scientific
contributions of this work have been published in internationally renowned journals and
conferences.
Perceptual models for high-refresh-rate rendering
Rendering realistic images requires substantial computational power. With new high-refresh-rate displays as well as the renaissance of virtual reality (VR) and augmented reality (AR), one cannot expect that GPU performance will scale fast enough to meet the requirements of immersive photo-realistic rendering with current rendering techniques.
In this dissertation, I follow the dual of the well-known computer vision approach "vision is inverse graphics": to improve graphical algorithms, I consider the operation of the human visual system. I propose to model and exploit the limitations of the visual system in the context of novel high-refresh-rate displays; specifically, I focus on spatio-temporal perception, a topic that has so far received considerably less attention than spatial-only perception.
I present three main contributions. First, I demonstrate the validity of the perceptual approach by presenting a conceptually simple rendering technique, motivated by our eyes' limited sensitivity to high spatio-temporal change, which reduces the rendering load and transmission requirement of current-generation VR headsets without introducing perceivable visual artefacts. Second, I present two visual models related to motion perception: (a) a metric for detecting flicker; and (b) a comprehensive visual model to predict perceived motion quality on monitors with arbitrary refresh rates and resolutions. Third, I propose an adaptive rendering algorithm that utilises the proposed models. All algorithms operate on physical colorimetric units (instead of display-referenced pixel values), for which I provide the appropriate display measurements and models. All proposed algorithms and visual models are calibrated and validated with psychophysical experiments.
Video Caching, Analytics and Delivery at the Wireless Edge: A Survey and Future Directions
Future wireless networks will provide high-bandwidth, low-latency, and ultra-reliable Internet connectivity to meet the requirements of different applications, ranging from mobile broadband to the Internet of Things. To this end, mobile edge caching, computing, and communication (edge-C3) have emerged to bring network resources (i.e., bandwidth, storage, and computing) closer to end users. Edge-C3 improves network resource utilization as well as the quality of experience (QoE) of end users. Recently, several video-oriented mobile applications (e.g., live content sharing, gaming, and augmented reality) have leveraged edge-C3 in diverse scenarios involving video streaming in both the downlink and the uplink. Hence, a large number of recent works have studied the implications of video analysis and streaming through edge-C3. This article presents an in-depth survey on video edge-C3 challenges and state-of-the-art solutions in next-generation wireless and mobile networks. Specifically, it includes: a tutorial on video streaming in mobile networks (e.g., video encoding and adaptive bitrate streaming); an overview of mobile network architectures, enabling technologies, and applications for video edge-C3; video edge computing and analytics in uplink scenarios (e.g., architectures, analytics, and applications); and video edge caching, computing and communication methods in downlink scenarios (e.g., collaborative, popularity-based, and context-aware). A new taxonomy for video edge-C3 is proposed, and the major contributions of recent studies are first highlighted and then systematically compared. Finally, several open problems and key challenges for future research are outlined.