Perception-driven approaches to real-time remote immersive visualization
In remote immersive visualization systems, real-time 3D perception through RGB-D cameras, combined with modern Virtual Reality (VR) interfaces, enhances the user’s sense of presence in a remote scene by rendering a 3D reconstruction of that scene. This is particularly valuable when users need to visualize, explore, and perform tasks in environments that are inaccessible, too hazardous, or too distant. However, a remote visualization system requires that the entire pipeline, from 3D data acquisition to VR rendering, satisfy demands for speed, throughput, and high visual realism. Especially when point clouds are used, there is a fundamental quality difference between the acquired data of the physical world and the displayed data, because network latency and throughput limitations negatively impact the sense of presence and provoke cybersickness. This thesis presents state-of-the-art research to address these problems by taking the human visual system as inspiration, from sensor data acquisition to VR rendering. The human visual system does not have uniform vision across the field of view: it has the sharpest visual acuity at the center of the field of view, and acuity falls off towards the periphery. Peripheral vision provides lower resolution to guide eye movements so that central vision visits all the crucial parts of the scene. As a first contribution, the thesis developed remote visualization strategies that exploit the acuity fall-off to facilitate the processing, transmission, buffering, and VR rendering of 3D reconstructed scenes while simultaneously reducing throughput requirements and latency. As a second contribution, the thesis looked into attentional mechanisms to select and draw user engagement to specific information from the dynamic spatio-temporal environment.
It proposed a strategy to analyze the remote scene with respect to its 3D structure, its layout, and the spatial, functional, and semantic relationships between objects in the scene. The strategy primarily focuses on analyzing the scene with models that human visual perception uses. It allocates a larger proportion of computational resources to objects of interest and creates a more realistic visualization. As a supplementary contribution, a new volumetric point-cloud density-based Peak Signal-to-Noise Ratio (PSNR) metric is proposed to evaluate the introduced techniques. An in-depth evaluation of the presented systems, a comparative examination of the proposed point-cloud metric, user studies, and experiments demonstrated that the methods introduced in this thesis are visually superior while significantly reducing latency and throughput.
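The acuity fall-off idea described in this abstract can be illustrated with a minimal sketch of foveated point-cloud downsampling. This is not the thesis's method: the function names, the exponential fall-off model, and the parameter values (`fovea_deg`, `falloff_deg`, `floor`) are all assumptions chosen for illustration. The sketch keeps every point near the gaze direction and keeps progressively fewer points as angular eccentricity grows.

```python
import math
import random


def eccentricity_deg(point, gaze_dir):
    """Angle in degrees between the gaze direction and the ray to `point`.

    The viewer is assumed to sit at the origin (an assumption of this sketch).
    """
    px, py, pz = point
    gx, gy, gz = gaze_dir
    norm_p = math.sqrt(px * px + py * py + pz * pz)
    norm_g = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm_p == 0.0 or norm_g == 0.0:
        return 0.0
    cos_a = (px * gx + py * gy + pz * gz) / (norm_p * norm_g)
    cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding error
    return math.degrees(math.acos(cos_a))


def keep_probability(ecc_deg, fovea_deg=5.0, falloff_deg=20.0, floor=0.05):
    """Hypothetical acuity model: full density in the fovea, exponential decay outside."""
    if ecc_deg <= fovea_deg:
        return 1.0
    return max(floor, math.exp(-(ecc_deg - fovea_deg) / falloff_deg))


def foveated_sample(points, gaze_dir, rng=None):
    """Randomly thin a point cloud so that density follows the acuity model."""
    rng = rng or random.Random(0)
    return [
        p for p in points
        if rng.random() < keep_probability(eccentricity_deg(p, gaze_dir))
    ]
```

Because peripheral points are dropped before transmission, a scheme of this kind reduces the data volume sent to the headset; the actual trade-off between fall-off rate and perceived quality would have to be determined by user studies such as those the abstract mentions.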
Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation
Pre-captured immersive environments using omnidirectional cameras provide a
wide range of virtual reality applications. Previous research has shown that
manipulating the eye height in egocentric virtual environments can
significantly affect distance perception and immersion. However, the influence
of eye height in pre-captured real environments has received less attention due
to the difficulty of altering the perspective after finishing the capture
process. To explore this influence, we first propose a pilot study that
captures real environments with multiple eye heights and asks participants to
judge the egocentric distances and immersion. If a significant influence is
confirmed, an effective image-based approach to adapt pre-captured real-world
environments to the user's eye height would be desirable. Motivated by the
study, we propose a learning-based approach for synthesizing novel views for
omnidirectional images with altered eye heights. This approach employs a
multitask architecture that learns depth and semantic segmentation in two
formats, and generates high-quality depth and semantic segmentation to
facilitate the inpainting stage. With the improved omnidirectional-aware
layered depth image, our approach synthesizes natural and realistic visuals for
eye height adaptation. Quantitative and qualitative evaluation shows favorable
results against state-of-the-art methods, and an extensive user study verifies
improved perception and immersion for pre-captured real-world environments.
Comment: 10 pages, 13 figures, 3 tables, submitted to ISMAR 202
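The geometric core of eye height adaptation can be sketched as a depth-based reprojection of an equirectangular pixel. This is only an illustration under stated assumptions, not the learning-based pipeline of the paper: it assumes a y-up coordinate system, a standard equirectangular mapping, and a known per-pixel depth, and it ignores the disocclusions that the paper's inpainting stage is meant to fill. All function names are hypothetical.

```python
import math


def pixel_to_point(u, v, depth, width, height):
    """Back-project an equirectangular pixel with known depth to a 3D point (y is up)."""
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = depth * math.cos(lat) * math.sin(lon)
    y = depth * math.sin(lat)
    z = depth * math.cos(lat) * math.cos(lon)
    return x, y, z


def point_to_pixel(x, y, z, width, height):
    """Project a 3D point back to equirectangular pixel coordinates and depth."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the camera centre")
    lat = math.asin(max(-1.0, min(1.0, y / r)))
    lon = math.atan2(x, z)
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return u, v, r


def warp_pixel_for_eye_height(u, v, depth, delta_height, width, height):
    """Where a pixel lands after the camera is raised by `delta_height` (same unit as depth)."""
    x, y, z = pixel_to_point(u, v, depth, width, height)
    # Raising the camera lowers every scene point relative to it.
    return point_to_pixel(x, y - delta_height, z, width, height)
```

For example, a point on the horizon 2 m away moves below the horizon line when the camera is raised by 1 m, and its distance to the camera grows to the square root of 5 m.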
Robotic Cameraman for Augmented Reality based Broadcast and Demonstration
In recent years, a number of large enterprises have gradually begun to use various Augmented Reality technologies to markedly improve audiences’ view of their products. Among them, the creation of an immersive virtual interactive scene through projection has received extensive attention; this technique is referred to as projection SAR, short for projection spatial augmented reality. However, because existing projection-SAR systems are immobile and have a limited working range, they are difficult to accept and use in daily life. Therefore, this thesis proposes a technically feasible optimization scheme so that projection SAR can be practically applied to AR broadcasting and demonstrations.
Based on three main techniques required by state-of-the-art projection SAR applications, this thesis has created a novel mobile projection SAR cameraman for AR broadcasting and demonstration. Firstly, by combining the CNN scene parsing model and multiple contour extractors, the proposed contour extraction pipeline can always detect the optimal contour information in non-HD or blurred images. This algorithm reduces the dependency on high-quality visual sensors and solves the problem of low contour extraction accuracy in motion-blurred images. Secondly, a plane-based visual mapping algorithm is introduced to solve the difficulties of visual mapping in these low-texture scenarios. Finally, a complete process of designing the projection SAR cameraman robot is introduced. This part has solved three main problems in mobile projection-SAR applications: (i) a new method for marking contours on the projection model is proposed to replace the model rendering process. By combining contour features and geometric features, users can easily identify objects on a colourless model. (ii) a camera initial pose estimation method is developed based on visual tracking algorithms, which can register the start pose of the robot to the whole scene in Unity3D. (iii) a novel data transmission approach is introduced to establish a link between the external robot and the robot in the Unity3D simulation workspace. This allows the robotic cameraman to simulate its trajectory in the Unity3D simulation workspace and project the correct virtual content.
Our proposed mobile projection SAR system has made outstanding contributions to the academic value and practicality of the existing projection SAR technique. Firstly, it solves the problem of limited working range: when the system is running in a large indoor scene, it can follow the user and project dynamic interactive virtual content automatically instead of increasing the number of visual sensors. Secondly, it creates a more immersive experience for the audience, since it allows the user more body gestures and richer virtual-real interactive plays. Lastly, a mobile system does not require up-front frameworks, is cheaper, and provides the public an innovative choice for indoor broadcasting and exhibitions.
DeepMetricEye: Metric Depth Estimation in Periocular VR Imagery
Despite the enhanced realism and immersion provided by VR headsets, users
frequently encounter adverse effects such as digital eye strain (DES), dry eye,
and potential long-term visual impairment due to excessive eye stimulation from
VR displays and pressure from the mask. Recent VR headsets are increasingly
equipped with eye-oriented monocular cameras to segment ocular feature maps.
Yet, to compute the incident light stimulus and observe periocular condition
alterations, it is imperative to transform these relative measurements into
metric dimensions. To bridge this gap, we propose a lightweight framework
derived from the U-Net 3+ deep learning backbone that we re-optimised, to
estimate measurable periocular depth maps. Compatible with any VR headset
equipped with an eye-oriented monocular camera, our method reconstructs
three-dimensional periocular regions, providing a metric basis for related
light stimulus calculation protocols and medical guidelines. Navigating the
complexities of data collection, we introduce a Dynamic Periocular Data
Generation (DPDG) environment based on UE MetaHuman, which synthesises
thousands of training images from a small quantity of human facial scan data.
Evaluated on a sample of 36 participants, our method exhibited notable efficacy
in the periocular global precision evaluation experiment, and the pupil
diameter measurement
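Why metric depth matters for a measurement such as pupil diameter can be shown with a small worked example. Under a simple pinhole camera model (an assumption of this sketch, not a description of the paper's framework), a length measured in pixels only becomes a length in millimetres once the depth of the surface and the focal length in pixels are known. The function name and the numbers below are hypothetical.

```python
def pixel_length_to_mm(length_px, depth_mm, focal_length_px):
    """Convert an image-plane length to a metric length with the pinhole model.

    length_mm = length_px * depth_mm / focal_length_px
    Assumes the measured segment is roughly parallel to the image plane.
    """
    if focal_length_px <= 0 or depth_mm <= 0:
        raise ValueError("focal length and depth must be positive")
    return length_px * depth_mm / focal_length_px


# Hypothetical numbers: a pupil spanning 80 px, seen at 30 mm depth
# by a camera with a 600 px focal length, measures 4.0 mm.
pupil_mm = pixel_length_to_mm(80, 30, 600)
```

The same pixel span at twice the depth would correspond to twice the metric size, which is why a relative (unscaled) depth map is not sufficient for this kind of measurement.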
Distributed Implementation of eXtended Reality Technologies over 5G Networks
International Mention in the doctoral degree.
The revolution of Extended Reality (XR) has already started and is rapidly
expanding as technology advances. Announcements such as Meta’s Metaverse have
boosted the general interest in XR technologies, producing novel use cases. With
the advent of the fifth generation of cellular networks (5G), XR technologies are
expected to improve significantly by offloading heavy computational processes from
the XR Head Mounted Display (HMD) to an edge server. XR offloading can rapidly
boost XR technologies by considerably reducing the burden on the XR hardware,
while improving the overall user experience by enabling smoother graphics and more
realistic interactions. Overall, the combination of XR and 5G has the potential to
revolutionize the way we interact with technology and experience the world around
us.
However, XR offloading is a complex task that requires state-of-the-art tools
and solutions, as well as an advanced wireless network that can meet the demanding
throughput, latency, and reliability requirements of XR. The definition of these
requirements strongly depends on the use case and particular XR offloading implementations.
Therefore, it is crucial to perform a thorough Key Performance
Indicators (KPIs) analysis to ensure a successful design of any XR offloading solution.
Additionally, distributed XR implementations can be intricate systems with
multiple processes running on different devices or virtual instances. All these agents
must be well-handled and synchronized to achieve XR real-time requirements and
ensure the expected user experience, guaranteeing a low processing overhead. XR
offloading requires a carefully designed architecture which complies with the required
KPIs while efficiently synchronizing and handling multiple heterogeneous devices.
Offloading XR has become an essential use case for 5G and beyond 5G technologies.
However, testing distributed XR implementations requires access to advanced
5G deployments that are often unavailable to most XR application developers. Conversely,
the development of 5G technologies requires constant feedback from potential
applications and use cases. Unfortunately, most 5G providers, engineers, or
researchers lack access to cutting-edge XR hardware or applications, which can hinder
the fast implementation and improvement of 5G’s most advanced features. Both
technology fields require ongoing input and continuous development from each other
to fully realize their potential. As a result, XR and 5G researchers and developers
must have access to the necessary tools and knowledge to ensure the rapid and
satisfactory development of both technology fields.
In this thesis, we address these challenges by providing knowledge, tools, and solutions for the implementation of advanced offloading technologies, opening the
door to more immersive, comfortable and accessible XR technologies. Our contributions
to the field of XR offloading include a detailed study and description of the
necessary network throughput and latency KPIs for XR offloading, an architecture
for low latency XR offloading and our full end to end XR offloading implementation
ready for a commercial XR HMD. Besides, we also present a set of tools which can
facilitate the joint development of 5G networks and XR offloading technologies: our
5G RAN real-time emulator and a multi-scenario XR IP traffic dataset.
Firstly, in this thesis, we thoroughly examine and explain the KPIs that are
required to achieve the expected Quality of Experience (QoE) and enhanced immersiveness
in XR offloading solutions. Our analysis focuses on individual XR
algorithms, rather than potential use cases. Additionally, we provide an initial
description of feasible 5G deployments that could fulfill some of the proposed KPIs
for different offloading scenarios.
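The kind of throughput and latency KPI reasoning described above can be sketched with simple arithmetic. The functions, stage names, and figures below are illustrative assumptions, not values taken from the thesis: the first function estimates the bitrate of a video stream sent between an HMD and an edge server, and the second computes how much of an end-to-end latency budget is left for the network once the processing stages are accounted for.

```python
def stream_bitrate_mbps(width, height, fps, bits_per_pixel=24, compression_ratio=1.0):
    """Bitrate in Mbps of a video stream; compression_ratio=1.0 means uncompressed."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    raw_bits_per_second = width * height * fps * bits_per_pixel
    return raw_bits_per_second / compression_ratio / 1e6


def network_budget_ms(total_budget_ms, stage_latencies_ms):
    """Latency left for uplink plus downlink after all processing stages.

    A negative result means the processing alone already exceeds the budget.
    """
    return total_budget_ms - sum(stage_latencies_ms.values())


# Hypothetical example: 1080p at 60 fps, uncompressed and with 100:1 compression.
raw = stream_bitrate_mbps(1920, 1080, 60)
compressed = stream_bitrate_mbps(1920, 1080, 60, compression_ratio=100.0)

# Hypothetical stage latencies for an offloaded ML algorithm, in milliseconds.
stages = {"capture": 5, "encode": 10, "inference": 30, "decode": 10, "render": 5}
left_for_network = network_budget_ms(100, stages)
```

With these assumed numbers, the uncompressed stream needs about 2986 Mbps, the compressed one about 30 Mbps, and 40 ms of the 100 ms budget remain for the round trip over the network.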
We also present our low latency multi-modal XR offloading architecture, which
has already been tested on a commercial XR device and advanced 5G deployments,
such as millimeter-wave (mmW) technologies. Besides, we describe our full end-to-end
complex XR offloading system which relies on our offloading architecture to
provide low latency communication between a commercial XR device and a server
running a Machine Learning (ML) algorithm. To the best of our knowledge, this is
one of the first successful XR offloading implementations for complex ML algorithms
in a commercial device.
With the goal of providing XR developers and researchers access to complex
5G deployments and accelerating the development of future XR technologies, we
present FikoRE, our 5G RAN real-time emulator. FikoRE has been specifically
designed not only to model the network with sufficient accuracy but also to support
the emulation of a massive number of users and actual IP throughput. As FikoRE
can handle actual IP traffic above 1 Gbps, it can directly be used to test distributed
XR solutions. As we describe in the thesis, its emulation capabilities make FikoRE
a potential candidate to become a reference testbed for distributed XR developers
and researchers.
Finally, we used our XR offloading tools to generate an XR IP traffic dataset
which can accelerate the development of 5G technologies by providing a straightforward
manner for testing novel 5G solutions using realistic XR data. This dataset is
generated for two relevant XR offloading scenarios: split rendering, in which the rendering
step is moved to an edge server, and heavy ML algorithm offloading. Besides,
we derive the corresponding IP traffic models from the captured data, which can be
used to generate realistic XR IP traffic. We also present the validation experiments
performed on the derived models and their results.
This work has received funding from the European Union (EU) Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie ETN TeamUp5G, grant agreement No. 813391.
Doctoral Programme in Multimedia and Communications of the Universidad Carlos III de Madrid and the Universidad Rey Juan Carlos.
Chair: Narciso García Santos. Secretary: Fernando Díaz de María. Member: Aryan Kaushi
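How an IP traffic model can be used to generate synthetic XR traffic, as the abstract above describes, can be illustrated with a minimal generator. The model here is an assumption made for illustration only (periodic frames with normally distributed sizes, split into MTU-sized packets); the thesis derives its own models from captured data, and those are not reproduced here. All names and default values are hypothetical.

```python
import random


def generate_frame_trace(duration_s, fps=60.0, mean_bytes=50_000,
                         jitter_frac=0.2, mtu=1500, seed=0):
    """Return a list of (timestamp_s, frame_bytes, n_packets) tuples.

    Frames arrive every 1/fps seconds; each frame size is drawn from a
    normal distribution (hypothetical model) and clamped to at least 1 byte.
    """
    rng = random.Random(seed)  # seeded so the trace is reproducible
    period = 1.0 / fps
    n_frames = int(duration_s * fps)
    trace = []
    for i in range(n_frames):
        size = max(1, int(rng.gauss(mean_bytes, jitter_frac * mean_bytes)))
        n_packets = -(-size // mtu)  # ceiling division
        trace.append((i * period, size, n_packets))
    return trace


def average_bitrate_mbps(trace, duration_s):
    """Mean bitrate of a trace in Mbps."""
    total_bits = 8 * sum(size for _, size, _ in trace)
    return total_bits / duration_s / 1e6
```

A trace produced this way could be replayed against a network emulator to test a scheduler under load; how closely it resembles real XR traffic depends entirely on how well the chosen distributions match captured data.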
Model-Based Environmental Visual Perception for Humanoid Robots
The visual perception of a robot should answer two fundamental questions: What? and Where? In order to properly and efficiently reply to these questions, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling