Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering
Meeting online is becoming the new normal. Creating an immersive experience
for online meetings is essential for moving toward more diverse and seamless
environments. Efficient photorealistic rendering of human 3D dynamics is the
core of immersive meetings. Current popular applications achieve real-time
conferencing but fall short in delivering photorealistic human dynamics, either
due to limited 2D space or the use of avatars that lack realistic interactions
between participants. Recent advances in neural rendering, such as the Neural
Radiance Field (NeRF), offer the potential for greater realism in metaverse
meetings. However, the slow rendering speed of NeRF poses challenges for
real-time conferencing. We envision a pipeline for a future extended reality
metaverse conferencing system that leverages monocular video acquisition and
free-viewpoint synthesis to enhance data and hardware efficiency. Towards an
immersive conferencing experience, we explore an accelerated NeRF-based
free-viewpoint synthesis algorithm for rendering photorealistic human dynamics
more efficiently. We show that our algorithm achieves comparable rendering
quality while performing training and inference 44.5% and 213% faster than
state-of-the-art methods, respectively. Our exploration provides a design basis
for constructing metaverse conferencing systems that can handle complex
application scenarios, including dynamic scene relighting with customized
themes and multi-user conferencing that harmonizes real-world people into an
extended world.
Comment: Accepted to CVPR 2023 ECV Workshop
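The core quantity any NeRF-style free-viewpoint renderer must evaluate is the volume rendering integral along each camera ray. Below is a minimal sketch of that compositing step; the densities and colors stand in for network outputs, and this is the generic NeRF formulation rather than the accelerated variant the abstract describes.

    import numpy as np

    def composite_ray(densities, colors, deltas):
        # densities: [N] volume densities, colors: [N, 3] RGB per sample,
        # deltas: [N] distances between consecutive samples along the ray.
        alphas = 1.0 - np.exp(-densities * deltas)
        # transmittance: probability the ray survives up to each sample
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
        weights = alphas * trans
        return (weights[:, None] * colors).sum(axis=0)  # composited pixel RGB

Acceleration schemes typically attack the cost of producing the densities and colors (e.g., with grid or hash structures) while leaving this compositing rule unchanged.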
Rmagine: 3D Range Sensor Simulation in Polygonal Maps via Raytracing for Embedded Hardware on Mobile Robots
Sensor simulation has emerged as a promising and powerful technique for solving
many real-world robotic tasks such as localization and pose tracking. However,
commonly used simulators have high hardware requirements and are therefore
mostly run on high-end computers. In this paper, we present an approach to
simulate range sensors directly on the embedded hardware of mobile robots that
use triangle meshes as their environment map. This library, called Rmagine,
allows a robot to simulate sensor data for arbitrary range sensors directly on
board via raytracing. Since robots typically have only limited computational
resources, Rmagine aims to be flexible and lightweight while scaling well even
to large environment maps. It runs on several platforms, from laptops to
embedded computing boards such as the Nvidia Jetson, by placing a unified API
over the specific proprietary libraries provided by the hardware manufacturers.
This work is designed to support the future development of robotic applications
that depend on the simulation of range data and that previously could not be
computed in reasonable time on mobile systems.
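At the heart of simulating a range sensor against a triangle-mesh map is a ray-triangle intersection test per beam. A minimal sketch in Python (Rmagine itself targets the vendor raytracing backends behind its unified API) using the standard Moeller-Trumbore algorithm:

    import numpy as np

    def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
        # Moeller-Trumbore: return hit distance t along the beam, or None.
        e1, e2 = v1 - v0, v2 - v0
        p = np.cross(direction, e2)
        det = np.dot(e1, p)
        if abs(det) < eps:            # beam parallel to the triangle plane
            return None
        inv = 1.0 / det
        s = origin - v0
        u = np.dot(s, p) * inv
        if u < 0.0 or u > 1.0:        # outside the triangle (barycentric u)
            return None
        q = np.cross(s, e1)
        v = np.dot(direction, q) * inv
        if v < 0.0 or u + v > 1.0:    # outside the triangle (barycentric v)
            return None
        t = np.dot(e2, q) * inv
        return t if t > eps else None # hit must lie in front of the sensor

A simulated scan takes, per beam, the minimum t over all triangles in the map; delegating exactly this search to hardware-accelerated raytracing libraries is what makes the on-board approach feasible.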
Human Performance Modeling and Rendering via Neural Animated Mesh
We have recently seen tremendous progress in neural advances for photo-real
human modeling and rendering. However, it is still challenging to integrate
them into an existing mesh-based pipeline for downstream applications. In this
paper, we present a comprehensive neural approach for high-quality
reconstruction, compression, and rendering of human performances from dense
multi-view videos. Our core intuition is to bridge the traditional
animated-mesh workflow with a new class of highly efficient neural techniques.
We first introduce a neural surface reconstructor for high-quality surface
generation in minutes. It marries implicit volumetric rendering of a truncated
signed distance field (TSDF) with multi-resolution hash encoding. We further
propose a hybrid neural tracker to generate animated meshes, which combines
explicit non-rigid tracking with implicit dynamic deformation in a
self-supervised framework. The former provides coarse warping back into the
canonical space, while the latter, implicit one further predicts the
displacements using the same 4D hash encoding as our reconstructor. We then
discuss rendering schemes using the obtained animated meshes, ranging from
dynamic texturing to lumigraph rendering under various bandwidth settings. To
strike an intricate balance between quality and bandwidth, we propose a
hierarchical solution that first renders six virtual views covering the
performer and then performs occlusion-aware neural texture blending. We
demonstrate the efficacy of our approach in a variety of mesh-based
applications and photo-realistic free-view experiences on various platforms,
such as inserting virtual human performances into real environments through
mobile AR, or immersively watching talent shows with VR headsets.
Comment: 18 pages, 17 figures
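The multi-resolution hash encoding the reconstructor pairs with TSDF rendering can be sketched as follows. The XOR-of-primes hash and per-level grid growth follow the well-known Instant-NGP recipe, which this abstract's 4D variant extends, so treat the constants as illustrative rather than the paper's exact values.

    import numpy as np

    PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

    def hash_encode(x, tables, base_res=16, growth=1.5):
        # x: 3D point in [0,1)^3; tables: one [T, F] feature array per level.
        feats = []
        for lvl, table in enumerate(tables):
            res = int(base_res * growth ** lvl)
            pos = x * res
            lo = np.floor(pos).astype(np.uint64)
            w = pos - lo                      # trilinear weights in the cell
            acc = np.zeros(table.shape[1])
            for corner in range(8):           # visit the 8 cell corners
                off = np.array([(corner >> i) & 1 for i in range(3)],
                               dtype=np.uint64)
                h = np.bitwise_xor.reduce((lo + off) * PRIMES)
                idx = h % np.uint64(table.shape[0])
                tw = np.prod(np.where(off == 1, w, 1.0 - w))
                acc += tw * table[idx]
            feats.append(acc)
        return np.concatenate(feats)          # input feature for a small MLP

The tables themselves are the trainable parameters, e.g. tables = [np.random.rand(2**14, 2) for _ in range(8)], which is what makes surface generation in minutes plausible compared with a large MLP.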
Neural Fields for Robotic Object Manipulation from a Single Image
We present a unified and compact representation for object rendering, 3D
reconstruction, and grasp pose prediction that can be inferred from a single
image within a few seconds. We achieve this by leveraging recent advances in
the Neural Radiance Field (NeRF) literature that learn category-level priors
and fine-tune on novel objects with minimal data and time. Our insight is that
we can learn a compact shape representation and extract meaningful additional
information from it, such as grasping poses. We believe this to be the first
work to retrieve grasping poses directly from a NeRF-based representation using
a single viewpoint (RGB-only), rather than going through a secondary network
and/or representation. When compared to prior art, our method is two to three
orders of magnitude smaller while achieving comparable performance at view
reconstruction and grasping. Accompanying our method, we also propose a new
dataset of rendered shoes for training a sim-2-real NeRF method with grasping
poses for different gripper widths.
Comment: Submitted to ICRA 202
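The abstract does not spell out the fine-tuning procedure; one common pattern for adapting a category-level prior to a novel object from minimal data is auto-decoder-style optimization of a per-object latent code against a photometric loss. The sketch below is that generic pattern with placeholder names (render_fn, the code dimension), not the paper's actual architecture.

    import torch

    def finetune_latent(render_fn, image, rays, steps=200, lr=1e-2, dim=256):
        # render_fn(z, rays) -> predicted pixels; the network weights are
        # frozen and only the per-object shape code z is optimized.
        z = torch.zeros(dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(render_fn(z, rays), image)
            loss.backward()
            opt.step()
        return z.detach()  # a code a grasp-pose head could decode from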
Aria-NeRF: Multimodal Egocentric View Synthesis
We seek to accelerate research in developing rich, multimodal scene models
trained from egocentric data, based on differentiable volumetric ray-tracing
inspired by Neural Radiance Fields (NeRFs). The construction of a NeRF-like
model from an egocentric image sequence plays a pivotal role in understanding
human behavior and holds diverse applications within the realms of VR/AR. Such
egocentric NeRF-like models may be used as realistic simulations, contributing
significantly to the advancement of intelligent agents capable of executing
tasks in the real world. The future of egocentric view synthesis may lead to
novel environment representations going beyond today's NeRFs by augmenting
visual data with multimodal sensors such as IMUs for egomotion tracking, audio
sensors to capture surface texture and human language context, and eye-gaze
trackers to infer human attention patterns in the scene. To support and
facilitate the development and evaluation of egocentric multimodal scene
modeling, we present a comprehensive multimodal egocentric video dataset. This
dataset offers a rich collection of sensory data, featuring RGB images,
eye-tracking camera footage, audio recordings from a microphone, atmospheric
pressure readings from a barometer, positional coordinates from GPS,
connectivity details from Wi-Fi and Bluetooth, and dual-frequency IMU data
(1 kHz and 800 Hz) paired with a magnetometer. The dataset was collected with
the Meta Aria Glasses wearable device platform. The diverse data modalities and
the real-world context captured within this dataset serve as a robust
foundation for furthering our understanding of human behavior and for enabling
more immersive and intelligent experiences in the realms of VR, AR, and
robotics.
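Working with such a dataset immediately raises a bookkeeping task: aligning high-rate streams (e.g., the 1 kHz IMU) with the lower-rate RGB frames. A small illustrative sketch, with argument names that are ours rather than the dataset's schema:

    import numpy as np

    def resample_to_frames(imu_t, imu_vals, frame_t):
        # Linearly interpolate each IMU channel (imu_vals: [N, C], times
        # imu_t: [N]) at the RGB frame timestamps frame_t: [M].
        return np.stack([np.interp(frame_t, imu_t, imu_vals[:, c])
                         for c in range(imu_vals.shape[1])], axis=1)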
4K4D: Real-Time 4D View Synthesis at 4K Resolution
This paper targets high-fidelity and real-time view synthesis of dynamic 3D
scenes at 4K resolution. Recently, some methods for dynamic view synthesis have
shown impressive rendering quality. However, their speed is still limited when
rendering high-resolution images. To overcome this problem, we propose 4K4D, a
4D point cloud representation that supports hardware rasterization and enables
unprecedented rendering speed. Our representation is built on a 4D feature grid
so that the points are naturally regularized and can be robustly optimized. In
addition, we design a novel hybrid appearance model that significantly boosts
the rendering quality while preserving efficiency. Moreover, we develop a
differentiable depth peeling algorithm to effectively learn the proposed model
from RGB videos. Experiments show that our representation can be rendered at
over 400 FPS on the DNA-Rendering dataset at 1080p resolution and at 80 FPS on
the ENeRF-Outdoor dataset at 4K resolution using an RTX 4090 GPU, which is 30x
faster than previous methods while achieving state-of-the-art rendering
quality. Our project page is available at https://zju3dv.github.io/4k4d/.
Comment: Project Page: https://zju3dv.github.io/4k4d/
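Depth peeling orders the point fragments hitting a pixel front to back and alpha-composites the nearest K layers; because the composite is a smooth function of per-point color and opacity, it can be differentiated for learning from RGB videos. A single-pixel sketch, simplified from the hardware-rasterized version the abstract describes:

    import numpy as np

    def peel_and_composite(depths, colors, alphas, k=4):
        # depths: [N], colors: [N, 3], alphas: [N] for all point fragments
        # covering one pixel; keep the k nearest layers ("peeling").
        order = np.argsort(depths)[:k]
        rgb, trans = np.zeros(3), 1.0
        for i in order:                 # front-to-back over-compositing
            rgb += trans * alphas[i] * colors[i]
            trans *= 1.0 - alphas[i]
        return rgb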
VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs
Neural Radiance Fields (NeRF) have shown great success in novel view synthesis
due to their state-of-the-art quality and flexibility. However, NeRF requires
dense input views (tens to hundreds) and a long training time (hours to days)
for a single scene to generate high-fidelity images. Although using voxel grids
to represent the radiance field can significantly accelerate the optimization
process, we observe that for sparse inputs, voxel grids are more prone to
overfitting to the training views and will have holes and floaters, which leads
to artifacts. In this paper, we propose VGOS, an approach for fast (3-5
minutes) radiance field reconstruction from sparse inputs (3-10 views), to
address these issues. To improve the performance of voxel-based radiance fields
in sparse-input scenarios, we propose two methods: (a) we introduce an
incremental voxel training strategy, which prevents overfitting by suppressing
the optimization of peripheral voxels in the early stage of reconstruction;
(b) we use several regularization techniques to smooth the voxels, which avoids
degenerate solutions. Experiments demonstrate that VGOS achieves
state-of-the-art performance for sparse inputs with super-fast convergence.
Code will be available at https://github.com/SJoJoK/VGOS.
Comment: IJCAI 2023 Accepted (Main Track)
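A sketch of one reading of the two ideas above: (a) an incremental mask that unlocks voxels from the grid center outward as training proceeds, and (b) a total-variation smoothness term on the grid. The schedule and starting radius here are illustrative guesses, not VGOS's published hyperparameters.

    import torch

    def center_out_mask(res, step, total_steps):
        # Boolean [res, res, res] mask: only voxels inside a centered cube
        # are trainable, and the cube grows linearly over training.
        r = 0.25 + 0.75 * min(step / total_steps, 1.0)
        axes = [torch.linspace(-1, 1, res)] * 3
        x, y, z = torch.meshgrid(*axes, indexing="ij")
        return torch.stack([x, y, z]).abs().amax(0) <= r

    def tv_loss(grid):
        # Total-variation smoothness penalty on a [res, res, res] grid,
        # discouraging the holes and floaters sparse inputs tend to produce.
        dx = (grid[1:] - grid[:-1]).pow(2).mean()
        dy = (grid[:, 1:] - grid[:, :-1]).pow(2).mean()
        dz = (grid[:, :, 1:] - grid[:, :, :-1]).pow(2).mean()
        return dx + dy + dz

In use, gradients would be zeroed outside center_out_mask(...) each step, and tv_loss(grid) added to the photometric loss with a small weight.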
A graphics processing unit based method for dynamic real-time global illumination
Real-time realistic image synthesis for virtual environments has been one of the most actively researched
areas in computer graphics for over a decade. Images that display physically correct illumination of an
environment can be simulated by evaluating a multi-dimensional integral equation, called the rendering
equation, over the surfaces of the environment. Many global illumination algorithms, such as path tracing,
photon mapping, and distributed ray tracing, can produce realistic images but are generally unable
to cope with dynamic lighting and objects at interactive rates. It remains one of the most challenging
problems to simulate physically correctly illuminated dynamic environments without a substantial
preprocessing step.
In this thesis we present a rendering system for dynamic environments by implementing a customized
rasterizer for global illumination entirely on the graphics hardware, the Graphics Processing
Unit (GPU). Our research focuses on a parameterization of a discrete visibility field for efficient indirect
illumination computation. In order to generate the visibility field, we propose a CUDA-based (Compute
Unified Device Architecture) rasterizer which builds Layered Hit Buffers (LHB) by rasterizing polygons
into multi-layered structural buffers in parallel. The LHB provides a fast visibility function for any direction
at any point. We propose a cone-approximation solution to resolve an aliasing problem due to
limited directional discretization. We also demonstrate how to remove structured noise by adapting an
interleaved sampling scheme and a discontinuity buffer. We show that a gathering method amortized with
a multi-level Quasi-Monte Carlo method can evaluate the rendering equation in real time.
The method can realize real-time walk-throughs of a complex virtual environment that has a mixture
of diffuse and glossy reflection, computing multiple indirect bounces on the fly. We show that our method
is capable of simulating fully dynamic environments, including changes of view, materials, lighting, and
objects, at interactive rates on commodity-level graphics hardware.
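The gathering step such a system accelerates is a Monte Carlo estimate of the rendering equation's indirect term. A minimal CPU sketch with cosine-weighted hemisphere sampling, where hit_radiance is a stand-in for the visibility/radiance lookup a structure like the Layered Hit Buffer provides:

    import numpy as np

    def gather_indirect(point, normal, hit_radiance, n_samples=64, rng=None):
        # hit_radiance(origin, direction) -> incoming radiance from that
        # direction; normal is assumed to be a unit vector.
        rng = np.random.default_rng() if rng is None else rng
        t = np.cross(normal, [0.0, 1.0, 0.0])   # build a tangent basis
        if np.linalg.norm(t) < 1e-6:
            t = np.cross(normal, [1.0, 0.0, 0.0])
        t /= np.linalg.norm(t)
        b = np.cross(normal, t)
        total = np.zeros(3)
        for _ in range(n_samples):
            u1, u2 = rng.random(), rng.random()
            r, phi = np.sqrt(u1), 2.0 * np.pi * u2   # cosine-weighted disk
            d = (r * np.cos(phi) * t + r * np.sin(phi) * b
                 + np.sqrt(1.0 - u1) * normal)
            total += hit_radiance(point, d)
        # with pdf = cos(theta)/pi the cosine cancels the diffuse 1/pi BRDF,
        # so the sample mean times the surface albedo is the reflected radiance
        return total / n_samples

Interleaved sampling, as the abstract notes, distributes such sample sets across neighboring pixels and then filters across them, trading structured noise for a discontinuity-aware blur.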
3D Visualization using virtual view generation for stereoscopic hardware
To summarize, the final purpose is to create a compact tool that generates virtual
stereo views at a desired position, which can be moved and rotated, from the information
of several cameras spread around the area. These views allow the scene to be visualized
in 3D from the current point of view, using a device that reproduces the stereoscopic
effect.
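Generating the stereo pair from one virtual viewpoint reduces to offsetting the camera center by half the interocular baseline along the camera's right axis. A minimal sketch under the usual world-to-camera convention x_cam = R @ x_world + t; the ~65 mm baseline is a typical default, not a value from this work.

    import numpy as np

    def stereo_poses(R, t, baseline=0.065):
        # R: [3, 3] world-to-camera rotation, t: [3] translation.
        right_axis = R[0]                 # camera x-axis in world frame
        half = 0.5 * baseline * right_axis
        # shifting the camera center by a world offset d changes t by -R @ d
        return (R, t + R @ half), (R, t - R @ half)

Each returned extrinsic pair is then fed to the virtual-view synthesis stage, producing the left- and right-eye images the stereoscopic display expects.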