Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering
Meeting online is becoming the new normal. Creating an immersive experience
for online meetings is essential for moving toward more diverse and seamless
environments. Efficient photorealistic rendering of human 3D dynamics is the
core of immersive meetings. Current popular applications achieve real-time
conferencing but fall short in delivering photorealistic human dynamics, either
due to limited 2D space or the use of avatars that lack realistic interactions
between participants. Recent advances in neural rendering, such as the Neural
Radiance Field (NeRF), offer the potential for greater realism in metaverse
meetings. However, the slow rendering speed of NeRF poses challenges for
real-time conferencing. We envision a pipeline for a future extended reality
metaverse conferencing system that leverages monocular video acquisition and
free-viewpoint synthesis to enhance data and hardware efficiency. Towards an
immersive conferencing experience, we explore an accelerated NeRF-based
free-viewpoint synthesis algorithm for rendering photorealistic human dynamics
more efficiently. We show that our algorithm achieves comparable rendering
quality while performing training and inference 44.5% and 213% faster than
state-of-the-art methods, respectively. Our exploration provides a design basis
for constructing metaverse conferencing systems that can handle complex
application scenarios, including dynamic scene relighting with customized
themes and multi-user conferencing that harmonizes real-world people into an
extended world.
Comment: Accepted to CVPR 2023 ECV Workshop
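The core quantity any NeRF-style free-viewpoint renderer must evaluate is the volume rendering integral along each camera ray. Below is a minimal sketch of that compositing step; the densities and colors stand in for network outputs, and this is the generic NeRF formulation rather than the accelerated variant the abstract describes.

    import numpy as np

    def composite_ray(densities, colors, deltas):
        # densities: [N] volume densities, colors: [N, 3] RGB per sample,
        # deltas: [N] distances between consecutive samples along the ray.
        alphas = 1.0 - np.exp(-densities * deltas)
        # transmittance: probability the ray survives up to each sample
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
        weights = alphas * trans
        return (weights[:, None] * colors).sum(axis=0)  # composited pixel RGB

Acceleration schemes typically attack the cost of producing the densities and colors (e.g., with grid or hash structures) while leaving this compositing rule unchanged.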
Rmagine: 3D Range Sensor Simulation in Polygonal Maps via Raytracing for Embedded Hardware on Mobile Robots
Sensor simulation has emerged as a promising and powerful technique for solving
many real-world robotic tasks such as localization and pose tracking. However,
commonly used simulators have high hardware requirements and are therefore
mostly run on high-end computers. In this paper, we present an approach to
simulate range sensors directly on the embedded hardware of mobile robots that
use triangle meshes as their environment map. This library, called Rmagine,
allows a robot to simulate sensor data for arbitrary range sensors directly on
board via raytracing. Since robots typically have only limited computational
resources, Rmagine aims to be flexible and lightweight while scaling well even
to large environment maps. It runs on several platforms, from laptops to
embedded computing boards such as the Nvidia Jetson, by placing a unified API
over the specific proprietary libraries provided by the hardware manufacturers.
This work is designed to support the future development of robotic applications
that depend on the simulation of range data and that previously could not be
computed in reasonable time on mobile systems.
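At the heart of simulating a range sensor against a triangle-mesh map is a ray-triangle intersection test per beam. A minimal sketch in Python (Rmagine itself targets the vendor raytracing backends behind its unified API) using the standard Moeller-Trumbore algorithm:

    import numpy as np

    def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
        # Moeller-Trumbore: return hit distance t along the beam, or None.
        e1, e2 = v1 - v0, v2 - v0
        p = np.cross(direction, e2)
        det = np.dot(e1, p)
        if abs(det) < eps:            # beam parallel to the triangle plane
            return None
        inv = 1.0 / det
        s = origin - v0
        u = np.dot(s, p) * inv
        if u < 0.0 or u > 1.0:        # outside the triangle (barycentric u)
            return None
        q = np.cross(s, e1)
        v = np.dot(direction, q) * inv
        if v < 0.0 or u + v > 1.0:    # outside the triangle (barycentric v)
            return None
        t = np.dot(e2, q) * inv
        return t if t > eps else None # hit must lie in front of the sensor

A simulated scan takes, per beam, the minimum t over all triangles in the map; delegating exactly this search to hardware-accelerated raytracing libraries is what makes the on-board approach feasible.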
Human Performance Modeling and Rendering via Neural Animated Mesh
We have recently seen tremendous progress in neural advances for photo-real
human modeling and rendering. However, it is still challenging to integrate
them into an existing mesh-based pipeline for downstream applications. In this
paper, we present a comprehensive neural approach for high-quality
reconstruction, compression, and rendering of human performances from dense
multi-view videos. Our core intuition is to bridge the traditional
animated-mesh workflow with a new class of highly efficient neural techniques.
We first introduce a neural surface reconstructor for high-quality surface
generation in minutes. It marries implicit volumetric rendering of a truncated
signed distance field (TSDF) with multi-resolution hash encoding. We further
propose a hybrid neural tracker to generate animated meshes, which combines
explicit non-rigid tracking with implicit dynamic deformation in a
self-supervised framework. The former provides coarse warping back into the
canonical space, while the latter, implicit one further predicts the
displacements using the same 4D hash encoding as our reconstructor. We then
discuss rendering schemes using the obtained animated meshes, ranging from
dynamic texturing to lumigraph rendering under various bandwidth settings. To
strike an intricate balance between quality and bandwidth, we propose a
hierarchical solution that first renders six virtual views covering the
performer and then performs occlusion-aware neural texture blending. We
demonstrate the efficacy of our approach in a variety of mesh-based
applications and photo-realistic free-view experiences on various platforms,
such as inserting virtual human performances into real environments through
mobile AR, or immersively watching talent shows with VR headsets.
Comment: 18 pages, 17 figures
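The multi-resolution hash encoding the reconstructor pairs with TSDF rendering can be sketched as follows. The XOR-of-primes hash and per-level grid growth follow the well-known Instant-NGP recipe, which this abstract's 4D variant extends, so treat the constants as illustrative rather than the paper's exact values.

    import numpy as np

    PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

    def hash_encode(x, tables, base_res=16, growth=1.5):
        # x: 3D point in [0,1)^3; tables: one [T, F] feature array per level.
        feats = []
        for lvl, table in enumerate(tables):
            res = int(base_res * growth ** lvl)
            pos = x * res
            lo = np.floor(pos).astype(np.uint64)
            w = pos - lo                      # trilinear weights in the cell
            acc = np.zeros(table.shape[1])
            for corner in range(8):           # visit the 8 cell corners
                off = np.array([(corner >> i) & 1 for i in range(3)],
                               dtype=np.uint64)
                h = np.bitwise_xor.reduce((lo + off) * PRIMES)
                idx = h % np.uint64(table.shape[0])
                tw = np.prod(np.where(off == 1, w, 1.0 - w))
                acc += tw * table[idx]
            feats.append(acc)
        return np.concatenate(feats)          # input feature for a small MLP

The tables themselves are the trainable parameters, e.g. tables = [np.random.rand(2**14, 2) for _ in range(8)], which is what makes surface generation in minutes plausible compared with a large MLP.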
Neural Fields for Robotic Object Manipulation from a Single Image
We present a unified and compact representation for object rendering, 3D
reconstruction, and grasp pose prediction that can be inferred from a single
image within a few seconds. We achieve this by leveraging recent advances in
the Neural Radiance Field (NeRF) literature that learn category-level priors
and fine-tune on novel objects with minimal data and time. Our insight is that
we can learn a compact shape representation and extract meaningful additional
information from it, such as grasping poses. We believe this to be the first
work to retrieve grasping poses directly from a NeRF-based representation using
a single viewpoint (RGB-only), rather than going through a secondary network
and/or representation. When compared to prior art, our method is two to three
orders of magnitude smaller while achieving comparable performance at view
reconstruction and grasping. Accompanying our method, we also propose a new
dataset of rendered shoes for training a sim-2-real NeRF method with grasping
poses for different gripper widths.
Comment: Submitted to ICRA 202
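The abstract does not spell out the fine-tuning procedure; one common pattern for adapting a category-level prior to a novel object from minimal data is auto-decoder-style optimization of a per-object latent code against a photometric loss. The sketch below is that generic pattern with placeholder names (render_fn, the code dimension), not the paper's actual architecture.

    import torch

    def finetune_latent(render_fn, image, rays, steps=200, lr=1e-2, dim=256):
        # render_fn(z, rays) -> predicted pixels; the network weights are
        # frozen and only the per-object shape code z is optimized.
        z = torch.zeros(dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(render_fn(z, rays), image)
            loss.backward()
            opt.step()
        return z.detach()  # a code a grasp-pose head could decode from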
Aria-NeRF: Multimodal Egocentric View Synthesis
We seek to accelerate research in developing rich, multimodal scene models
trained from egocentric data, based on differentiable volumetric ray-tracing
inspired by Neural Radiance Fields (NeRFs). The construction of a NeRF-like
model from an egocentric image sequence plays a pivotal role in understanding
human behavior and holds diverse applications within the realms of VR/AR. Such
egocentric NeRF-like models may be used as realistic simulations, contributing
significantly to the advancement of intelligent agents capable of executing
tasks in the real world. The future of egocentric view synthesis may lead to
novel environment representations going beyond today's NeRFs by augmenting
visual data with multimodal sensors such as IMUs for egomotion tracking, audio
sensors to capture surface texture and human language context, and eye-gaze
trackers to infer human attention patterns in the scene. To support and
facilitate the development and evaluation of egocentric multimodal scene
modeling, we present a comprehensive multimodal egocentric video dataset. This
dataset offers a rich collection of sensory data, featuring RGB images,
eye-tracking camera footage, audio recordings from a microphone, atmospheric
pressure readings from a barometer, positional coordinates from GPS,
connectivity details from Wi-Fi and Bluetooth, and dual-frequency IMU data
(1 kHz and 800 Hz) paired with a magnetometer. The dataset was collected with
the Meta Aria Glasses wearable device platform. The diverse data modalities and
the real-world context captured within this dataset serve as a robust
foundation for furthering our understanding of human behavior and for enabling
more immersive and intelligent experiences in the realms of VR, AR, and
robotics.
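Working with such a dataset immediately raises a bookkeeping task: aligning high-rate streams (e.g., the 1 kHz IMU) with the lower-rate RGB frames. A small illustrative sketch, with argument names that are ours rather than the dataset's schema:

    import numpy as np

    def resample_to_frames(imu_t, imu_vals, frame_t):
        # Linearly interpolate each IMU channel (imu_vals: [N, C], times
        # imu_t: [N]) at the RGB frame timestamps frame_t: [M].
        return np.stack([np.interp(frame_t, imu_t, imu_vals[:, c])
                         for c in range(imu_vals.shape[1])], axis=1)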
4K4D: Real-Time 4D View Synthesis at 4K Resolution
This paper targets high-fidelity and real-time view synthesis of dynamic 3D
scenes at 4K resolution. Recently, some methods for dynamic view synthesis have
shown impressive rendering quality. However, their speed is still limited when
rendering high-resolution images. To overcome this problem, we propose 4K4D, a
4D point cloud representation that supports hardware rasterization and enables
unprecedented rendering speed. Our representation is built on a 4D feature grid
so that the points are naturally regularized and can be robustly optimized. In
addition, we design a novel hybrid appearance model that significantly boosts
the rendering quality while preserving efficiency. Moreover, we develop a
differentiable depth peeling algorithm to effectively learn the proposed model
from RGB videos. Experiments show that our representation can be rendered at
over 400 FPS on the DNA-Rendering dataset at 1080p resolution and at 80 FPS on
the ENeRF-Outdoor dataset at 4K resolution using an RTX 4090 GPU, which is 30x
faster than previous methods while achieving state-of-the-art rendering
quality. Our project page is available at https://zju3dv.github.io/4k4d/.
Comment: Project Page: https://zju3dv.github.io/4k4d/
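Depth peeling orders the point fragments hitting a pixel front to back and alpha-composites the nearest K layers; because the composite is a smooth function of per-point color and opacity, it can be differentiated for learning from RGB videos. A single-pixel sketch, simplified from the hardware-rasterized version the abstract describes:

    import numpy as np

    def peel_and_composite(depths, colors, alphas, k=4):
        # depths: [N], colors: [N, 3], alphas: [N] for all point fragments
        # covering one pixel; keep the k nearest layers ("peeling").
        order = np.argsort(depths)[:k]
        rgb, trans = np.zeros(3), 1.0
        for i in order:                 # front-to-back over-compositing
            rgb += trans * alphas[i] * colors[i]
            trans *= 1.0 - alphas[i]
        return rgb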
VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs
Neural Radiance Fields (NeRF) have shown great success in novel view synthesis
due to their state-of-the-art quality and flexibility. However, NeRF requires
dense input views (tens to hundreds) and a long training time (hours to days)
for a single scene to generate high-fidelity images. Although using voxel grids
to represent the radiance field can significantly accelerate the optimization
process, we observe that for sparse inputs, voxel grids are more prone to
overfitting to the training views and will have holes and floaters, which leads
to artifacts. In this paper, we propose VGOS, an approach for fast (3-5
minutes) radiance field reconstruction from sparse inputs (3-10 views), to
address these issues. To improve the performance of voxel-based radiance fields
in sparse-input scenarios, we propose two methods: (a) we introduce an
incremental voxel training strategy, which prevents overfitting by suppressing
the optimization of peripheral voxels in the early stage of reconstruction;
(b) we use several regularization techniques to smooth the voxels, which avoids
degenerate solutions. Experiments demonstrate that VGOS achieves
state-of-the-art performance for sparse inputs with super-fast convergence.
Code will be available at https://github.com/SJoJoK/VGOS.
Comment: IJCAI 2023 Accepted (Main Track)
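A sketch of one reading of the two ideas above: (a) an incremental mask that unlocks voxels from the grid center outward as training proceeds, and (b) a total-variation smoothness term on the grid. The schedule and starting radius here are illustrative guesses, not VGOS's published hyperparameters.

    import torch

    def center_out_mask(res, step, total_steps):
        # Boolean [res, res, res] mask: only voxels inside a centered cube
        # are trainable, and the cube grows linearly over training.
        r = 0.25 + 0.75 * min(step / total_steps, 1.0)
        axes = [torch.linspace(-1, 1, res)] * 3
        x, y, z = torch.meshgrid(*axes, indexing="ij")
        return torch.stack([x, y, z]).abs().amax(0) <= r

    def tv_loss(grid):
        # Total-variation smoothness penalty on a [res, res, res] grid,
        # discouraging the holes and floaters sparse inputs tend to produce.
        dx = (grid[1:] - grid[:-1]).pow(2).mean()
        dy = (grid[:, 1:] - grid[:, :-1]).pow(2).mean()
        dz = (grid[:, :, 1:] - grid[:, :, :-1]).pow(2).mean()
        return dx + dy + dz

In use, gradients would be zeroed outside center_out_mask(...) each step, and tv_loss(grid) added to the photometric loss with a small weight.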
A graphics processing unit based method for dynamic real-time global illumination
Real-time realistic image synthesis for virtual environments has been one of the most actively researched
areas in computer graphics for over a decade. Images that display physically correct illumination of an
environment can be simulated by evaluating a multi-dimensional integral equation, called the rendering
equation, over the surfaces of the environment. Many global illumination algorithms, such as path tracing,
photon mapping, and distributed ray tracing, can produce realistic images but are generally unable
to cope with dynamic lighting and objects at interactive rates. It remains one of the most challenging
problems to simulate physically correctly illuminated dynamic environments without a substantial
preprocessing step.
In this thesis we present a rendering system for dynamic environments by implementing a customized
rasterizer for global illumination entirely on the graphics hardware, the Graphics Processing
Unit (GPU). Our research focuses on a parameterization of a discrete visibility field for efficient indirect
illumination computation. In order to generate the visibility field, we propose a CUDA-based (Compute
Unified Device Architecture) rasterizer which builds Layered Hit Buffers (LHB) by rasterizing polygons
into multi-layered structural buffers in parallel. The LHB provides a fast visibility function for any direction
at any point. We propose a cone-approximation solution to resolve an aliasing problem due to
limited directional discretization. We also demonstrate how to remove structured noise by adapting an
interleaved sampling scheme and a discontinuity buffer. We show that a gathering method amortized with
a multi-level Quasi-Monte Carlo method can evaluate the rendering equation in real time.
The method can realize real-time walk-throughs of a complex virtual environment that has a mixture
of diffuse and glossy reflection, computing multiple indirect bounces on the fly. We show that our method
is capable of simulating fully dynamic environments, including changes of view, materials, lighting, and
objects, at interactive rates on commodity-level graphics hardware.
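The gathering step such a system accelerates is a Monte Carlo estimate of the rendering equation's indirect term. A minimal CPU sketch with cosine-weighted hemisphere sampling, where hit_radiance is a stand-in for the visibility/radiance lookup a structure like the Layered Hit Buffer provides:

    import numpy as np

    def gather_indirect(point, normal, hit_radiance, n_samples=64, rng=None):
        # hit_radiance(origin, direction) -> incoming radiance from that
        # direction; normal is assumed to be a unit vector.
        rng = np.random.default_rng() if rng is None else rng
        t = np.cross(normal, [0.0, 1.0, 0.0])   # build a tangent basis
        if np.linalg.norm(t) < 1e-6:
            t = np.cross(normal, [1.0, 0.0, 0.0])
        t /= np.linalg.norm(t)
        b = np.cross(normal, t)
        total = np.zeros(3)
        for _ in range(n_samples):
            u1, u2 = rng.random(), rng.random()
            r, phi = np.sqrt(u1), 2.0 * np.pi * u2   # cosine-weighted disk
            d = (r * np.cos(phi) * t + r * np.sin(phi) * b
                 + np.sqrt(1.0 - u1) * normal)
            total += hit_radiance(point, d)
        # with pdf = cos(theta)/pi the cosine cancels the diffuse 1/pi BRDF,
        # so the sample mean times the surface albedo is the reflected radiance
        return total / n_samples

Interleaved sampling, as the abstract notes, distributes such sample sets across neighboring pixels and then filters across them, trading structured noise for a discontinuity-aware blur.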
3D Visualization using virtual view generation for stereoscopic hardware
To summarize, the final purpose is to create a compact tool that generates virtual
stereo views at a desired position, which can be moved and rotated, from the information
of several cameras spread around the area. These views allow the scene to be visualized
in 3D from the current point of view, using a device that reproduces the stereoscopic
effect.
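Generating the stereo pair from one virtual viewpoint reduces to offsetting the camera center by half the interocular baseline along the camera's right axis. A minimal sketch under the usual world-to-camera convention x_cam = R @ x_world + t; the ~65 mm baseline is a typical default, not a value from this work.

    import numpy as np

    def stereo_poses(R, t, baseline=0.065):
        # R: [3, 3] world-to-camera rotation, t: [3] translation.
        right_axis = R[0]                 # camera x-axis in world frame
        half = 0.5 * baseline * right_axis
        # shifting the camera center by a world offset d changes t by -R @ d
        return (R, t + R @ half), (R, t - R @ half)

Each returned extrinsic pair is then fed to the virtual-view synthesis stage, producing the left- and right-eye images the stereoscopic display expects.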