64 research outputs found
Relightable Neural Human Assets from Multi-view Gradient Illuminations
Human modeling and relighting are two fundamental problems in computer vision
and graphics, where high-quality datasets can largely facilitate related
research. However, most existing human datasets only provide multi-view human
images captured under the same illumination. Although valuable for modeling
tasks, they are not readily used in relighting problems. To promote research in
both fields, in this paper, we present UltraStage, a new 3D human dataset that
contains more than 2,000 high-quality human assets captured under both
multi-view and multi-illumination settings. Specifically, for each example, we
provide 32 surrounding views illuminated with one white light and two gradient
illuminations. In addition to regular multi-view images, gradient illuminations
help recover detailed surface normal and spatially-varying material maps,
enabling various relighting applications. Inspired by recent advances in neural
representation, we further interpret each example into a neural human asset
which allows novel view synthesis under arbitrary lighting conditions. We show
our neural human assets can achieve extremely high capture performance and are
capable of representing fine details such as facial wrinkles and cloth folds.
We also validate UltraStage in single-image relighting tasks, training neural
networks with virtually relit data from neural assets and demonstrating
realistic rendering improvements over prior art. UltraStage will be publicly
available to the community to stimulate significant future developments in
various human modeling and rendering tasks. The dataset is available at
https://miaoing.github.io/RNHA.
Comment: Project page: https://miaoing.github.io/RNH
Recovering refined surface normals for relighting clothing in dynamic scenes
In this paper we present a method to relight captured 3D video sequences of non-rigid, dynamic scenes, such as clothing of real actors, reconstructed from multiple view video. A view-dependent approach is introduced to refine an initial coarse surface reconstruction using shape-from-shading to estimate detailed surface normals. The prior surface approximation is used to constrain the simultaneous estimation of surface normals and scene illumination, under the assumption of Lambertian surface reflectance. This approach enables detailed surface normals of a moving non-rigid object to be estimated from a single image frame. Refined normal estimates from multiple views are integrated into a single surface normal map. This approach allows highly non-rigid surfaces, such as creases in clothing, to be relit whilst preserving the detailed dynamics observed in video.
LumiGAN: Unconditional Generation of Relightable 3D Human Faces
Unsupervised learning of 3D human faces from unstructured 2D image data is an
active research area. While recent works have achieved an impressive level of
photorealism, they commonly lack control of lighting, which prevents the
generated assets from being deployed in novel environments. To this end, we
introduce LumiGAN, an unconditional Generative Adversarial Network (GAN) for 3D
human faces with a physically based lighting module that enables relighting
under novel illumination at inference time. Unlike prior work, LumiGAN can
create realistic shadow effects using an efficient visibility formulation that
is learned in a self-supervised manner. LumiGAN generates plausible physical
properties for relightable faces, including surface normals, diffuse albedo,
and specular tint without any ground truth data. In addition to relightability,
we demonstrate significantly improved geometry generation compared to
state-of-the-art non-relightable 3D GANs and notably better photorealism than
existing relightable GANs.
Comment: Project page: https://boyangdeng.com/projects/lumiga
ILSH: The Imperial Light-Stage Head Dataset for Human Head View Synthesis
This paper introduces the Imperial Light-Stage Head (ILSH) dataset, a novel
light-stage-captured human head dataset designed to support view synthesis
academic challenges for human heads. The ILSH dataset is intended to facilitate
diverse approaches, such as scene-specific or generic neural rendering,
multiple-view geometry, 3D vision, and computer graphics, to further advance
the development of photo-realistic human avatars. This paper details the setup
of a light-stage specifically designed to capture high-resolution (4K) human
head images and describes the process of addressing challenges (preprocessing,
ethical issues) in collecting high-quality data. In addition to the data
collection, we address the split of the dataset into train, validation, and
test sets. Our goal is to design and support a fair view synthesis challenge
task for this novel dataset, such that a similar level of performance can be
maintained and expected when using the test set, as when using the validation
set. The ILSH dataset consists of 52 subjects captured using 24 cameras with
all 82 lighting sources turned on, resulting in a total of 1,248 close-up head
images, border masks, and camera pose pairs.
Comment: ICCV 2023 Workshop, 9 pages, 6 figures
Evaluation and Optimization of Rendering Techniques for Autonomous Driving Simulation
In order to meet the demand for higher scene rendering quality from some
autonomous driving teams (such as those focused on CV), we have decided to use
an offline simulation industrial rendering framework instead of real-time
rendering in our autonomous driving simulator. Our plan is to generate
lower-quality scenes using a game engine, extract them, and then use an IQA
algorithm to validate the improvement in scene quality achieved through offline
rendering. The improved scenes will then be used for training.
Intrinsic Textures for Relightable Free-Viewpoint Video
This paper presents an approach to estimate the intrinsic texture properties (albedo, shading, normal) of scenes from multiple view acquisition under unknown illumination conditions. We introduce the concept of intrinsic textures, which are pixel-resolution surface textures representing the intrinsic appearance parameters of a scene. Unlike previous video relighting methods, the approach does not assume regions of uniform albedo, which makes it applicable to richly textured scenes. We show that intrinsic image methods can be used to refine an initial, low-frequency shading estimate based on a global lighting reconstruction from an original texture and coarse scene geometry in order to resolve the inherent global ambiguity in shading. The method is applied to relighting of free-viewpoint rendering from multiple view video capture. This demonstrates relighting with reproduction of fine surface detail. Quantitative evaluation on synthetic models with textured appearance shows accurate estimation of intrinsic surface reflectance properties. © 2014 Springer International Publishing
Full Body Acting Rehearsal in a Networked Virtual Environment - A Case Study
In order to rehearse for a play or a scene from a movie, it is generally required that the actors are physically present at the same time in the same place. In this paper we present an example and experience of a full body motion shared virtual environment (SVE) for rehearsal. The system allows actors and directors to meet in an SVE in order to rehearse scenes for a play or a movie, that is, to perform some dialogue and blocking (positions, movements, and displacements of actors in the scene) rehearsal through a full body interactive virtual reality (VR) system. The system combines immersive VR rendering techniques as well as network capabilities together with full body tracking. Two actors and a director rehearsed from separate locations. One actor and the director were in London (located in separate rooms) while the second actor was in Barcelona. The Barcelona actor used a wide field-of-view head-tracked head-mounted display, and wore a body suit for real-time motion capture and display. The London actor was in a Cave system, with head and partial body tracking. Each actor was presented to the other as an avatar in the shared virtual environment, and the director could see the whole scenario on a desktop display, and intervene by voice commands. A video stream in a window displayed in the virtual environment also represented the director. The London participant was a professional actor, who afterward commented on the utility of the system for acting rehearsal. It was concluded that full body tracking and corresponding real-time display of all the actors' movements would be a critical requirement, and that blocking was possible down to the level of detail of gestures. Details of the implementation and of the actors' and director's experiences are provided.