
    SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes

    Online reconstruction and rendering of large-scale indoor scenes is a long-standing challenge. SLAM-based methods can reconstruct 3D scene geometry progressively in real time but cannot render photorealistic results. NeRF-based methods produce promising novel view synthesis results, but their long offline optimization time and lack of geometric constraints make it difficult to handle online input efficiently. Inspired by the complementary advantages of classical 3D reconstruction and NeRF, we investigate marrying an explicit geometric representation with NeRF rendering to achieve efficient online reconstruction and high-quality rendering. We introduce SurfelNeRF, a variant of neural radiance fields that employs a flexible and scalable neural surfel representation to store geometric attributes and appearance features extracted from input images. We further extend the conventional surfel-based fusion scheme to progressively integrate incoming input frames into the reconstructed global neural scene representation. In addition, we propose a highly efficient differentiable rasterization scheme for rendering neural surfel radiance fields, which helps SurfelNeRF achieve 10× speedups in both training and inference time. Experimental results show that our method achieves state-of-the-art PSNRs of 23.82 and 29.58 on ScanNet in the feedforward inference and per-scene optimization settings, respectively. Comment: To appear in CVPR 2023
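
    The surfel-based fusion described above can be pictured with a small sketch: each neural surfel stores geometry plus an appearance feature, and surfels from a new frame are merged into nearby, similarly oriented surfels of the global representation by confidence-weighted averaging. This is only a minimal illustration; the Surfel fields, thresholds, and brute-force matching below are assumptions, not the paper's actual data structure or fusion rule.

```python
# Hypothetical sketch of surfel-based fusion for a neural surfel representation.
# Field names and thresholds are illustrative assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class Surfel:
    position: np.ndarray   # (3,) world-space center
    normal: np.ndarray     # (3,) unit normal
    radius: float          # disk radius
    feature: np.ndarray    # (C,) appearance feature extracted from input images
    weight: float = 1.0    # accumulated fusion confidence

def fuse(global_surfels, incoming, dist_thresh=0.02, normal_thresh=0.8):
    """Merge surfels from a new frame into the global set (illustrative only)."""
    for s_new in incoming:
        merged = False
        for s in global_surfels:
            close = np.linalg.norm(s.position - s_new.position) < dist_thresh
            aligned = float(s.normal @ s_new.normal) > normal_thresh
            if close and aligned:
                w = s.weight + s_new.weight
                s.position = (s.weight * s.position + s_new.weight * s_new.position) / w
                s.feature = (s.weight * s.feature + s_new.weight * s_new.feature) / w
                s.normal = s.normal + s_new.normal
                s.normal /= np.linalg.norm(s.normal)
                s.weight = w
                merged = True
                break
        if not merged:
            global_surfels.append(s_new)  # unmatched surfels extend the scene
    return global_surfels
```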

    ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models

    Given sparse views of an object, estimating their camera poses is a long-standing and intractable problem. We harness a pre-trained diffusion model of novel views conditioned on viewpoints (Zero-1-to-3). We present ID-Pose, which inverts the denoising diffusion process to estimate the relative pose given two input images. ID-Pose adds noise to one image and predicts that noise conditioned on the other image and a decision variable for the pose. The prediction error is used as the objective to find the optimal pose with gradient descent. ID-Pose can handle more than two images, estimating each pose from multiple image pairs via triangular relationships. ID-Pose requires no training and generalizes to real-world images. We conduct experiments using high-quality real-scanned 3D objects, where ID-Pose significantly outperforms state-of-the-art methods. Comment: 7 pages. GitHub: https://xt4d.github.io/id-pose
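
    The inversion idea in this abstract lends itself to a short sketch: noise the target image, ask the viewpoint-conditioned diffusion model to predict that noise given the reference image and a candidate relative pose, and run gradient descent on the prediction error with respect to the pose. The denoiser(noisy, ref, pose, t) interface, the pose parameterization, and the noise schedule below are illustrative assumptions, not the actual Zero-1-to-3 API.

```python
# Illustrative sketch of pose estimation by inverting a viewpoint-conditioned
# diffusion model; the denoiser interface and schedule are assumptions.
import torch

def make_alphas_cumprod(T=1000, beta_start=1e-4, beta_end=2e-2):
    betas = torch.linspace(beta_start, beta_end, T)
    return torch.cumprod(1.0 - betas, dim=0)

def estimate_relative_pose(denoiser, ref_img, tgt_img, steps=200, lr=1e-2):
    """Descend on a candidate pose so the conditioned noise prediction matches
    the noise actually added to the target image (illustrative only)."""
    alphas_cumprod = make_alphas_cumprod()
    pose = torch.zeros(3, requires_grad=True)      # e.g. (azimuth, elevation, radius)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        t = torch.randint(100, 900, (1,))
        noise = torch.randn_like(tgt_img)
        a = alphas_cumprod[t].sqrt()
        s = (1.0 - alphas_cumprod[t]).sqrt()
        noisy = a * tgt_img + s * noise            # forward diffusion q(x_t | x_0)
        pred = denoiser(noisy, ref_img, pose, t)   # viewpoint-conditioned prediction
        loss = torch.nn.functional.mse_loss(pred, noise)
        opt.zero_grad()
        loss.backward()                            # gradient reaches pose via conditioning
        opt.step()
    return pose.detach()
```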

    DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

    We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. While encouraging results have been reported by recent methods on text-guided generation of common 3D objects, generating high-quality human avatars remains an open challenge due to the complexity of the human body's shape, pose, and appearance. We propose DreamAvatar to tackle this challenge, utilizing a trainable NeRF to predict density and color for 3D points and pretrained text-to-image diffusion models to provide 2D self-supervision. Specifically, we leverage the SMPL model to provide shape and pose guidance for the generation. We introduce a dual-observation-space design that involves the joint optimization of a canonical space and a posed space related by a learnable deformation field. This facilitates the generation of more complete textures and geometry faithful to the target pose. We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem and improve facial details in the generated avatars. Extensive evaluations demonstrate that DreamAvatar significantly outperforms existing methods, establishing a new state of the art for text-and-shape guided 3D human avatar generation. Comment: Project page: https://yukangcao.github.io/DreamAvatar
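
    A minimal sketch of the dual-observation-space design might look as follows: points sampled in the posed space are warped into a canonical space by a learnable deformation field before the NeRF predicts density and color, and both modules are optimized jointly. The module names, network sizes, and the omission of SMPL conditioning are simplifications for illustration, not the paper's architecture.

```python
# Toy dual-observation-space setup: a learnable deformation field maps posed-space
# samples into a canonical NeRF. Shapes and layer sizes are illustrative only.
import torch
import torch.nn as nn

class DeformField(nn.Module):
    """Predicts an offset that warps posed-space points into canonical space."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3))
    def forward(self, x_posed):
        return x_posed + self.net(x_posed)   # canonical = posed + learned offset

class CanonicalNeRF(nn.Module):
    """Predicts density and RGB for canonical-space points."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))
    def forward(self, x_canonical):
        out = self.net(x_canonical)
        sigma = torch.relu(out[..., :1])     # density
        rgb = torch.sigmoid(out[..., 1:])    # color
        return sigma, rgb

deform, nerf = DeformField(), CanonicalNeRF()
x_posed = torch.rand(1024, 3)                # samples along rays in the posed space
sigma, rgb = nerf(deform(x_posed))           # both spaces are optimized jointly
```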

    OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution

    Omnidirectional images (ODIs) have become increasingly popular, as their large field of view (FoV) lets viewers freely choose the view direction in immersive environments such as virtual reality. The Möbius transformation is typically employed to provide movement and zoom on ODIs, but applying it at the image level often results in blurry effects and aliasing problems. In this paper, we propose a novel deep learning-based approach, called OmniZoomer, that incorporates the Möbius transformation into the network for movement and zoom on ODIs. By learning various transformed feature maps under different conditions, the network is enhanced to handle the increasing edge curvatures, which alleviates the blurring. Moreover, to address the aliasing problem, we propose two key components. First, to compensate for the lack of pixels for describing curves, we enhance the feature maps in the high-resolution (HR) space and calculate the transformed index map with a spatial index generation module. Second, considering that ODIs are inherently represented in spherical space, we propose a spherical resampling module that combines the index map and HR feature maps to transform the feature maps for better spherical correlation. The transformed feature maps are decoded to output a zoomed ODI. Experiments show that our method can produce HR and high-quality ODIs with the flexibility to move and zoom in on the object of interest. Project page: http://vlislab22.github.io/OmniZoomer/. Comment: Accepted by ICCV 2023
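
    For intuition, the Möbius transformation on an ODI can be sketched as follows: lift each equirectangular pixel direction to the complex plane by stereographic projection, apply z' = (az + b)/(cz + d), and map back to obtain the index map that tells each output pixel where to sample. The coordinate convention and parameters below are assumptions, and unlike OmniZoomer this toy version operates on pixel indices rather than on learned feature maps.

```python
# Toy Möbius index map for an equirectangular image (illustrative assumptions).
import numpy as np

def mobius_index_map(h, w, a=1 + 0j, b=0.2 + 0j, c=0 + 0j, d=1 + 0j):
    # pixel grid -> spherical angles (equirectangular convention)
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi        # [-pi, pi)
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi        # (+pi/2 .. -pi/2)
    lon, lat = np.meshgrid(lon, lat)
    # stereographic projection from the sphere onto the complex plane
    z = np.tan((np.pi / 2 - lat) / 2) * np.exp(1j * lon)
    z2 = (a * z + b) / (c * z + d)                            # Möbius transform
    # back to spherical angles, then to source pixel coordinates
    lat2 = np.pi / 2 - 2 * np.arctan(np.abs(z2))
    lon2 = np.angle(z2)
    u = (lon2 + np.pi) / (2 * np.pi) * w - 0.5
    v = (np.pi / 2 - lat2) / np.pi * h - 0.5
    return u, v   # where each output pixel should sample from the input ODI
```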

    The Study of Dust Formation of Six Tidal Disruption Events

    This paper investigates eleven (UV-)optical-infrared (IR) spectral energy distributions (SEDs) of six tidal disruption events (TDEs): ASASSN-14li, ASASSN-15lh, ASASSN-18ul, ASASSN-18zj, PS18kh, and ZTF18acaqdaa. We find that all the SEDs show evident IR excesses. We invoke a blackbody plus dust emission model to fit the SEDs and find that it can account for them. The derived masses of the dust surrounding ASASSN-14li, ASASSN-15lh, ASASSN-18ul, ASASSN-18zj, PS18kh, and ZTF18acaqdaa are respectively $\sim 0.7-1.0\,(1.5-2.2)\times10^{-4}\,M_\odot$, $\sim 0.6-3.1\,(1.4-6.3)\times10^{-2}\,M_\odot$, $\sim 1.0\,(2.8)\times10^{-4}\,M_\odot$, $\sim 0.1-1.6\,(0.3-3.3)\times10^{-3}\,M_\odot$, $\sim 1.0\,(2.0)\times10^{-3}\,M_\odot$, and $\sim 1.1\,(2.9)\times10^{-3}\,M_\odot$, if the dust is graphite (silicate). The temperatures of the graphite (silicate) dust of the six TDEs are respectively $\sim 1140-1430\,(1210-1520)$ K, $\sim 1030-1380\,(1100-1460)$ K, $\sim 1530\,(1540)$ K, $\sim 960-1380\,(1020-1420)$ K, $\sim 900\,(950)$ K, and $\sim 1600\,(1610)$ K. By comparing the derived temperatures to the vaporization temperatures of graphite ($\sim 1900$ K) and silicate ($\sim 1100-1500$ K), we suggest that the IR excess of PS18kh can be explained by both graphite and silicate dust, while the remaining five TDEs favor graphite dust, although the silicate dust model cannot be excluded. Moreover, we show that the lower limits on the radii of the dust shells surrounding the six TDEs are significantly larger than the photospheric radii at the first SED epochs, indicating that the dust might have existed before the TDEs occurred. Comment: 13 pages, 4 figures, 4 tables, submitted to Ap
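
    For reference, a blackbody plus dust emission model of the kind used here can be sketched as a hot photospheric blackbody plus optically thin thermal emission from dust of mass M_d, temperature T_d, and opacity kappa_nu. The power-law opacity and the default parameter values below are illustrative assumptions rather than the paper's exact dust model.

```python
# Minimal SED sketch: photospheric blackbody + optically thin dust emission (cgs).
# Opacity law and defaults are assumptions for illustration.
import numpy as np

H, C, K_B = 6.626e-27, 2.998e10, 1.381e-16   # Planck constant, c, Boltzmann constant

def planck_nu(nu, T):
    """Planck function B_nu(T) in erg s^-1 cm^-2 Hz^-1 sr^-1."""
    return 2 * H * nu**3 / C**2 / np.expm1(H * nu / (K_B * T))

def sed_model(nu, T_bb, R_bb, M_dust, T_dust, D, kappa0=10.0, nu0=3e14, beta=1.5):
    photosphere = np.pi * planck_nu(nu, T_bb) * (R_bb / D) ** 2
    kappa = kappa0 * (nu / nu0) ** beta            # assumed power-law dust opacity
    dust = M_dust * kappa * planck_nu(nu, T_dust) / D**2   # optically thin emission
    return photosphere + dust                      # observed flux density F_nu
```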

    Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models

    Recent CLIP-guided 3D optimization methods, such as DreamFields and PureCLIPNeRF, have achieved impressive results in zero-shot text-to-3D synthesis. However, because they train from scratch with random initialization and no prior knowledge, these methods often fail to generate accurate and faithful 3D structures that conform to the input text. In this paper, we make the first attempt to introduce explicit 3D shape priors into the CLIP-guided 3D optimization process. Specifically, we first generate a high-quality 3D shape from the input text in a text-to-shape stage as a 3D shape prior. We then use it to initialize a neural radiance field and optimize it with the full prompt. To address the challenging text-to-shape generation task, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between the images synthesized by the text-to-image diffusion model and the shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy compared to state-of-the-art methods. Comment: Accepted by CVPR 2023. Project page: https://bluestyle97.github.io/dream3d

    PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas

    Achieving an immersive experience that lets users explore virtual environments with six degrees of freedom (6DoF) is essential for applications such as virtual reality (VR). Wide-baseline panoramas are commonly used in these applications to reduce network bandwidth and storage requirements. However, synthesizing novel views from these panoramas remains a key challenge. Although existing neural radiance field methods can produce photorealistic views under narrow-baseline and dense image captures, they tend to overfit the training views when dealing with wide-baseline panoramas due to the difficulty of learning accurate geometry from sparse 360° views. To address this problem, we propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas, which constructs spherical radiance fields incorporating 360° scene priors. Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion and directly aggregates geometry and appearance features of 3D sample points from each panoramic view based on spherical projection. Moreover, since under wide-baseline settings some regions of the panorama are visible from only one view and invisible from the others, PanoGRF incorporates 360° monocular depth priors into spherical depth estimation to improve the geometry features. Experimental results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods for wide-baseline panoramas (e.g., OmniSyn) and perspective images (e.g., IBRNet, NeuRay).
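
    The spherical-projection feature aggregation described above can be sketched briefly: each 3D sample point is turned into a direction from the panorama center, converted to longitude/latitude, and used to bilinearly sample the view's equirectangular feature map. The coordinate convention and tensor shapes are assumptions for illustration, not PanoGRF's actual implementation.

```python
# Illustrative spherical projection of 3D sample points onto an equirectangular
# feature map; coordinate conventions are assumptions.
import math
import torch
import torch.nn.functional as F

def sample_pano_features(feat, points, cam_pos):
    """feat: (1, C, H, W) equirectangular feature map; points: (N, 3) world points."""
    d = points - cam_pos                      # rays from the panorama center
    d = d / d.norm(dim=-1, keepdim=True)
    lon = torch.atan2(d[:, 0], d[:, 2])       # [-pi, pi]
    lat = torch.asin(d[:, 1].clamp(-1, 1))    # [-pi/2, pi/2]
    u = lon / math.pi                         # normalized grid coords in [-1, 1]
    v = 2 * lat / math.pi
    grid = torch.stack([u, v], dim=-1).view(1, -1, 1, 2)
    out = F.grid_sample(feat, grid, align_corners=False)   # (1, C, N, 1)
    return out.squeeze(-1).squeeze(0).T       # (N, C) per-point features
```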

    MonoNeuralFusion: Online Monocular Neural 3D Reconstruction with Geometric Priors

    High-fidelity 3D scene reconstruction from monocular videos continues to be challenging, especially for complete and fine-grained geometry. Previous 3D reconstruction approaches with neural implicit representations have shown promise for complete scene reconstruction, but their results are often over-smooth and lack geometric detail. This paper introduces a novel neural implicit scene representation with volume rendering for high-fidelity online 3D scene reconstruction from monocular videos. For fine-grained reconstruction, our key insight is to incorporate geometric priors into both the neural implicit scene representation and neural volume rendering, leading to an effective geometry learning mechanism based on volume rendering optimization. Building on this, we present MonoNeuralFusion, which performs online neural 3D reconstruction from monocular videos, efficiently generating and refining 3D scene geometry during on-the-fly monocular scanning. Extensive comparisons with state-of-the-art approaches show that MonoNeuralFusion consistently produces more complete and fine-grained reconstruction results, both quantitatively and qualitatively. Comment: 12 pages, 12 figures
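
    One common way geometric priors enter volume rendering optimization, sketched below, is to render depth and normals from the implicit field with the usual alpha-compositing weights and supervise them with monocular predictions alongside the color loss. The loss weights and the source of the priors are illustrative assumptions, not MonoNeuralFusion's exact formulation.

```python
# Sketch of geometric-prior supervision on volume-rendered depth and normals.
# Weightings and prior sources are assumptions.
import torch

def volume_render(sigma, values, t):
    """Composite per-sample values along rays. sigma, t: (R, S); values: (R, S, C)."""
    delta = torch.diff(t, dim=-1, append=t[:, -1:] + 1e10)
    alpha = 1.0 - torch.exp(-sigma * delta)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    w = alpha * trans
    return (w.unsqueeze(-1) * values).sum(dim=1), w

def geometry_prior_loss(sigma, rgb, normals, t, gt_rgb, prior_depth, prior_normal,
                        w_d=0.1, w_n=0.05):
    color, w = volume_render(sigma, rgb, t)
    depth = (w * t).sum(dim=1, keepdim=True)
    normal, _ = volume_render(sigma, normals, t)
    loss_c = (color - gt_rgb).abs().mean()
    loss_d = (depth - prior_depth).abs().mean()              # monocular depth prior
    loss_n = (1 - torch.cosine_similarity(normal, prior_normal, dim=-1)).mean()
    return loss_c + w_d * loss_d + w_n * loss_n
```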

    Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views

    Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. While recent techniques employ image diffusion models to generate plausible images at novel viewpoints or to distill pre-trained diffusion priors into 3D representations using score distillation sampling (SDS), these methods often struggle to simultaneously achieve high-quality, consistent, and detailed results for both novel-view synthesis (NVS) and geometry. In this work, we present Sparse3D, a novel 3D reconstruction method tailored for sparse-view inputs. Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field. Specifically, we employ a controller that harnesses epipolar features from the input views, guiding a pre-trained diffusion model, such as Stable Diffusion, to produce novel-view images that maintain 3D consistency with the input. By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results, even when faced with open-world objects. To address the blurriness introduced by conventional SDS, we introduce category-score distillation sampling (C-SDS) to enhance detail. We conduct experiments on CO3DV2, a multi-view dataset of real-world objects. Both quantitative and qualitative evaluations demonstrate that our approach outperforms previous state-of-the-art works on both NVS and geometry reconstruction metrics.
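
    As background for the C-SDS discussion, vanilla score distillation sampling can be sketched as follows: noise a rendering of the current 3D representation, query a frozen diffusion prior for the predicted noise, and use the weighted residual as the gradient pushed back into the radiance field. The epsilon-prediction interface and timestep weighting are generic assumptions; the paper's C-SDS variant is not shown.

```python
# Sketch of standard SDS (not the paper's C-SDS); the denoiser interface is assumed.
import torch

def sds_gradient(denoiser, rendering, condition, alphas_cumprod):
    """Return the SDS gradient w.r.t. a rendering of the 3D model (illustrative only)."""
    t = torch.randint(50, 950, (1,))
    a = alphas_cumprod[t].sqrt()
    s = (1.0 - alphas_cumprod[t]).sqrt()
    noise = torch.randn_like(rendering)
    noisy = a * rendering + s * noise
    with torch.no_grad():
        eps_pred = denoiser(noisy, condition, t)   # frozen diffusion prior
    w = 1.0 - alphas_cumprod[t]                    # common timestep weighting
    return w * (eps_pred - noise)                  # fed back via rendering.backward(gradient=...)
```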