Learning to reconstruct and understand indoor scenes from sparse views
This paper proposes a new method for simultaneous 3D reconstruction and semantic segmentation for indoor scenes. Unlike existing methods that require recording a video using a color camera and/or a depth camera, our method only needs a small number of (e.g., 3~5) color images from uncalibrated sparse views, which significantly simplifies data acquisition and broadens applicable scenarios. To achieve promising 3D reconstruction from sparse views with limited overlap, our method first recovers the depth map and semantic information for each view, and then fuses the depth maps into a 3D scene. To this end, we design an iterative deep architecture, named IterNet, to estimate the depth map and semantic segmentation alternately. To obtain accurate alignment between views with limited overlap, we further propose a joint global and local registration method to reconstruct a 3D scene with semantic information. We also make available a new indoor synthetic dataset, containing photorealistic high-resolution RGB images, accurate depth maps and pixel-level semantic labels for thousands of complex layouts. Experimental results on public datasets and our dataset demonstrate that our method achieves more accurate depth estimation, smaller semantic segmentation errors, and better 3D reconstruction results than state-of-the-art methods.
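As a rough, self-contained illustration of the alternating estimation idea described in this abstract, the sketch below refines depth and semantics in turns for a single view. The module names (DepthBranch, SegBranch), channel sizes, and iteration count are assumptions made for the example; they are not the paper's actual IterNet architecture.

# Minimal sketch of an alternating depth / semantic-segmentation refinement
# loop. All module names, channel sizes, and the fixed number of iterations
# are illustrative assumptions only.
import torch
import torch.nn as nn

class DepthBranch(nn.Module):
    """Predict a 1-channel depth map from RGB plus the current semantic probabilities."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_classes, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb, sem_probs):
        return self.net(torch.cat([rgb, sem_probs], dim=1))

class SegBranch(nn.Module):
    """Predict per-pixel class logits from RGB plus the current depth estimate."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 3, padding=1),
        )

    def forward(self, rgb, depth):
        return self.net(torch.cat([rgb, depth], dim=1))

def alternate_estimation(rgb, num_classes=13, num_iters=3):
    """Alternately refine depth and semantics for one view (untrained demo)."""
    b, _, h, w = rgb.shape
    depth_net, seg_net = DepthBranch(num_classes), SegBranch(num_classes)
    depth = torch.zeros(b, 1, h, w)
    sem_probs = torch.full((b, num_classes, h, w), 1.0 / num_classes)
    for _ in range(num_iters):
        depth = depth_net(rgb, sem_probs)               # depth conditioned on semantics
        sem_probs = seg_net(rgb, depth).softmax(dim=1)  # semantics conditioned on depth
    return depth, sem_probs

# Example: two 240x320 views.
depth, sem = alternate_estimation(torch.rand(2, 3, 240, 320))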
OctNetFusion: Learning Depth Fusion from Data
In this paper, we present a learning based approach to depth fusion, i.e.,
dense 3D reconstruction from multiple depth images. The most common approach to
depth fusion is based on averaging truncated signed distance functions, which
was originally proposed by Curless and Levoy in 1996. While this method is
simple and provides great results, it is not able to reconstruct (partially)
occluded surfaces and requires a large number of frames to filter out sensor noise
and outliers. Motivated by the availability of large 3D model repositories and
recent advances in deep learning, we present a novel 3D CNN architecture that
learns to predict an implicit surface representation from the input depth maps.
Our learning based method significantly outperforms the traditional volumetric
fusion approach in terms of noise reduction and outlier suppression. By
learning the structure of real world 3D objects and scenes, our approach is
further able to reconstruct occluded regions and to fill in gaps in the
reconstruction. We demonstrate that our learning based approach outperforms
both vanilla TSDF fusion and TV-L1 fusion on the task of volumetric fusion.
Further, we demonstrate state-of-the-art 3D shape completion results.
Comment: 3DV 2017, https://github.com/griegler/octnetfusio
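For readers unfamiliar with the classical baseline this work improves upon, the following is a minimal sketch of vanilla TSDF fusion in the spirit of Curless and Levoy: each depth map is integrated into a voxel grid by averaging truncated signed distances. The intrinsics, grid resolution, and truncation distance are illustrative assumptions, not values from the paper.

# Minimal sketch of vanilla TSDF fusion (running average of truncated signed
# distances per voxel). Camera intrinsics, grid size, and truncation are made up.
import numpy as np

def integrate_depth(tsdf, weights, voxel_xyz, depth, K, cam_pose, trunc=0.05):
    """Integrate one depth map (H x W, metres) into a flat TSDF volume.

    tsdf, weights : flat arrays, one entry per voxel
    voxel_xyz     : (N, 3) voxel centres in world coordinates
    K             : (3, 3) pinhole intrinsics
    cam_pose      : (4, 4) camera-to-world transform
    """
    # Transform voxel centres into the camera frame.
    world_to_cam = np.linalg.inv(cam_pose)
    pts_cam = (world_to_cam[:3, :3] @ voxel_xyz.T + world_to_cam[:3, 3:4]).T
    z = pts_cam[:, 2]
    z_safe = np.where(z > 1e-6, z, 1.0)

    # Project into the image and keep voxels that land on a valid pixel.
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / z_safe).astype(int)
    v = np.round(uv[:, 1] / z_safe).astype(int)
    h, w = depth.shape
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0

    # Truncated signed distance along the viewing ray, then running average.
    sdf = np.clip(d - z, -trunc, trunc) / trunc
    upd = valid & (d - z >= -trunc)
    tsdf[upd] = (tsdf[upd] * weights[upd] + sdf[upd]) / (weights[upd] + 1.0)
    weights[upd] += 1.0
    return tsdf, weights

# Example: a 64^3 grid spanning a 2 m cube, one synthetic frontal depth map.
n = 64
grid = np.stack(np.meshgrid(*[np.linspace(-1, 1, n)] * 3, indexing="ij"), -1).reshape(-1, 3)
tsdf, w = np.ones(n ** 3), np.zeros(n ** 3)
K = np.array([[100.0, 0.0, 80.0], [0.0, 100.0, 60.0], [0.0, 0.0, 1.0]])
depth_map = np.full((120, 160), 1.5)
tsdf, w = integrate_depth(tsdf, w, grid, depth_map, K, np.eye(4))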
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
Associative3D: Volumetric Reconstruction from Sparse Views
This paper studies the problem of 3D volumetric reconstruction from two views
of a scene with an unknown camera. While seemingly easy for humans, this
problem poses many challenges for computers since it requires simultaneously
reconstructing objects in the two views while also figuring out their
relationship. We propose a new approach that estimates reconstructions,
distributions over the camera/object and camera/camera transformations, as well
as an inter-view object affinity matrix. This information is then jointly
reasoned over to produce the most likely explanation of the scene. We train and
test our approach on a dataset of indoor scenes, and rigorously evaluate the
merits of our joint reasoning approach. Our experiments show that it is able to
recover reasonable scenes from sparse views, although the problem remains
challenging.
Project site: https://jasonqsy.github.io/Associative3D
Comment: ECCV 202
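The joint reasoning described in this abstract can be illustrated with a small sketch: given a few candidate relative camera poses (each with a prior score) and an inter-view object affinity matrix, pick the hypothesis that best aligns soft-matched object centroids. The scoring terms, weights, and function names below are assumptions for illustration, not the paper's actual formulation.

# Minimal sketch of joint reasoning over pose hypotheses and object affinities.
import numpy as np

def score_hypothesis(rel_pose, pose_score, centroids1, centroids2, affinity, w=1.0):
    """Higher is better: pose prior minus affinity-weighted alignment error."""
    # Transform view-1 object centroids into view-2 coordinates.
    c1_h = np.concatenate([centroids1, np.ones((len(centroids1), 1))], axis=1)
    c1_in_2 = (rel_pose @ c1_h.T).T[:, :3]
    # Pairwise distances between transformed view-1 objects and view-2 objects.
    dists = np.linalg.norm(c1_in_2[:, None, :] - centroids2[None, :, :], axis=-1)
    # Soft-correspondence alignment error weighted by the affinity matrix.
    align_err = (affinity * dists).sum() / (affinity.sum() + 1e-8)
    return pose_score - w * align_err

def most_likely_explanation(candidates, centroids1, centroids2, affinity):
    """candidates: list of (4x4 relative pose, prior score); return the best pose."""
    scored = [(score_hypothesis(T, s, centroids1, centroids2, affinity), T)
              for T, s in candidates]
    return max(scored, key=lambda x: x[0])[1]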
Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models
We address the challenge of recovering an underlying scene geometry and
colors from a sparse set of RGBD view observations. In this work, we present a
new solution that sequentially generates novel RGBD views along a camera
trajectory, and the scene geometry is simply the fusion result of these views.
More specifically, we maintain an intermediate surface mesh used for rendering
new RGBD views; each rendered view is completed by an inpainting network,
back-projected as a partial surface, and merged into the intermediate mesh.
The use of an intermediate mesh and camera projection helps resolve the
persistent problem of multi-view inconsistency. In practice, we implement the
RGBD inpainting network as a versatile RGBD diffusion model previously used
for 2D generative modeling, modifying its reverse diffusion process to suit
our setting. We evaluate our approach on the task of 3D scene synthesis from sparse
RGBD inputs; extensive experiments on the ScanNet dataset demonstrate the
superiority of our approach over existing ones. Project page:
https://jblei.site/project-pages/rgbd-diffusion.htm
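The incremental render-inpaint-fuse loop described in this abstract can be summarised in a few lines. The three callables below (render_rgbd, inpaint_rgbd, fuse_view) are placeholders standing in for the mesh renderer, the RGBD diffusion inpainter, and the back-projection/fusion step; their names and signatures are assumptions, not the paper's API.

# Minimal sketch of the incremental view-inpainting loop.
def incremental_scene_synthesis(mesh, trajectory, render_rgbd, inpaint_rgbd, fuse_view):
    """Grow `mesh` by synthesising one RGBD view per camera pose in `trajectory`."""
    for pose in trajectory:
        partial = render_rgbd(mesh, pose)        # render the current (incomplete) mesh
        completed = inpaint_rgbd(partial)        # fill holes with the RGBD diffusion model
        mesh = fuse_view(mesh, completed, pose)  # back-project and merge into the mesh
    return mesh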
Neural Radiance Fields: Past, Present, and Future
Aspects such as modeling and interpreting 3D environments and surroundings
have long driven research in 3D Computer Vision, Computer Graphics, and
Machine Learning. The paper by Mildenhall et al. introducing NeRFs (Neural
Radiance Fields) sparked a boom in Computer Graphics, Robotics, and Computer
Vision, and the prospect of high-resolution, low-storage Augmented Reality and
Virtual Reality 3D models has gained traction among researchers, with more
than 1000 NeRF-related preprints published. This paper serves as a bridge for
people starting to study
these fields by building on the basics of Mathematics, Geometry, Computer
Vision, and Computer Graphics to the difficulties encountered in Implicit
Representations at the intersection of all these disciplines. This survey
provides the history of rendering, Implicit Learning, and NeRFs, the
progression of research on NeRFs, and the potential applications and
implications of NeRFs in today's world. In doing so, this survey categorizes
all the NeRF-related research in terms of the datasets used, objective
functions, applications solved, and evaluation criteria for these applications.
Comment: 413 pages, 9 figures, 277 citations
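Since the survey centres on the NeRF rendering formulation, a compact reminder of the discrete volume-rendering quadrature at its core may help newcomers. The sampling scheme and the toy radiance field below are assumptions used only to keep the example self-contained; they are not code from the survey.

# Minimal sketch of NeRF-style volume rendering: colours and densities sampled
# along a ray are composited with alpha-weighted transmittance.
import numpy as np

def render_ray(field, origin, direction, near=0.5, far=4.0, n_samples=64):
    """Composite colour along one ray from a radiance field `field(x) -> (rgb, sigma)`."""
    t = np.linspace(near, far, n_samples)
    pts = origin[None, :] + t[:, None] * direction[None, :]
    rgb, sigma = field(pts)                                # (N, 3) colours, (N,) densities

    delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))     # spacing between samples
    alpha = 1.0 - np.exp(-sigma * delta)                   # opacity per interval
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)            # expected colour

# Toy field: a soft unit sphere with a constant reddish colour (illustrative only).
def toy_field(pts):
    sigma = np.where(np.linalg.norm(pts, axis=-1) < 1.0, 5.0, 0.0)
    rgb = np.tile([0.8, 0.3, 0.2], (len(pts), 1))
    return rgb, sigma

color = render_ray(toy_field, np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]))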