785 research outputs found
Sparse-to-Continuous: Enhancing Monocular Depth Estimation using Occupancy Maps
This paper addresses the problem of single image depth estimation (SIDE),
focusing on improving the quality of deep neural network predictions. In a
supervised learning scenario, the quality of predictions is intrinsically
related to the training labels, which guide the optimization process. For
indoor scenes, structured-light-based depth sensors (e.g. Kinect) are able to
provide dense, albeit short-range, depth maps. On the other hand, for outdoor
scenes, LiDAR is considered the standard sensor, but it provides comparatively
much sparser measurements, especially in areas farther away. Rather than
modifying the neural network architecture to deal with sparse depth maps, this
article introduces a novel densification method for depth maps, using the
Hilbert Maps framework. A continuous occupancy map is produced based on 3D
points from LiDAR scans, and the resulting reconstructed surface is projected
into a 2D depth map with arbitrary resolution. Experiments conducted with
various subsets of the KITTI dataset show a significant improvement produced by
the proposed Sparse-to-Continuous technique, without the introduction of extra
information into the training stage.
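A minimal Python sketch of the projection-and-densification idea described above: LiDAR points are projected into the image plane, and the resulting sparse depth map is interpolated to full resolution. The intrinsics K, the LiDAR-to-camera transform T, and the use of SciPy's griddata are illustrative assumptions; the paper instead reconstructs a continuous occupancy surface with the Hilbert Maps framework before projecting it.

```python
# Sketch only: griddata interpolation stands in for the Hilbert Maps
# occupancy reconstruction used by the paper.
import numpy as np
from scipy.interpolate import griddata

def densify_depth(points_lidar, K, T, height, width):
    """points_lidar: (N, 3) LiDAR points; K: (3, 3) intrinsics; T: (4, 4) LiDAR->camera."""
    # Transform points into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]          # keep points in front of the camera

    # Perspective projection to pixel coordinates.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    depth = pts_cam[:, 2]
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    uv, depth = uv[inside], depth[inside]

    # Densify: interpolate the sparse samples over a full-resolution pixel grid.
    grid_u, grid_v = np.meshgrid(np.arange(width), np.arange(height))
    dense = griddata(uv, depth, (grid_u, grid_v), method='linear')
    return dense                                  # (height, width); NaN where undefined
```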
Efficient multiple occlusion queries for scene graph systems
Image-space occlusion culling is a useful approach to reduce the rendering load of large polygonal models. Like most large-model techniques, it trades overhead costs against the rendering costs of the possibly occluded geometry. Modern graphics hardware now supports occlusion culling, but unfortunately these hardware extensions incur fill-rate and latency costs.
In this paper, we propose a new technique for scene graph traversal optimized for the efficient use of occlusion queries. Our approach uses several Occupancy Maps to organize the scene graph traversal. During traversal, hierarchical occlusion culling, view frustum culling, and rendering are performed.
The occlusion information is efficiently determined by multiple asynchronous occlusion queries using hardware-supported query functionality. To avoid redundant queries, we arrange them according to the information in the Occupancy Maps. The presented technique is conservative and benefits from a partial depth order of the geometry.
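A minimal, CPU-only Python sketch of how an occupancy map can steer such a traversal. A coarse screen-space coverage grid plays the role of the Occupancy Map: nodes whose screen rectangle is already fully covered are skipped, and the remaining leaves are the candidates that would be sent to the GPU as asynchronous occlusion queries (e.g. GL_SAMPLES_PASSED in OpenGL). The node layout and grid resolution are illustrative assumptions, not the paper's data structures.

```python
import numpy as np

GRID = 8  # occupancy map resolution (GRID x GRID screen cells), assumed

class OccupancyMap:
    def __init__(self):
        self.covered = np.zeros((GRID, GRID), dtype=bool)

    def cells(self, rect):
        x0, y0, x1, y1 = np.clip(np.array(rect) * GRID, 0, GRID - 1).astype(int)
        return (slice(y0, y1 + 1), slice(x0, x1 + 1))

    def fully_covered(self, rect):
        return bool(self.covered[self.cells(rect)].all())

    def mark(self, rect):
        self.covered[self.cells(rect)] = True

def traverse(node, occ, query_batch):
    """Collect nodes that still need a hardware occlusion query, front to back."""
    if occ.fully_covered(node["rect"]):
        return                                   # conservatively occluded: skip
    if "children" in node:
        for child in sorted(node["children"], key=lambda c: c["depth"]):
            traverse(child, occ, query_batch)
    else:
        query_batch.append(node)                 # would issue an async GPU query here
        occ.mark(node["rect"])                   # assume the leaf becomes an occluder

# Toy scene: rectangles in normalized screen coordinates (x0, y0, x1, y1).
scene = {"rect": (0, 0, 1, 1), "children": [
    {"rect": (0.0, 0.0, 0.6, 0.6), "depth": 1.0},
    {"rect": (0.1, 0.1, 0.5, 0.5), "depth": 2.0},   # hidden behind the first leaf
]}
occ, batch = OccupancyMap(), []
traverse(scene, occ, batch)
print([n["rect"] for n in batch])   # only the front rectangle remains queued
```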
Structured prediction of unobserved voxels from a single depth image
Building a complete 3D model of a scene, given only a single depth image, is underconstrained. To gain a full volumetric model, one needs either multiple views, or a single view together with a library of unambiguous 3D models that will fit the shape of each individual object in the scene. We hypothesize that objects of dissimilar semantic classes often share similar 3D shape components, enabling a limited dataset to model the shape of a wide range of objects, and hence estimate their hidden geometry. Exploring this hypothesis, we propose an algorithm that can complete the unobserved geometry of tabletop-sized objects, based on a supervised model trained on already available volumetric elements. Our model maps from a local observation in a single depth image to an estimate of the surface shape in the surrounding neighborhood. We validate our approach both qualitatively and quantitatively on a range of indoor object collections and challenging real scenes.
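As a toy illustration of the local-observation-to-local-shape mapping, the sketch below pairs small depth patches with the occupancy of the voxel block around them and fits a multi-output regressor. A plain random forest on random data stands in for the paper's structured prediction model and its real training set; the patch and voxel sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

PATCH = 16   # depth patch: PATCH x PATCH pixels (assumed size)
VOX = 8      # predicted neighborhood: VOX^3 voxels (assumed size)

# X: one flattened depth patch per row; Y: the flattened occupancy of the voxel
# block surrounding that patch (0 = empty, 1 = occupied). Random data stands in
# for patches extracted from real depth images and ground-truth voxel grids.
rng = np.random.default_rng(0)
X = rng.random((500, PATCH * PATCH))
Y = (rng.random((500, VOX ** 3)) > 0.5).astype(float)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, Y)                                           # multi-output regression

patch = rng.random((1, PATCH * PATCH))
occupancy = model.predict(patch).reshape(VOX, VOX, VOX)   # soft occupancy in [0, 1]
print(occupancy.shape)
```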
SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views
We introduce SparseNeuS, a novel neural rendering based method for the task
of surface reconstruction from multi-view images. This task becomes more
difficult when only sparse images are provided as input, a scenario where
existing neural reconstruction approaches usually produce incomplete or
distorted results. Moreover, their inability to generalize to unseen scenes impedes their application in practice. In contrast, SparseNeuS can generalize to new scenes and works well with sparse images (as few as 2 or 3). SparseNeuS adopts the signed distance function (SDF) as the surface representation,
and learns generalizable priors from image features by introducing geometry
encoding volumes for generic surface prediction. Moreover, several strategies
are introduced to effectively leverage sparse views for high-quality
reconstruction, including 1) a multi-level geometry reasoning framework to
recover the surfaces in a coarse-to-fine manner; 2) a multi-scale color
blending scheme for more reliable color prediction; 3) a consistency-aware
fine-tuning scheme to control the inconsistent regions caused by occlusion and
noise. Extensive experiments demonstrate that our approach not only outperforms
the state-of-the-art methods, but also exhibits good efficiency,
generalizability, and flexibility.
Project page: https://www.xxlong.site/SparseNeuS
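A minimal PyTorch sketch of the central conditioning step: SDF values for 3D query points are predicted by an MLP from features trilinearly interpolated out of a geometry encoding volume. Layer sizes, the volume resolution, and the random stand-in features are illustrative assumptions rather than the paper's configuration; the multi-level reasoning, color blending, and fine-tuning stages are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDFFromEncodingVolume(nn.Module):
    def __init__(self, feat_dim=16, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                        # signed distance
        )

    def forward(self, volume, points):
        # volume: (1, C, D, H, W) geometry encoding volume built from image features
        # points: (N, 3) query coordinates normalized to [-1, 1]
        grid = points.view(1, -1, 1, 1, 3)               # (1, N, 1, 1, 3)
        feats = F.grid_sample(volume, grid, align_corners=True)
        feats = feats.view(volume.shape[1], -1).t()      # (N, C) trilinear features
        return self.mlp(torch.cat([points, feats], dim=-1)).squeeze(-1)

volume = torch.randn(1, 16, 32, 32, 32)                  # stand-in encoding volume
points = torch.rand(1024, 3) * 2 - 1
sdf = SDFFromEncodingVolume()(volume, points)            # (1024,) SDF estimates
```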
Tighter bounding volumes for better occlusion culling performance
Bounding volumes are used in computer graphics to approximate the actual geometric shape of an object in a scene. The main intention is to reduce the costs associated with visibility or interference tests. The most commonly used bounding volumes have been axis-aligned bounding boxes and bounding spheres. In this paper, we propose the use of discrete orientation polytopes (k-DOPs) as bounding volumes specifically for visibility culling. Occlusion tests are computed more accurately using k-DOPs and, most importantly, also more efficiently. We illustrate this point through a series of experiments using a wide range of data models under varying viewing conditions. Although no bounding volume works best in every situation, k-DOPs are often the best and still perform very well when they are not, so they provide good results without requiring an analysis of the application and of alternative bounding volumes.
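A minimal NumPy sketch of what a k-DOP is: each vertex is projected onto a fixed set of direction axes and only the min/max interval per axis is stored; a conservative overlap test then just compares intervals. The particular 13 axes below (a 26-DOP) are an illustrative choice, not taken from the paper.

```python
import numpy as np

# 13 axis directions of a 26-DOP: face normals, edge diagonals, corner diagonals.
AXES = np.array([
    (1, 0, 0), (0, 1, 0), (0, 0, 1),
    (1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1),
    (1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1),
], dtype=float)

def build_kdop(vertices):
    """vertices: (N, 3). Returns (13, 2) min/max slab intervals, one per axis."""
    proj = vertices @ AXES.T                      # (N, 13) projections
    return np.stack([proj.min(axis=0), proj.max(axis=0)], axis=1)

def kdops_may_overlap(a, b):
    # Conservative test: two k-DOPs are disjoint as soon as any slab separates them.
    return bool(np.all((a[:, 0] <= b[:, 1]) & (b[:, 0] <= a[:, 1])))

pts = np.random.rand(1000, 3)
kdop = build_kdop(pts)
print(kdops_may_overlap(kdop, kdop))              # True: identical volumes overlap
print(kdops_may_overlap(kdop, kdop + 10.0))       # False: translated copy is separated
```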
FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction
Recent works on 3D reconstruction from posed images have demonstrated that
direct inference of scene-level 3D geometry without test-time optimization is
feasible using deep neural networks, showing remarkable promise and high
efficiency. However, the reconstructed geometry, typically represented as a 3D
truncated signed distance function (TSDF), is often coarse without fine
geometric details. To address this problem, we propose three effective
solutions for improving the fidelity of inference-based 3D reconstructions. We
first present a resolution-agnostic TSDF supervision strategy to provide the
network with a more accurate learning signal during training, avoiding the
pitfalls of TSDF interpolation seen in previous work. We then introduce a depth
guidance strategy using multi-view depth estimates to enhance the scene
representation and recover more accurate surfaces. Finally, we develop a novel
architecture for the final layers of the network, conditioning the output TSDF
prediction on high-resolution image features in addition to coarse voxel
features, enabling sharper reconstruction of fine details. Our method,
FineRecon, produces smooth and highly accurate reconstructions, showing
significant improvements across multiple depth and 3D reconstruction metrics.
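A minimal PyTorch sketch of the third ingredient, conditioning the output TSDF on high-resolution image features: a query point is projected into the image, a feature is sampled bilinearly at that pixel, concatenated with the coarse voxel feature, and passed through a small MLP head. The single-view projection, feature dimensions, and random stand-in tensors are simplifying assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_image_feature(feat_map, points_cam, K, img_size):
    # feat_map: (1, C, H, W) high-resolution image features
    # points_cam: (N, 3) query points in the camera frame; K: (3, 3) intrinsics
    uv = (K @ points_cam.t()).t()
    uv = uv[:, :2] / uv[:, 2:3]                          # pixel coordinates
    W, H = img_size
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, -1, 1, 2)                        # (1, N, 1, 2), in [-1, 1]
    out = F.grid_sample(feat_map, grid, align_corners=True)
    return out.view(feat_map.shape[1], -1).t()           # (N, C) bilinear features

class TSDFHead(nn.Module):
    def __init__(self, voxel_dim=32, img_dim=16, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(voxel_dim + img_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),             # TSDF in [-1, 1]
        )

    def forward(self, voxel_feat, img_feat):
        return self.mlp(torch.cat([voxel_feat, img_feat], dim=-1)).squeeze(-1)

feat_map = torch.randn(1, 16, 120, 160)                     # stand-in image features
K = torch.tensor([[100., 0., 80.], [0., 100., 60.], [0., 0., 1.]])
pts_cam = torch.rand(256, 3) + torch.tensor([0., 0., 2.])   # points in front of the camera
img_feat = sample_image_feature(feat_map, pts_cam, K, (160, 120))
voxel_feat = torch.randn(256, 32)                           # stand-in coarse voxel features
tsdf = TSDFHead()(voxel_feat, img_feat)                     # (256,) TSDF predictions
```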
PixelHuman: Animatable Neural Radiance Fields from Few Images
In this paper, we propose PixelHuman, a novel human rendering model that
generates animatable human scenes from a few images of a person with unseen
identity, views, and poses. Previous works have demonstrated reasonable performance in novel view and pose synthesis, but they rely on a large number of images for training and are trained per scene from videos, which requires a significant amount of time to produce animatable scenes from unseen human
images. Our method differs from existing methods in that it can generalize to
any input image for animatable human synthesis. Given a random pose sequence,
our method synthesizes each target scene using a neural radiance field that is
conditioned on a canonical representation and pose-aware pixel-aligned
features, both of which can be obtained through deformation fields learned in a
data-driven manner. Our experiments show that our method achieves
state-of-the-art performance in multiview and novel pose synthesis from
few-shot images.
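A minimal PyTorch sketch of the conditioning pipeline described in the abstract: a deformation network maps a query point from posed (observation) space to canonical space, and the radiance field is evaluated on the canonical point together with a pixel-aligned feature vector. Module sizes, the SMPL-style 72-dimensional pose vector, and the random stand-in features are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DeformationField(nn.Module):
    def __init__(self, pose_dim=72, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),                     # offset toward canonical space
        )

    def forward(self, x_posed, pose):
        return x_posed + self.mlp(torch.cat([x_posed, pose], dim=-1))

class ConditionedRadianceField(nn.Module):
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                     # RGB + density
        )

    def forward(self, x_canonical, pixel_feat):
        out = self.mlp(torch.cat([x_canonical, pixel_feat], dim=-1))
        rgb, sigma = torch.sigmoid(out[:, :3]), torch.relu(out[:, 3])
        return rgb, sigma

# Usage with random stand-ins for query points, a pose vector, and
# pixel-aligned features that would come from an image encoder.
x = torch.rand(1024, 3)
pose = torch.randn(1024, 72)
feats = torch.randn(1024, 32)
x_can = DeformationField()(x, pose)
rgb, sigma = ConditionedRadianceField()(x_can, feats)
```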