
    Sparse-to-Continuous: Enhancing Monocular Depth Estimation using Occupancy Maps

    This paper addresses the problem of single image depth estimation (SIDE), focusing on improving the quality of deep neural network predictions. In a supervised learning scenario, the quality of predictions is intrinsically related to the training labels, which guide the optimization process. For indoor scenes, structured-light-based depth sensors (e.g., Kinect) can provide dense, albeit short-range, depth maps. For outdoor scenes, on the other hand, LiDAR is considered the standard sensor, and it provides comparatively much sparser measurements, especially in distant areas. Rather than modifying the neural network architecture to deal with sparse depth maps, this article introduces a novel densification method for depth maps based on the Hilbert Maps framework. A continuous occupancy map is produced from the 3D points of LiDAR scans, and the resulting reconstructed surface is projected into a 2D depth map of arbitrary resolution. Experiments conducted with various subsets of the KITTI dataset show a significant improvement produced by the proposed Sparse-to-Continuous technique, without introducing extra information into the training stage.
    Comment: Accepted. (c) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
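    As background for the projection step, here is a minimal sketch under stated assumptions: the 3D points are already expressed in the camera frame, a pinhole model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` is used, and `project_to_depth_map` is an illustrative name. It does not implement Hilbert Maps; projecting raw LiDAR points this way simply shows why the resulting depth map is sparse, and it keeps the nearest depth per pixel (a z-buffer).

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_depth_map(
    points: Sequence[Point],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[List[Optional[float]]]:
    """Z-buffer projection of camera-frame 3D points into a depth image.

    Hypothetical helper for illustration. Pixels that receive no point stay
    None, which is what makes depth maps built from raw LiDAR points sparse.
    """
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:
            continue  # point lies behind the camera
        # Pinhole projection to integer pixel coordinates.
        u = int(math.floor(fx * x / z + cx))
        v = int(math.floor(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            current = depth[v][u]
            if current is None or z < current:
                depth[v][u] = z  # keep the nearest surface seen by this pixel
    return depth
```

    Scaling `width`, `height`, `fx`, `fy`, `cx`, and `cy` by the same factor renders roughly the same view at a different resolution; with raw points, most of the added pixels simply stay empty, which is the gap a continuous surface representation is meant to fill.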

    Efficient multiple occlusion queries for scene graph systems

    Image space occlusion culling is a useful approach to reduce the rendering load of large polygonal models. Like most large model techniques, it trades overhead costs against the rendering costs of the possibly occluded geometry. Modern graphics hardware also supports occlusion culling; unfortunately, these hardware extensions incur fill-rate and latency costs. In this paper, we propose a new scene graph traversal technique optimized for the efficient use of occlusion queries. Our approach uses several Occupancy Maps to organize the scene graph traversal. During traversal, hierarchical occlusion culling, view frustum culling, and rendering are performed. The occlusion information is determined efficiently by multiple asynchronous occlusion queries using hardware-supported query functionality. To avoid redundant results, we arrange these multiple occlusion queries according to the information in several Occupancy Maps. The presented technique is conservative and benefits from a partial depth order of the geometry.

    Structured prediction of unobserved voxels from a single depth image

    Building a complete 3D model of a scene, given only a single depth image, is underconstrained. To obtain a full volumetric model, one needs either multiple views or a single view together with a library of unambiguous 3D models that will fit the shape of each individual object in the scene. We hypothesize that objects of dissimilar semantic classes often share similar 3D shape components, enabling a limited dataset to model the shape of a wide range of objects and hence estimate their hidden geometry. Exploring this hypothesis, we propose an algorithm that can complete the unobserved geometry of tabletop-sized objects, based on a supervised model trained on already available volumetric elements. Our model maps from a local observation in a single depth image to an estimate of the surface shape in the surrounding neighborhood. We validate our approach both qualitatively and quantitatively on a range of indoor object collections and challenging real scenes.

    SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views

    We introduce SparseNeuS, a novel neural-rendering-based method for the task of surface reconstruction from multi-view images. This task becomes more difficult when only sparse images are provided as input, a scenario in which existing neural reconstruction approaches usually produce incomplete or distorted results. Moreover, their inability to generalize to unseen scenes impedes their practical application. In contrast, SparseNeuS generalizes to new scenes and works well with sparse inputs (as few as 2 or 3 images). SparseNeuS adopts the signed distance function (SDF) as its surface representation and learns generalizable priors from image features by introducing geometry encoding volumes for generic surface prediction. In addition, several strategies are introduced to effectively leverage sparse views for high-quality reconstruction: 1) a multi-level geometry reasoning framework that recovers surfaces in a coarse-to-fine manner; 2) a multi-scale color blending scheme for more reliable color prediction; and 3) a consistency-aware fine-tuning scheme that controls the inconsistent regions caused by occlusion and noise. Extensive experiments demonstrate that our approach not only outperforms state-of-the-art methods but also exhibits good efficiency, generalizability, and flexibility.
    Comment: Project page: https://www.xxlong.site/SparseNeuS

    Tighter bounding volumes for better occlusion culling performance

    Bounding volumes are used in computer graphics to approximate the actual geometric shape of an object in a scene, with the main intention of reducing the cost of visibility or interference tests. The most commonly used bounding volumes have been axis-aligned bounding boxes and bounding spheres. In this paper, we propose the use of discrete orientation polytopes (k-DOPs) as bounding volumes specifically for visibility culling. Occlusion tests are computed more accurately using k-DOPs and, most importantly, also more efficiently. We illustrate this point through a series of experiments using a wide range of data models under varying viewing conditions. Although no bounding volume works best in every situation, k-DOPs are often the best, and they also work very well in the cases where they are not; they therefore provide good results without having to analyze each application and compare different bounding volumes.
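    To make the k-DOP idea concrete, here is a small sketch assuming a 14-DOP built from the three coordinate axes plus the four cube diagonals (a common choice, not necessarily the direction set used in the paper). Each k-DOP stores one (min, max) interval per direction. The overlap test shown is a generic k-DOP versus k-DOP check, not the paper's occlusion test; it is conservative, since a separating direction proves two volumes are disjoint while overlapping intervals do not prove intersection. The names `DIRECTIONS_14`, `build_kdop`, and `kdops_overlap` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Slab = Tuple[float, float]

# Seven fixed directions yield a 14-DOP: the three coordinate axes plus the
# four body diagonals of a cube. Using only the axes gives an AABB (a 6-DOP).
DIRECTIONS_14: List[Vec3] = [
    (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (1.0, 1.0, 1.0), (1.0, 1.0, -1.0), (1.0, -1.0, 1.0), (1.0, -1.0, -1.0),
]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def build_kdop(vertices: Sequence[Vec3],
               directions: Sequence[Vec3] = DIRECTIONS_14) -> List[Slab]:
    """Return one (min, max) slab per direction that encloses all vertices."""
    slabs: List[Slab] = []
    for d in directions:
        projections = [dot(v, d) for v in vertices]
        slabs.append((min(projections), max(projections)))
    return slabs


def kdops_overlap(a: Sequence[Slab], b: Sequence[Slab]) -> bool:
    """Conservative test: False means the two k-DOPs are certainly disjoint."""
    return all(a_min <= b_max and b_min <= a_max
               for (a_min, a_max), (b_min, b_max) in zip(a, b))
```

    Passing `DIRECTIONS_14[:3]` turns the same code into an axis-aligned bounding box, which shows why the extra slabs fit more tightly: a triangle with vertices (1, 0, 0), (0, 1, 0), (0, 0, 1) and a small object near the origin have overlapping AABBs, but the (1, 1, 1) slab separates their 14-DOPs.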

    FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction

    Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry without test-time optimization is feasible using deep neural networks, showing remarkable promise and high efficiency. However, the reconstructed geometry, typically represented as a 3D truncated signed distance function (TSDF), is often coarse and lacks fine geometric detail. To address this problem, we propose three effective solutions for improving the fidelity of inference-based 3D reconstructions. We first present a resolution-agnostic TSDF supervision strategy to provide the network with a more accurate learning signal during training, avoiding the pitfalls of TSDF interpolation seen in previous work. We then introduce a depth guidance strategy using multi-view depth estimates to enhance the scene representation and recover more accurate surfaces. Finally, we develop a novel architecture for the final layers of the network, conditioning the output TSDF prediction on high-resolution image features in addition to coarse voxel features, enabling sharper reconstruction of fine details. Our method, FineRecon, produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics.
    Comment: ICCV 202
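    For readers unfamiliar with TSDFs, the sketch below shows the standard projective TSDF value of a voxel along a camera ray and the weighted running average commonly used to fuse several depth observations. It is background only: it is not FineRecon's resolution-agnostic supervision strategy, and the truncation distance `trunc` and the function names are assumptions for illustration.

```python
from typing import Optional, Tuple


def truncated_sdf(voxel_depth: float, observed_depth: float,
                  trunc: float) -> Optional[float]:
    """Projective TSDF value of a voxel seen along a camera ray.

    Positive in free space in front of the observed surface, negative just
    behind it, normalized and clamped to [-1, 1]. Returns None for voxels more
    than `trunc` behind the surface: they are occluded and left unchanged.
    """
    sdf = observed_depth - voxel_depth
    if sdf < -trunc:
        return None
    return max(-1.0, min(1.0, sdf / trunc))


def fuse(tsdf: float, weight: float, new_tsdf: float,
         new_weight: float = 1.0) -> Tuple[float, float]:
    """Weighted running average that merges a new observation into a voxel."""
    total = weight + new_weight
    return (tsdf * weight + new_tsdf * new_weight) / total, total
```

    The reconstructed surface is the zero crossing of the fused values; the truncation keeps each observation's influence local to a thin band around the surface.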

    PixelHuman: Animatable Neural Radiance Fields from Few Images

    In this paper, we propose PixelHuman, a novel human rendering model that generates animatable human scenes from a few images of a person with unseen identity, views, and poses. Previous work has demonstrated reasonable performance in novel view and pose synthesis, but these methods rely on a large number of training images and are trained per scene from videos, which requires a significant amount of time to produce animatable scenes from unseen human images. Our method differs from existing methods in that it can generalize to any input image for animatable human synthesis. Given a random pose sequence, our method synthesizes each target scene using a neural radiance field that is conditioned on a canonical representation and pose-aware pixel-aligned features, both of which can be obtained through deformation fields learned in a data-driven manner. Our experiments show that our method achieves state-of-the-art performance in multiview and novel pose synthesis from few-shot images.
    Comment: 8 page