
    Semantic 3D Reconstruction with Continuous Regularization and Ray Potentials Using a Visibility Consistency Constraint

    We propose an approach for dense semantic 3D reconstruction which uses a data term that is defined as potentials over viewing rays, combined with continuous surface area penalization. Our formulation is a convex relaxation which we augment with a crucial non-convex constraint that ensures exact handling of visibility. To tackle the non-convex minimization problem, we propose a majorize-minimize type strategy which converges to a critical point. We demonstrate the benefits of using the non-convex constraint experimentally. For the geometry-only case, we set a new state of the art on two datasets of the commonly used Middlebury multi-view stereo benchmark. Moreover, our general-purpose formulation directly reconstructs thin objects, which are usually treated with specialized algorithms. A qualitative evaluation on the dense semantic 3D reconstruction task shows that we improve significantly over previous methods. Comment: Accepted as a spotlight oral paper by CVPR 2016. Code at https://github.com/nsavinov/ray_potentials
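
    The majorize-minimize strategy referred to above can be illustrated with a generic loop: at every iteration a surrogate that upper-bounds the objective and touches it at the current iterate is minimized, so the objective can never increase. Below is a minimal sketch on a toy scalar objective rather than the paper's ray-potential energy; the names majorize_minimize and majorizer_argmin are illustrative placeholders, not the authors' code.

    import numpy as np

    def majorize_minimize(f, majorizer_argmin, x0, max_iters=100, tol=1e-8):
        # Generic MM loop: each step minimizes a surrogate g(.; x_k) with
        # g >= f and g(x_k; x_k) = f(x_k), so f can never increase and the
        # iterates approach a critical point under the usual MM assumptions.
        x = x0
        for _ in range(max_iters):
            x_new = majorizer_argmin(x)
            if abs(f(x) - f(x_new)) < tol:
                return x_new
            x = x_new
        return x

    # Toy objective f(x) = x^2 + 3 sin(x); since |sin''| <= 1,
    # sin(y) <= sin(x) + cos(x)(y - x) + 0.5 (y - x)^2, which yields a
    # quadratic majorizer whose minimizer has the closed form below.
    f = lambda x: x**2 + 3.0 * np.sin(x)
    argmin_g = lambda x: (3.0 * x - 3.0 * np.cos(x)) / 5.0
    print(majorize_minimize(f, argmin_g, x0=2.0))   # a critical point of f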

    Learning a Multi-View Stereo Machine

    We present a learnt system for multi-view stereopsis. In contrast to recent learning-based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images (even a single image) than required by classical approaches, as well as completion of unseen surfaces. We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches as well as recent learning-based methods.
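
    The projection/unprojection of features along viewing rays can be sketched as plain pinhole geometry: project every voxel centre into the image and bilinearly sample the feature map there. The NumPy sketch below only shows that geometry and is not differentiable as written; a learnt system such as the one described would implement the sampling with a differentiable operator so gradients reach the 2D features. All names are hypothetical.

    import numpy as np

    def unproject_features(feat, K, R, t, grid_xyz):
        # feat     : (H, W, C) image feature map
        # K, R, t  : intrinsics (3, 3), rotation (3, 3), translation (3,)
        # grid_xyz : (N, 3) voxel centres in world coordinates
        # returns  : (N, C) bilinearly sampled per-voxel features
        H, W, C = feat.shape
        cam = R @ grid_xyz.T + t[:, None]          # world -> camera
        uvw = K @ cam                              # camera -> homogeneous pixels
        u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
        out = np.zeros((grid_xyz.shape[0], C))
        valid = (uvw[2] > 0) & (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)
        u, v = u[valid], v[valid]
        u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
        u1, v1 = np.minimum(u0 + 1, W - 1), np.minimum(v0 + 1, H - 1)
        du, dv = u - u0, v - v0
        out[valid] = ((1 - du) * (1 - dv))[:, None] * feat[v0, u0] \
                   + (du * (1 - dv))[:, None] * feat[v0, u1] \
                   + ((1 - du) * dv)[:, None] * feat[v1, u0] \
                   + (du * dv)[:, None] * feat[v1, u1]
        return out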

    3D Shape Reconstruction from a Single 2D Image via 2D-3D Self-Consistency

    Aiming to infer 3D shapes from 2D images, 3D shape reconstruction has drawn huge attention from researchers in the computer vision and deep learning communities. However, it is not practical to assume that 2D input images and their associated ground truth 3D shapes are always available during training. In this paper, we propose a framework for semi-supervised 3D reconstruction. This is realized by our introduced 2D-3D self-consistency, which aligns the predicted 3D models with the projected 2D foreground segmentation masks. Moreover, our model not only recovers 3D shapes with the corresponding 2D masks; camera pose information can also be jointly disentangled and predicted, even though such supervision is never available during training. In the experiments, we qualitatively and quantitatively demonstrate the effectiveness of our model, which performs favorably against state-of-the-art approaches in either supervised or semi-supervised settings.
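
    The 2D-3D self-consistency idea of comparing a projected prediction against a foreground mask can be sketched as a soft silhouette loss. The example below assumes, purely for illustration, an orthographic camera aligned with the z axis and a voxel occupancy output; the paper instead projects through the predicted camera pose.

    import numpy as np

    def silhouette_consistency(occupancy, mask):
        # occupancy : (D, H, W) soft occupancy probabilities in [0, 1]
        # mask      : (H, W) binary foreground segmentation
        # A pixel is foreground if any voxel along its (axis-aligned) ray is
        # occupied; the loss is 1 - soft IoU of projected vs. observed masks.
        proj = 1.0 - np.prod(1.0 - occupancy, axis=0)          # (H, W)
        inter = (proj * mask).sum()
        union = (proj + mask - proj * mask).sum()
        return 1.0 - inter / (union + 1e-8)

    # Toy usage: a small occupied box projects onto a square mask.
    occ = np.zeros((8, 16, 16)); occ[3:5, 4:12, 4:12] = 0.9
    gt = np.zeros((16, 16)); gt[4:12, 4:12] = 1.0
    print(silhouette_consistency(occ, gt))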

    Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency

    We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view. We do so by reformulating view consistency using a differentiable ray consistency (DRC) term. We show that this formulation can be incorporated in a learning framework to leverage different types of multi-view observations, e.g. foreground masks, depth, color images, or semantics, as supervision for learning single-view 3D prediction. We present an empirical analysis of our technique in a controlled setting. We also show that this approach allows us to improve over existing techniques for single-view reconstruction of objects from the PASCAL VOC dataset. Comment: To appear at CVPR 2017. Project webpage: https://shubhtuls.github.io/drc
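
    The core of a ray consistency term is the distribution over where a ray terminates, induced by per-voxel occupancy probabilities. A minimal sketch of that event model follows; the concrete per-event costs depend on the observation type (mask, depth, color, semantics) and are left abstract here.

    import numpy as np

    def ray_consistency_cost(occ, event_costs, escape_cost):
        # occ         : (N,) occupancy probabilities along the ray, near to far
        # event_costs : (N,) cost if the ray terminates at voxel i
        # escape_cost : cost if the ray passes through all N voxels
        # P(terminate at i) = occ_i * prod_{j < i} (1 - occ_j)
        pass_through = np.concatenate(([1.0], np.cumprod(1.0 - occ)))  # (N+1,)
        p_terminate = occ * pass_through[:-1]
        p_escape = pass_through[-1]
        return float(p_terminate @ event_costs + p_escape * escape_cost)

    # Toy usage: a foreground-mask ray should terminate somewhere, so escaping
    # is penalised while terminating at any voxel is free.
    occ = np.array([0.1, 0.2, 0.7, 0.3])
    print(ray_consistency_cost(occ, np.zeros(4), escape_cost=1.0))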

    RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials

    In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate the physics of image formation, such as perspective geometry and occlusion. In contrast, classical approaches based on Markov Random Fields (MRF) with ray potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. We therefore propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models, as well as other learning-based approaches. Comment: Accepted to CVPR 2018 as spotlight. Project url with code: http://raynet-mvs.com
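
    One way to picture the CNN half of such a hybrid is as a module that turns multi-view feature correlations along a viewing ray into a per-depth surface probability, which the MRF with ray potentials then regularizes. The sketch below shows only that first step under simplifying assumptions (average inner-product correlation, softmax over depth hypotheses); the MRF inference itself is not reproduced.

    import numpy as np

    def per_ray_surface_probability(ref_feats, src_feats):
        # ref_feats : (D, C) reference-view features at D depth hypotheses
        # src_feats : (V, D, C) features of V other views at the same points
        # returns   : (D,) probability of the surface lying at each hypothesis
        # Average feature correlation between the reference and other views,
        # then a softmax over the depth hypotheses along the ray.
        corr = np.einsum('dc,vdc->d', ref_feats, src_feats) / src_feats.shape[0]
        corr = corr - corr.max()
        p = np.exp(corr)
        return p / p.sum()

    # Toy usage with random features for 32 depth hypotheses and 3 views.
    print(per_ray_surface_probability(np.random.rand(32, 8),
                                      np.random.rand(3, 32, 8)).shape)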

    Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network

    Reconstructing a high-resolution 3D model of an object is a challenging task in computer vision. Designing scalable and light-weight architectures is crucial while addressing this problem. Existing point-cloud based reconstruction approaches directly predict the entire point cloud in a single stage. Although this technique can handle low-resolution point clouds, it is not a viable solution for generating dense, high-resolution outputs. In this work, we introduce DensePCR, a deep pyramidal network for point cloud reconstruction that hierarchically predicts point clouds of increasing resolution. Towards this end, we propose an architecture that first predicts a low-resolution point cloud, and then hierarchically increases the resolution by aggregating local and global point features to deform a grid. Our method generates point clouds that are accurate, uniform and dense. Through extensive quantitative and qualitative evaluation on synthetic and real datasets, we demonstrate that DensePCR outperforms the existing state-of-the-art point cloud reconstruction works, while also providing a light-weight and scalable architecture for predicting high-resolution outputs. Comment: WACV 201
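
    The grid-deformation step of such a pyramid can be sketched as: replicate each parent point over a small 2D grid, concatenate local, global, and grid features, and predict a per-copy offset. In the sketch below deform_mlp is a hypothetical stand-in for a learned network, and the feature sizes are arbitrary.

    import numpy as np

    def upsample_points(points, local_feat, global_feat, deform_mlp, k=4):
        # points      : (N, 3) low-resolution point cloud
        # local_feat  : (N, F) per-point features
        # global_feat : (G,)   global shape feature
        # deform_mlp  : callable (N*k, F+G+2) -> (N*k, 3) predicted offsets
        N = points.shape[0]
        grid = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], float)[:k]
        parent = np.repeat(points, k, axis=0)                   # (N*k, 3)
        feat = np.concatenate([
            np.repeat(local_feat, k, axis=0),                   # local context
            np.tile(global_feat, (N * k, 1)),                   # global context
            np.tile(grid, (N, 1)),                              # grid coordinate
        ], axis=1)
        return parent + deform_mlp(feat)                        # (N*k, 3)

    # Toy usage with a dummy "network" that predicts zero offsets.
    pts = np.random.rand(128, 3)
    out = upsample_points(pts, np.random.rand(128, 32), np.random.rand(64),
                          deform_mlp=lambda f: np.zeros((f.shape[0], 3)))
    print(out.shape)                                            # (512, 3)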

    Accelerated Inference in Markov Random Fields via Smooth Riemannian Optimization

    Markov Random Fields (MRFs) are a popular model for several pattern recognition and reconstruction problems in robotics and computer vision. Inference in MRFs is intractable in general, and related work resorts to approximation algorithms. Among those techniques, semidefinite programming (SDP) relaxations have been shown to provide accurate estimates, but they scale poorly with the problem size and are typically slow for practical applications. Our first contribution is to design a dual ascent method to solve standard SDP relaxations that takes advantage of the geometric structure of the problem to speed up computation. This technique, named Dual Ascent Riemannian Staircase (DARS), is able to solve large problem instances in seconds. Our second contribution is a faster approach whose backbone is a novel SDP relaxation combined with a fast and scalable solver based on smooth Riemannian optimization. We show that this approach, named Fast Unconstrained SEmidefinite Solver (FUSES), can solve large problems in milliseconds. Unlike local MRF solvers, e.g., loopy belief propagation, our approaches do not require an initial guess. Moreover, we leverage recent results from optimization theory to provide per-instance sub-optimality guarantees. We demonstrate the proposed approaches in multi-class image segmentation problems. Extensive experimental evidence shows that (i) FUSES and DARS produce near-optimal solutions, attaining an objective within 0.1% of the optimum, (ii) FUSES and DARS are remarkably faster than general-purpose SDP solvers, and FUSES is more than two orders of magnitude faster than DARS while attaining similar solution quality, and (iii) FUSES is faster than local search methods while being a global solver. Comment: 16 page
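
    As background for what such SDP relaxations start from, the multi-class MRF MAP problem can be written as a quadratic objective over label-indicator matrices with one-hot rows. The sketch below only evaluates that matrix form and rounds a relaxed solution by per-node argmax; neither the relaxation nor the Riemannian solvers (DARS, FUSES) are reproduced, and the sign conventions here are an assumption.

    import numpy as np

    def mrf_objective(X, unary, W):
        # X     : (n, k) label-indicator matrix with one-hot rows
        # unary : (n, k) unary agreement scores (higher = better)
        # W     : (n, n) symmetric pairwise weights favouring equal labels
        # (X X^T)_{ij} = 1 exactly when nodes i and j take the same label,
        # so the objective is unary agreement plus pairwise agreement.
        return float(np.trace(unary.T @ X) + np.trace(W @ X @ X.T))

    def round_relaxed(X_relaxed):
        # Project a relaxed (continuous) solution to one-hot rows by argmax.
        X = np.zeros_like(X_relaxed)
        X[np.arange(X.shape[0]), X_relaxed.argmax(axis=1)] = 1.0
        return X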

    Neural Volumes: Learning Dynamic Renderable Volumes from Images

    Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity. We circumvent these difficulties by presenting a learning-based approach to representing dynamic objects inspired by the integral projection model used in tomographic imaging. The approach is supervised directly from 2D images in a multi-view capture setting and does not require explicit reconstruction or tracking of the object. Our method has two primary components: an encoder-decoder network that transforms input images into a 3D volume representation, and a differentiable ray-marching operation that enables end-to-end training. By virtue of its 3D representation, our construction extrapolates better to novel viewpoints compared to screen-space rendering techniques. The encoder-decoder architecture learns a latent representation of a dynamic scene that enables us to produce novel content sequences not seen during training. To overcome memory limitations of voxel-based representations, we learn a dynamic irregular grid structure implemented with a warp field during ray-marching. This structure greatly improves the apparent resolution and reduces grid-like artifacts and jagged motion. Finally, we demonstrate how to incorporate surface-based representations into our volumetric-learning framework for applications where the highest resolution is required, using facial performance capture as a case in point. Comment: Accepted to SIGGRAPH 201
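
    The differentiable ray-marching component amounts to front-to-back alpha compositing of samples along each ray through the predicted volume. A minimal single-ray sketch follows; the encoder-decoder, the warp field, and the learned irregular grid of the paper are not reproduced, and the opacity rescaling used here is an assumption.

    import numpy as np

    def march_ray(rgb, alpha, step=1.0):
        # rgb   : (N, 3) colour of the samples along the ray, near to far
        # alpha : (N,)   opacity of each sample in [0, 1]
        # step  : sample spacing, used to rescale opacity (an assumption here)
        color = np.zeros(3)
        transmittance = 1.0
        for c, a in zip(rgb, alpha):
            a = 1.0 - (1.0 - a) ** step         # spacing-adjusted opacity
            color += transmittance * a * c      # front-to-back compositing
            transmittance *= 1.0 - a
            if transmittance < 1e-4:            # early exit once fully opaque
                break
        return color, 1.0 - transmittance       # accumulated colour and alpha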

    Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

    Dense semantic 3D reconstruction is typically formulated as a discrete or continuous problem over label assignments in a voxel grid, combining semantic and depth likelihoods in a Markov Random Field framework. The depth and semantic information is incorporated as a unary potential, smoothed by a pairwise regularizer. However, modelling the likelihoods as unary potentials does not capture the problem correctly, leading to various undesirable visibility artifacts. We instead formulate an optimization problem that directly optimizes the reprojection error of the 3D model with respect to the image estimates. This corresponds to an optimization over rays, where the cost function depends on the semantic class and depth of the first occupied voxel along the ray. The 2-label formulation is made feasible by transforming it into a graph-representable form under a QPBO relaxation, solvable using graph cuts. The multi-label problem is solved by applying alpha-expansion using the same relaxation in each expansion move. Our method is shown to be feasible in practice, running comparably fast to the competing methods while not suffering from ray-potential approximation artifacts. Comment: Published at CVPR 201
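
    A ray potential of this kind can be evaluated on a candidate labelling by walking the voxels pierced by the ray, finding the first occupied one, and charging a cost that compares its semantic class and depth against the per-pixel estimates. The sketch below uses illustrative cost forms (0/1 class mismatch, truncated squared depth error); the QPBO and alpha-expansion optimization is not reproduced.

    def ray_potential(labels, ray_voxels, depths, obs_class, obs_depth,
                      sigma=0.5, free_label=0, miss_cost=1.0):
        # labels     : array mapping voxel index -> semantic label (0 = free)
        # ray_voxels : indices of the voxels pierced by the ray, near to far
        # depths     : depth of each pierced voxel along the ray
        # obs_class, obs_depth : per-pixel semantic and depth estimates
        for idx, d in zip(ray_voxels, depths):
            if labels[idx] != free_label:        # first occupied voxel
                class_cost = 0.0 if labels[idx] == obs_class else 1.0
                depth_cost = min((d - obs_depth) ** 2 / sigma ** 2, 1.0)
                return class_cost + depth_cost
        return miss_cost                         # the ray exits without a hit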

    PSDF Fusion: Probabilistic Signed Distance Function for On-the-fly 3D Data Fusion and Scene Reconstruction

    We propose a novel 3D spatial representation for data fusion and scene reconstruction. The Probabilistic Signed Distance Function (Probabilistic SDF, PSDF) is proposed to depict uncertainties in the 3D space. It is modeled by a joint distribution describing the SDF value and its inlier probability, reflecting input data quality and surface geometry. A hybrid data structure involving voxel, surfel, and mesh is designed to fully exploit the advantages of various prevalent 3D representations. Connected by PSDF, these components reasonably cooperate in a consistent framework. Given sequential depth measurements, PSDF can be incrementally refined with less ad hoc parametric Bayesian updating. Supported by PSDF and the efficient 3D data representation, high-quality surfaces can be extracted on-the-fly, and in return contribute to reliable data fusion using the geometry information. Experiments demonstrate that our system reconstructs scenes with higher model quality and lower redundancy, and runs faster than existing online mesh generation systems. Comment: Accepted to ECCV 201
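
    The incremental refinement can be pictured as a per-voxel running update in which each new signed-distance observation is weighted by its inlier probability. The sketch below shows only that weighted update under simplifying assumptions; the full PSDF model also maintains the inlier probability itself and couples the voxels with surfels and the mesh.

    def update_voxel(mu, w, observation, inlier_prob):
        # mu          : current signed-distance estimate of the voxel
        # w           : accumulated confidence weight of that estimate
        # observation : signed distance implied by the new depth measurement
        # inlier_prob : probability in [0, 1] that the measurement is an inlier
        w_new = w + inlier_prob
        mu_new = (w * mu + inlier_prob * observation) / max(w_new, 1e-8)
        return mu_new, w_new

    # Toy usage: two consistent measurements followed by a low-confidence outlier.
    mu, w = 0.0, 0.0
    for obs, p in [(0.04, 0.9), (0.05, 0.8), (0.60, 0.1)]:
        mu, w = update_voxel(mu, w, obs, p)
    print(mu, w)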