Semantic 3D Reconstruction with Continuous Regularization and Ray Potentials Using a Visibility Consistency Constraint
We propose an approach for dense semantic 3D reconstruction which uses a data
term that is defined as potentials over viewing rays, combined with continuous
surface area penalization. Our formulation is a convex relaxation which we
augment with a crucial non-convex constraint that ensures exact handling of
visibility. To tackle the non-convex minimization problem, we propose a
majorize-minimize type strategy which converges to a critical point. We
demonstrate the benefits of using the non-convex constraint experimentally. For
the geometry-only case, we set a new state of the art on two datasets of the
commonly used Middlebury multi-view stereo benchmark. Moreover, our
general-purpose formulation directly reconstructs thin objects, which are
usually treated with specialized algorithms. A qualitative evaluation on the
dense semantic 3D reconstruction task shows that we improve significantly over
previous methods. Comment: Accepted as a spotlight oral paper at CVPR 2016. Code at
https://github.com/nsavinov/ray_potentials
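To make the ray-potential idea concrete, here is a minimal sketch (not the authors' code) of a data term in which the cost of a viewing ray depends only on its first occupied voxel, which is the visibility behaviour the non-convex constraint enforces exactly. The quadratic depth cost and all parameter names are illustrative assumptions.

```python
import numpy as np

def ray_potential(occupancy, depths, measured_depth, sigma=0.1, free_cost=1.0):
    """Toy cost of one viewing ray given the voxels it traverses, near to far.

    occupancy: (N,) binary occupancies; depths: (N,) voxel depths along the ray;
    measured_depth: depth estimate for this ray from multi-view matching.
    """
    occupied = np.flatnonzero(occupancy > 0.5)
    if occupied.size == 0:
        return free_cost  # ray escapes the volume: penalize the missing surface
    first = occupied[0]   # visibility: only the first occupied voxel matters
    return float((depths[first] - measured_depth) ** 2 / sigma ** 2)

occ = np.array([0, 0, 1, 1, 0])
depths = np.linspace(0.2, 1.0, 5)
print(ray_potential(occ, depths, measured_depth=0.6))
```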
Learning a Multi-View Stereo Machine
We present a learnt system for multi-view stereopsis. In contrast to recent
learning based methods for 3D reconstruction, we leverage the underlying 3D
geometry of the problem through feature projection and unprojection along
viewing rays. By formulating these operations in a differentiable manner, we
are able to learn the system end-to-end for the task of metric 3D
reconstruction. End-to-end learning allows us to jointly reason about shape
priors while conforming to geometric constraints, enabling reconstruction from
far fewer images (even a single image) than required by classical approaches
as well as completion of unseen surfaces. We thoroughly evaluate our approach
on the ShapeNet dataset and demonstrate the benefits over classical approaches
as well as recent learning-based methods.
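The projection/unprojection the abstract refers to can be pictured as bilinear sampling of image features at the projections of voxel centres, which is differentiable and thus trainable end-to-end. The sketch below is a simplified stand-in for the paper's actual operator; the pinhole model, tensor shapes, and function name are our assumptions.

```python
import torch
import torch.nn.functional as F

def unproject_features(feat2d, voxel_xyz, K):
    """feat2d: (1, C, H, W) image features; voxel_xyz: (N, 3) voxel centres in
    camera coordinates; K: (3, 3) pinhole intrinsics. Returns (N, C)."""
    uvw = voxel_xyz @ K.T                           # project voxel centres
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)    # perspective divide
    _, _, H, W = feat2d.shape
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1, # normalize to [-1, 1]
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1)
    sampled = F.grid_sample(feat2d, grid.view(1, 1, -1, 2), align_corners=True)
    return sampled.view(feat2d.shape[1], -1).T      # differentiable w.r.t. feat2d

feat = torch.randn(1, 16, 64, 64, requires_grad=True)
K = torch.tensor([[60.0, 0.0, 32.0], [0.0, 60.0, 32.0], [0.0, 0.0, 1.0]])
xyz = torch.rand(1000, 3) + torch.tensor([0.0, 0.0, 1.0])  # voxels in front of camera
vox_feat = unproject_features(feat, xyz, K)                # (1000, 16)
```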
3D Shape Reconstruction from a Single 2D Image via 2D-3D Self-Consistency
Aiming at inferring 3D shapes from 2D images, 3D shape reconstruction has
drawn considerable attention from researchers in the computer vision and deep
learning communities. However, it is not practical to assume that 2D input images and
their associated ground truth 3D shapes are always available during training.
In this paper, we propose a framework for semi-supervised 3D reconstruction.
This is realized by our introduced 2D-3D self-consistency, which aligns the
predicted 3D models and the projected 2D foreground segmentation masks.
Moreover, our model not only recovers 3D shapes with the
corresponding 2D masks; camera pose information can also be jointly disentangled
and predicted, even though such supervision is never available during training. In the
experiments, we qualitatively and quantitatively demonstrate the effectiveness
of our model, which performs favorably against state-of-the-art approaches in
either supervised or semi-supervised settings.
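A toy version of the 2D-3D self-consistency idea: softly render the predicted occupancy grid into a foreground mask and penalize disagreement with the observed segmentation, so gradients flow back to the 3D prediction. The orthographic projection and the cross-entropy loss below are simplifying assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def silhouette_consistency(voxels, mask):
    """voxels: (D, H, W) occupancy probabilities; mask: (H, W) binary float mask."""
    # soft orthographic projection: a pixel is foreground if any voxel along
    # its ray is occupied, i.e. 1 - prod_i (1 - p_i)
    silhouette = 1.0 - torch.prod(1.0 - voxels, dim=0)
    return F.binary_cross_entropy(silhouette, mask)

voxels = torch.rand(32, 64, 64, requires_grad=True)
mask = (torch.rand(64, 64) > 0.5).float()
silhouette_consistency(voxels, mask).backward()  # gradients reach the 3D shape
```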
Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency
We study the notion of consistency between a 3D shape and a 2D observation
and propose a differentiable formulation which allows computing gradients of
the 3D shape given an observation from an arbitrary view. We do so by
reformulating view consistency using a differentiable ray consistency (DRC)
term. We show that this formulation can be incorporated in a learning framework
to leverage different types of multi-view observations, e.g., foreground masks,
depth, color images, and semantics, as supervision for learning single-view 3D
prediction. We present empirical analysis of our technique in a controlled
setting. We also show that this approach allows us to improve over existing
techniques for single-view reconstruction of objects from the PASCAL VOC
dataset. Comment: To appear at CVPR 2017. Project webpage:
https://shubhtuls.github.io/drc
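The core of a DRC-style term can be written in a few lines: per-voxel occupancy probabilities induce a distribution over where a ray terminates, and the loss is the expected cost of the observation under that distribution. The event costs below are hypothetical placeholders (e.g., derived from a foreground mask), and this is a sketch of the idea rather than the paper's implementation.

```python
import torch

def drc_ray_loss(x, event_cost, escape_cost):
    """x: (N,) occupancy probabilities along the ray, near to far;
    event_cost: (N,) cost if the ray terminates at voxel i;
    escape_cost: scalar cost if the ray passes through every voxel."""
    free = torch.cumprod(1.0 - x, dim=0)             # P(voxels 0..i all empty)
    prev_free = torch.cat([torch.ones(1), free[:-1]])
    p_stop = prev_free * x                           # P(ray terminates at voxel i)
    return (p_stop * event_cost).sum() + free[-1] * escape_cost

x = torch.tensor([0.1, 0.7, 0.4], requires_grad=True)
loss = drc_ray_loss(x, event_cost=torch.tensor([1.0, 0.0, 1.0]), escape_cost=1.0)
loss.backward()  # gradients w.r.t. the 3D occupancies
```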
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials
In this paper, we consider the problem of reconstructing a dense 3D model
using images captured from different views. Recent methods based on
convolutional neural networks (CNN) allow learning the entire task from data.
However, they do not incorporate the physics of image formation such as
perspective geometry and occlusion. In contrast, classical approaches based on
Markov Random Fields (MRF) with ray-potentials explicitly model these physical
processes, but they cannot cope with large surface appearance variations across
different viewpoints. In this paper, we propose RayNet, which combines the
strengths of both frameworks. RayNet integrates a CNN that learns
view-invariant feature representations with an MRF that explicitly encodes the
physics of perspective projection and occlusion. We train RayNet end-to-end
using empirical risk minimization. We thoroughly evaluate our approach on
challenging real-world datasets and demonstrate its benefits over a
piecewise-trained baseline, hand-crafted models, and other learning-based
approaches. Comment: Accepted to CVPR 2018 as a spotlight. Project URL with code:
http://raynet-mvs.com
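In the spirit of RayNet's coupling, the sketch below fuses a CNN matching score with an occlusion-aware ray model to obtain a depth posterior along a ray. The paper's actual unrolled message passing is considerably more involved; all quantities here are toy placeholders, and matching scores are assumed positive.

```python
import numpy as np

def ray_depth_posterior(matching_score, occupancy_prior):
    """matching_score: (N,) positive CNN similarity per depth hypothesis;
    occupancy_prior: (N,) per-voxel occupancy probability along the ray."""
    free = np.cumprod(1.0 - occupancy_prior)
    prev_free = np.concatenate([[1.0], free[:-1]])
    visibility = prev_free * occupancy_prior   # P(first occupied voxel = i)
    posterior = visibility * matching_score    # fuse ray physics with appearance
    return posterior / posterior.sum()

print(ray_depth_posterior(np.array([0.1, 0.8, 0.3]), np.array([0.2, 0.5, 0.9])))
```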
Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network
Reconstructing a high-resolution 3D model of an object is a challenging task
in computer vision. Designing scalable and light-weight architectures is
crucial when addressing this problem. Existing point-cloud-based
reconstruction approaches directly predict the entire point cloud in a single
stage. Although this technique can handle low-resolution point clouds, it is
not a viable solution for generating dense, high-resolution outputs. In this
work, we introduce DensePCR, a deep pyramidal network for point cloud
reconstruction that hierarchically predicts point clouds of increasing
resolution. Towards this end, we propose an architecture that first predicts a
low-resolution point cloud, and then hierarchically increases the resolution by
aggregating local and global point features to deform a grid. Our method
generates point clouds that are accurate, uniform and dense. Through extensive
quantitative and qualitative evaluation on synthetic and real datasets, we
demonstrate that DensePCR outperforms the existing state-of-the-art point cloud
reconstruction works, while also providing a light-weight and scalable
architecture for predicting high-resolution outputs. Comment: WACV 2019
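One way to picture the pyramid upsampling step: each low-resolution point is replicated onto a small local grid that is then deformed by offsets predicted from local-plus-global point features. The tiny MLP and the offset scale below are hypothetical stand-ins for the paper's architecture, not a reproduction of it.

```python
import torch
import torch.nn as nn

class UpsampleStage(nn.Module):
    """One pyramid stage: each parent point spawns `grid_size` children."""
    def __init__(self, feat_dim, grid_size=4):
        super().__init__()
        self.grid_size = grid_size
        # hypothetical offset predictor conditioned on per-point descriptors
        self.mlp = nn.Sequential(nn.Linear(3 + feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, grid_size * 3))

    def forward(self, points, features):
        """points: (N, 3); features: (N, F) concatenated local+global features."""
        offsets = self.mlp(torch.cat([points, features], dim=-1))
        offsets = offsets.view(-1, self.grid_size, 3)
        # replicate each parent point, then deform its local grid
        children = points.unsqueeze(1) + 0.05 * torch.tanh(offsets)
        return children.reshape(-1, 3)               # (N * grid_size, 3)

stage = UpsampleStage(feat_dim=32)
dense = stage(torch.rand(256, 3), torch.rand(256, 32))  # (1024, 3) points
```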
Accelerated Inference in Markov Random Fields via Smooth Riemannian Optimization
Markov Random Fields (MRFs) are a popular model for several pattern
recognition and reconstruction problems in robotics and computer vision.
Inference in MRFs is intractable in general and related work resorts to
approximation algorithms. Among those techniques, semidefinite programming
(SDP) relaxations have been shown to provide accurate estimates, but they scale
poorly with problem size and are typically too slow for practical
applications. Our first contribution is to design a dual ascent method to solve
standard SDP relaxations that takes advantage of the geometric structure of the
problem to speed up computation. This technique, named Dual Ascent Riemannian
Staircase (DARS), is able to solve large problem instances in seconds. Our
second contribution is to develop a faster approach. The backbone of
this second approach is a novel SDP relaxation combined with a fast and
scalable solver based on smooth Riemannian optimization. We show that this
approach, named Fast Unconstrained SEmidefinite Solver (FUSES), can solve large
problems in milliseconds. Unlike local MRF solvers, e.g., loopy belief
propagation, our approaches do not require an initial guess. Moreover, we
leverage recent results from optimization theory to provide per-instance
sub-optimality guarantees. We demonstrate the proposed approaches in
multi-class image segmentation problems. Extensive experimental evidence shows
that (i) FUSES and DARS produce near-optimal solutions, attaining an objective
within 0.1% of the optimum, (ii) FUSES and DARS are remarkably faster than
general-purpose SDP solvers, and FUSES is more than two orders of magnitude
faster than DARS while attaining similar solution quality, (iii) FUSES is
faster than local search methods while being a global solver. Comment: 16 pages
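To illustrate the relax-and-round pattern behind such SDP approaches, the sketch below applies a Burer-Monteiro-style low-rank factorization to a toy binary pairwise MRF energy, optimizes it by projected gradient descent on a product of unit spheres, and rounds against a homogenization anchor. This is a didactic caricature under our own assumptions; DARS and FUSES use far more sophisticated relaxations and solvers.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 3
W = rng.standard_normal((n, n)); W = (W + W.T) / 2   # pairwise couplings
b = rng.standard_normal(n)                           # unary potentials

# homogenize the unary term with one extra "anchor" variable
Wt = np.zeros((n + 1, n + 1))
Wt[:n, :n] = W
Wt[:n, n] = Wt[n, :n] = b / 2

# Burer-Monteiro factor: one unit-norm row per variable (rank-r relaxation)
Y = rng.standard_normal((n + 1, r))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
for _ in range(1000):
    G = 2 * Wt @ Y                                   # gradient of tr(Y^T Wt Y)
    G -= (G * Y).sum(axis=1, keepdims=True) * Y      # project onto sphere tangents
    Y -= 0.01 * G                                    # descend the relaxed energy
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)    # retract onto the spheres

x = np.sign(Y[:n] @ Y[n])                            # round to labels in {-1, +1}
print("energy:", x @ W @ x + b @ x)
```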
Neural Volumes: Learning Dynamic Renderable Volumes from Images
Modeling and rendering of dynamic scenes is challenging, as natural scenes
often contain complex phenomena such as thin structures, evolving topology,
translucency, scattering, occlusion, and biological motion. Mesh-based
reconstruction and tracking often fail in these cases, and other approaches
(e.g., light field video) typically rely on constrained viewing conditions,
which limit interactivity. We circumvent these difficulties by presenting a
learning-based approach to representing dynamic objects inspired by the
integral projection model used in tomographic imaging. The approach is
supervised directly from 2D images in a multi-view capture setting and does not
require explicit reconstruction or tracking of the object. Our method has two
primary components: an encoder-decoder network that transforms input images
into a 3D volume representation, and a differentiable ray-marching operation
that enables end-to-end training. By virtue of its 3D representation, our
construction extrapolates better to novel viewpoints compared to screen-space
rendering techniques. The encoder-decoder architecture learns a latent
representation of a dynamic scene that enables us to produce novel content
sequences not seen during training. To overcome memory limitations of
voxel-based representations, we learn a dynamic irregular grid structure
implemented with a warp field during ray-marching. This structure greatly
improves the apparent resolution and reduces grid-like artifacts and jagged
motion. Finally, we demonstrate how to incorporate surface-based
representations into our volumetric-learning framework for applications where
the highest resolution is required, using facial performance capture as a case
in point. Comment: Accepted to SIGGRAPH 2019
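The differentiable ray-marching operation at the heart of this kind of pipeline amounts to front-to-back alpha compositing of samples along a ray, as in the bare-bones sketch below; the sampling strategy, the warp field, and the actual method's compositing details are abstracted away, and the names are ours.

```python
import torch

def march_ray(rgb, alpha, step):
    """rgb: (S, 3) colour samples along one ray; alpha: (S,) opacity per sample;
    step: marching step length."""
    a = 1.0 - torch.exp(-alpha * step)                   # per-step absorption
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - a[:-1]]), dim=0)
    weights = trans * a                                  # contribution per step
    return (weights.unsqueeze(-1) * rgb).sum(dim=0)      # composited pixel colour

pixel = march_ray(torch.rand(64, 3), torch.rand(64), step=0.05)
```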
Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction
Dense semantic 3D reconstruction is typically formulated as a discrete or
continuous problem over label assignments in a voxel grid, combining semantic
and depth likelihoods in a Markov Random Field framework. The depth and
semantic information is incorporated as a unary potential, smoothed by a
pairwise regularizer. However, modelling likelihoods as a unary potential does
not capture the problem correctly, leading to various undesirable visibility
artifacts.
We propose to formulate an optimization problem that directly optimizes the
reprojection error of the 3D model with respect to the image estimates, which
corresponds to the optimization over rays, where the cost function depends on
the semantic class and depth of the first occupied voxel along the ray. The
2-label formulation is made feasible by transforming it into a
graph-representable form under QPBO relaxation, solvable using graph cut. The
multi-label problem is solved by applying alpha-expansion using the same
relaxation in each expansion move. Our method is shown to be feasible
in practice, running comparably fast to competing methods while not
suffering from ray-potential approximation artifacts. Comment: Published at CVPR 2015
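The quantity being optimized can be stated compactly: the per-ray cost depends on the depth and semantic label of the first occupied voxel along the ray, as in the toy evaluation below. The QPBO/graph-cut machinery that makes minimizing this tractable is not reproduced here, and the cost model is a placeholder of our own.

```python
import numpy as np

def ray_cost(labels, depths, depth_obs, sem_obs, free_label=0, escape_cost=1.0):
    """labels: (N,) semantic label per traversed voxel (0 = free space);
    depths: (N,) voxel depths along the ray; depth_obs, sem_obs: image evidence."""
    occupied = np.flatnonzero(labels != free_label)
    if occupied.size == 0:
        return escape_cost                   # ray exits the grid unobstructed
    i = occupied[0]                          # the first occupied voxel is seen
    depth_term = abs(depths[i] - depth_obs)
    sem_term = 0.0 if labels[i] == sem_obs else 1.0   # toy semantic likelihood
    return depth_term + sem_term

print(ray_cost(np.array([0, 0, 2, 1]), np.linspace(0.2, 0.8, 4),
               depth_obs=0.55, sem_obs=2))
```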
PSDF Fusion: Probabilistic Signed Distance Function for On-the-fly 3D Data Fusion and Scene Reconstruction
We propose a novel 3D spatial representation for data fusion and scene
reconstruction: the Probabilistic Signed Distance Function (Probabilistic SDF,
PSDF), which depicts uncertainties in 3D space. It is modeled as a joint
distribution over the SDF value and its inlier probability, reflecting
input data quality and surface geometry. A hybrid data structure involving
voxels, surfels, and meshes is designed to fully exploit the advantages of these
prevalent 3D representations. Connected by the PSDF, these components cooperate
within a consistent framework. Given sequential depth measurements, the PSDF can
be incrementally refined with a less ad hoc parametric Bayesian update. Supported
by the PSDF and the efficient 3D data representation, high-quality surfaces can
be extracted on the fly and in turn contribute to reliable data fusion using the
geometry information. Experiments demonstrate
that our system reconstructs scenes with higher model quality and lower
redundancy, and runs faster than existing online mesh generation systems. Comment: Accepted to ECCV 2018
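An illustrative per-voxel update in the spirit of PSDF fusion: each voxel keeps an SDF estimate with uncertainty plus an inlier probability, both refined from a new depth-derived SDF measurement. The Gaussian/Bernoulli model below is our simplifying assumption, not the paper's exact distribution, and all names are hypothetical.

```python
import numpy as np

def psdf_update(mu, var, p_inlier, sdf_meas, var_meas, outlier_lik=0.05):
    """One voxel's (mean, variance, inlier probability) after a measurement."""
    # Gaussian fusion of the SDF estimate (Kalman-style update)
    gain = var / (var + var_meas)
    mu_new = mu + gain * (sdf_meas - mu)
    var_new = (1.0 - gain) * var
    # Bayesian update of the inlier probability from the measurement likelihood
    s2 = var + var_meas
    lik_in = np.exp(-0.5 * (sdf_meas - mu) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
    p_new = p_inlier * lik_in / (p_inlier * lik_in + (1 - p_inlier) * outlier_lik)
    return mu_new, var_new, p_new

print(psdf_update(mu=0.0, var=0.04, p_inlier=0.5, sdf_meas=0.02, var_meas=0.01))
```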