Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification.
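Because every RGB-D image in a dataset like this comes with a camera pose, each depth pixel can be lifted into a common building-scale frame. The sketch below illustrates that step under stated assumptions: a pinhole camera with intrinsics fx, fy, cx, cy, metric depth, and a row-major 4x4 camera-to-world matrix. The function names and parameters are hypothetical and do not reflect the actual Matterport3D file formats or tooling.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = List[List[float]]  # assumed: row-major 4x4 camera-to-world transform


def backproject_pixel(u: float, v: float, depth: float,
                      fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with metric depth into camera coordinates (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_world(p: Vec3, pose: Mat4) -> Vec3:
    """Apply a rigid 4x4 camera-to-world transform to a 3D point."""
    x, y, z = p
    # Only the top three rows matter; the last row of a rigid transform is [0, 0, 0, 1].
    return tuple(pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
                 for r in range(3))
```

Applying both steps to every valid pixel of every image and concatenating the results gives a fused point cloud of the whole building in one global frame.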
Learning to Reconstruct Texture-less Deformable Surfaces from a Single View
Recent years have seen the development of mature solutions for reconstructing
deformable surfaces from a single image, provided that they are relatively
well-textured. By contrast, recovering the 3D shape of texture-less surfaces
remains an open problem and is closely related to Shape-from-Shading (SfS). In
this paper, we introduce a data-driven approach to this problem: a general
framework that can predict diverse 3D representations, such as meshes,
normals, and depth maps. Our experiments show that meshes are ill-suited to
handle texture-less 3D reconstruction in our context. Furthermore, we
demonstrate that our approach generalizes well to unseen objects, and that it
yields higher-quality reconstructions than a state-of-the-art SfS technique,
particularly in terms of normal estimates. Our reconstructions accurately model
the fine details of the surfaces, such as the creases of a T-Shirt worn by a
person.
Comment: Accepted to 3DV 201
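To make the relation between two of the predicted representations concrete, here is a minimal sketch that derives unit normals from a depth map, plus the Lambertian shading term on which classical Shape-from-Shading builds. It assumes orthographic projection, unit pixel spacing, unit albedo, and a distant light given as a unit vector; the function names are illustrative and this is not the paper's method, which learns these quantities from data.

```python
import math
from typing import List, Sequence, Tuple

Normal = Tuple[float, float, float]


def normals_from_depth(depth: List[List[float]]) -> List[List[Normal]]:
    """Estimate unit surface normals from a depth map treated as a height field.

    Assumes orthographic projection and unit pixel spacing. Uses central
    differences in the interior and one-sided differences at the borders.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            n = (-dzdx, -dzdy, 1.0)  # normal of the surface z = D(x, y)
            norm = math.sqrt(sum(c * c for c in n))
            row.append(tuple(c / norm for c in n))
        normals.append(row)
    return normals


def lambertian_shading(n: Sequence[float], light: Sequence[float]) -> float:
    """Image intensity under a distant unit light with unit albedo: max(0, n . l)."""
    return max(0.0, sum(a * b for a, b in zip(n, light)))
```

Shape-from-Shading inverts the second function: given intensities, it seeks normals (and hence depth) consistent with them, which is ill-posed without priors and is one reason a learned approach is attractive.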
From Stereogram to Surface: How the Brain Sees the World in Depth
When we look at a scene, how do we consciously see surfaces infused with lightness and color at the correct depths? Random-dot stereograms (RDS) probe how binocular disparity between the two eyes can generate such conscious surface percepts. Dense RDS do so even though they include multiple false binocular matches. Sparse stereograms do so even across large contrast-free regions with no binocular matches. Stereograms that define occluding and occluded surfaces lead to surface percepts wherein partially occluded textured surfaces are completed behind occluding textured surfaces at a spatial scale much larger than that of the texture elements themselves. Earlier models suggest how the brain detects binocular disparity, but not how RDS generate conscious percepts of 3D surfaces. A neural model predicts how the layered circuits of visual cortex generate these 3D surface percepts using interactions between visual boundary and surface representations that obey complementary computational rules.
Air Force Office of Scientific Research (F49620-01-1-0397); National Science Foundation (EIA-01-30851, SBE-0354378); Office of Naval Research (N00014-01-1-0624)
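The stimulus itself is simple to construct. The following sketch builds a classic Julesz-style random-dot stereogram: both images share the same random dots except for a central square that is displaced horizontally in the right image, so the square is invisible in either image alone and can only be seen through binocular disparity. The parameter names and the refill strategy are illustrative assumptions; the code generates the stimulus only and does not model the cortical circuits described in the abstract.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # 1 = dot, 0 = background


def random_dot_stereogram(width: int, height: int, square: int, disparity: int,
                          density: float = 0.5, seed: int = 0) -> Tuple[Image, Image]:
    """Return (left, right) images of a dense random-dot stereogram.

    The central square of the left image is copied into the right image
    displaced `disparity` pixels to the left; the strip it uncovers is
    refilled with fresh random dots.
    """
    x0 = (width - square) // 2
    top = (height - square) // 2
    if not (0 < disparity < square and disparity <= x0):
        raise ValueError("disparity must be positive, smaller than the square, and fit in the margin")
    rng = random.Random(seed)
    left = [[1 if rng.random() < density else 0 for _ in range(width)]
            for _ in range(height)]
    right = [row[:] for row in left]  # start from identical dots
    for y in range(top, top + square):
        # Copy the square's dots into the right image, shifted left.
        for x in range(x0, x0 + square):
            right[y][x - disparity] = left[y][x]
        # Refill the uncovered strip so the shift leaves no monocular cue.
        for x in range(x0 + square - disparity, x0 + square):
            right[y][x] = 1 if rng.random() < density else 0
    return left, right
```

Because each image on its own is uniform noise, any square the viewer sees must come from matching dots across the two images, which is what makes RDS a clean probe of disparity processing.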
Evaluation of CNN-based Single-Image Depth Estimation Methods
While interest in deep models for single-image depth estimation is growing,
established schemes for evaluating them are still limited. We propose a set of
novel quality criteria that allow a more
detailed analysis by focusing on specific characteristics of depth maps. In
particular, we address the preservation of edges and planar regions, depth
consistency, and absolute distance accuracy. In order to employ these metrics
to evaluate and compare state-of-the-art single-image depth estimation
approaches, we provide a new high-quality RGB-D dataset. We used a DSLR camera
together with a laser scanner to acquire high-resolution images and highly
accurate depth maps. Experimental results show the validity of our proposed
evaluation protocol.
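For contrast with the proposed criteria, the sketch below computes three pixel-wise metrics that are commonly reported for single-image depth estimation: mean absolute relative error, root-mean-square error, and the fraction of pixels whose depth ratio to ground truth is below 1.25. These are assumed here as conventional baselines; they are not the paper's new edge, planarity, or consistency criteria. Inputs are assumed to be flattened depth lists, with non-positive values treated as invalid.

```python
import math
from typing import Dict, Sequence


def depth_error_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Common pixel-wise depth metrics over pixels where both depths are positive."""
    # Skip invalid pixels (missing ground truth or non-positive predictions).
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid depth pixels")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Threshold accuracy: share of pixels with max(p/g, g/p) < 1.25.
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

Metrics of this kind average over all pixels, so a blurred depth boundary or a warped planar wall barely moves them; that blind spot is what criteria focused on edges and planar regions are meant to expose.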
Multi-frame scene-flow estimation using a patch model and smooth motion prior
This paper addresses the problem of estimating the dense 3D motion of a scene over several frames using a set of calibrated cameras. Most current 3D motion estimation techniques are limited to estimating the motion over a single frame, unless a strong prior model of the scene (such as a skeleton) is introduced. Estimating the 3D motion of a general scene is difficult due to untextured surfaces, complex movements, and occlusions. In this paper, we show that it is possible to track the surfaces of a scene over several frames by introducing an effective prior on the scene motion. Experimental results show that the proposed method estimates the dense scene-flow over multiple frames without requiring multiple-view reconstructions at every frame. Furthermore, the accuracy of the proposed method is demonstrated by comparing the estimated motion against ground truth.
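The abstract does not spell out the form of its motion prior, so the following is only a hypothetical illustration of what a smooth multi-frame prior over tracked surface patches can look like: a temporal term that penalizes acceleration along each patch trajectory and a spatial term that penalizes neighboring patches moving differently. The data layout (`tracks`, `neighbors`) and the weights `w_time` and `w_space` are assumptions, not the paper's formulation.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sq_dist(a: Vec3, b: Vec3) -> float:
    """Squared Euclidean distance between two 3-vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def motion_prior_energy(tracks: Dict[int, List[Vec3]],
                        neighbors: Sequence[Tuple[int, int]],
                        w_time: float = 1.0, w_space: float = 1.0) -> float:
    """Hypothetical smoothness energy for patch trajectories over several frames.

    tracks[i][t] is the 3D centre of patch i at frame t.
    Temporal term: squared second differences (penalizes acceleration).
    Spatial term: squared difference between per-frame displacements of
    neighbouring patches (neighbours should move alike).
    """
    e_time = 0.0
    for traj in tracks.values():
        for t in range(1, len(traj) - 1):
            accel = tuple(traj[t + 1][k] - 2 * traj[t][k] + traj[t - 1][k] for k in range(3))
            e_time += sum(a * a for a in accel)
    e_space = 0.0
    for i, j in neighbors:
        ti, tj = tracks[i], tracks[j]
        for t in range(min(len(ti), len(tj)) - 1):
            di = tuple(ti[t + 1][k] - ti[t][k] for k in range(3))
            dj = tuple(tj[t + 1][k] - tj[t][k] for k in range(3))
            e_space += _sq_dist(di, dj)
    return w_time * e_time + w_space * e_space
```

In a tracker, an energy like this would be added to a data term measuring photo-consistency across cameras; the prior lets untextured patches inherit plausible motion from their neighbours and from earlier frames.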
T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects
We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e.
translation and rotation, of texture-less rigid objects. The dataset features
thirty industry-relevant objects with no significant texture and no
discriminative color or reflectance properties. The objects exhibit symmetries
and mutual similarities in shape and/or size. Compared to other datasets, a
unique property is that some of the objects are parts of others. The dataset
includes training and test images that were captured with three synchronized
sensors, specifically a structured-light and a time-of-flight RGB-D sensor and
a high-resolution RGB camera. There are approximately 39K training and 10K test
images from each sensor. Additionally, two types of 3D models are provided for
each object, i.e. a manually created CAD model and a semi-automatically
reconstructed one. Training images depict individual objects against a black
background. Test images originate from twenty test scenes of varying
complexity, ranging from simple scenes with several isolated objects to very
challenging ones with multiple instances of several objects and a high
degree of clutter and occlusion. The images were captured from a
systematically sampled view sphere around the object/scene, and are annotated
with accurate ground truth 6D poses of all modeled objects. Initial evaluation
results indicate that the state of the art in 6D object pose estimation has
ample room for improvement, especially in difficult cases with significant
occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.
Comment: WACV 201
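A 6D pose is a rotation plus a translation, so a basic way to compare an estimate with ground truth is to measure the angle of the relative rotation and the distance between the translations. The sketch below assumes 3x3 rotation matrices given as nested lists; it is an illustration, not the evaluation protocol used with T-LESS.

```python
import math
from typing import List, Sequence

Mat3 = List[List[float]]


def rotation_error_deg(r_est: Mat3, r_gt: Mat3) -> float:
    """Angle in degrees of the relative rotation R_est^T R_gt."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    tr = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(cos_theta))


def translation_error(t_est: Sequence[float], t_gt: Sequence[float]) -> float:
    """Euclidean distance between estimated and ground-truth translations."""
    return math.dist(t_est, t_gt)
```

For symmetric objects, which T-LESS contains, this naive rotation error penalizes poses that look identical; symmetry-aware evaluation typically takes the minimum error over the object's symmetry transformations.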
- …