99,807 research outputs found
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
We study 3D shape modeling from a single image and make contributions to it
in three aspects. First, we present Pix3D, a large-scale benchmark of diverse
image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications
in shape-related tasks including reconstruction, retrieval, viewpoint
estimation, etc. Building such a large-scale dataset, however, is highly
challenging; existing datasets either contain only synthetic data, or lack
precise alignment between 2D images and 3D shapes, or only have a small number
of images. Second, we calibrate the evaluation criteria for 3D shape
reconstruction through behavioral studies, and use them to objectively and
systematically benchmark cutting-edge reconstruction algorithms on Pix3D.
Third, we design a novel model that simultaneously performs 3D reconstruction
and pose estimation; our multi-task learning approach achieves state-of-the-art
performance on both tasks.Comment: CVPR 2018. The first two authors contributed equally to this work.
Project page: http://pix3d.csail.mit.ed
Do-It-Yourself Single Camera 3D Pointer Input Device
We present a new algorithm for single camera 3D reconstruction, or 3D input
for human-computer interfaces, based on precise tracking of an elongated
object, such as a pen, having a pattern of colored bands. To configure the
system, the user provides no more than one labelled image of a handmade
pointer, measurements of its colored bands, and the camera's pinhole projection
matrix. Other systems are of much higher cost and complexity, requiring
combinations of multiple cameras, stereocameras, and pointers with sensors and
lights. Instead of relying on information from multiple devices, we examine our
single view more closely, integrating geometric and appearance constraints to
robustly track the pointer in the presence of occlusion and distractor objects.
By probing objects of known geometry with the pointer, we demonstrate
acceptable accuracy of 3D localization.Comment: 8 pages, 6 figures, 2018 15th Conference on Computer and Robot Visio
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
We study 3D shape modeling from a single image and make contributions to it
in three aspects. First, we present Pix3D, a large-scale benchmark of diverse
image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications
in shape-related tasks including reconstruction, retrieval, viewpoint
estimation, etc. Building such a large-scale dataset, however, is highly
challenging; existing datasets either contain only synthetic data, or lack
precise alignment between 2D images and 3D shapes, or only have a small number
of images. Second, we calibrate the evaluation criteria for 3D shape
reconstruction through behavioral studies, and use them to objectively and
systematically benchmark cutting-edge reconstruction algorithms on Pix3D.
Third, we design a novel model that simultaneously performs 3D reconstruction
and pose estimation; our multi-task learning approach achieves state-of-the-art
performance on both tasks.Comment: CVPR 2018. The first two authors contributed equally to this work.
Project page: http://pix3d.csail.mit.ed
Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction
Reconstructing both objects and hands in 3D from a single RGB image is
complex. Existing methods rely on manually defined hand-object constraints in
Euclidean space, leading to suboptimal feature learning. Compared with
Euclidean space, hyperbolic space better preserves the geometric properties of
meshes thanks to its exponentially-growing space distance, which amplifies the
differences between the features based on similarity. In this work, we propose
the first precise hand-object reconstruction method in hyperbolic space, namely
Dynamic Hyperbolic Attention Network (DHANet), which leverages intrinsic
properties of hyperbolic space to learn representative features. Our method
that projects mesh and image features into a unified hyperbolic space includes
two modules, ie. dynamic hyperbolic graph convolution and image-attention
hyperbolic graph convolution. With these two modules, our method learns mesh
features with rich geometry-image multi-modal information and models better
hand-object interaction. Our method provides a promising alternative for fine
hand-object reconstruction in hyperbolic space. Extensive experiments on three
public datasets demonstrate that our method outperforms most state-of-the-art
methods.Comment: Accpeted by ICCV 202
FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow
Reconstruction of 3D neural fields from posed images has emerged as a
promising method for self-supervised representation learning. The key challenge
preventing the deployment of these 3D scene learners on large-scale video data
is their dependence on precise camera poses from structure-from-motion, which
is prohibitively expensive to run at scale. We propose a method that jointly
reconstructs camera poses and 3D neural scene representations online and in a
single forward pass. We estimate poses by first lifting frame-to-frame optical
flow to 3D scene flow via differentiable rendering, preserving locality and
shift-equivariance of the image processing backbone. SE(3) camera pose
estimation is then performed via a weighted least-squares fit to the scene flow
field. This formulation enables us to jointly supervise pose estimation and a
generalizable neural scene representation via re-rendering the input video, and
thus, train end-to-end and fully self-supervised on real-world video datasets.
We demonstrate that our method performs robustly on diverse, real-world video,
notably on sequences traditionally challenging to optimization-based pose
estimation techniques.Comment: Project website: http://cameronosmith.github.io/flowca
Probabilistic Global Scale Estimation for MonoSLAM Based on Generic Object Detection
This paper proposes a novel method to estimate the global scale of a 3D
reconstructed model within a Kalman filtering-based monocular SLAM algorithm.
Our Bayesian framework integrates height priors over the detected objects
belonging to a set of broad predefined classes, based on recent advances in
fast generic object detection. Each observation is produced on single frames,
so that we do not need a data association process along video frames. This is
because we associate the height priors with the image region sizes at image
places where map features projections fall within the object detection regions.
We present very promising results of this approach obtained on several
experiments with different object classes.Comment: Int. Workshop on Visual Odometry, CVPR, (July 2017
- …