8,041 research outputs found
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
We study 3D shape modeling from a single image and make contributions to it
in three aspects. First, we present Pix3D, a large-scale benchmark of diverse
image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications
in shape-related tasks including reconstruction, retrieval, viewpoint
estimation, etc. Building such a large-scale dataset, however, is highly
challenging; existing datasets either contain only synthetic data, or lack
precise alignment between 2D images and 3D shapes, or only have a small number
of images. Second, we calibrate the evaluation criteria for 3D shape
reconstruction through behavioral studies, and use them to objectively and
systematically benchmark cutting-edge reconstruction algorithms on Pix3D.
Third, we design a novel model that simultaneously performs 3D reconstruction
and pose estimation; our multi-task learning approach achieves state-of-the-art
performance on both tasks.Comment: CVPR 2018. The first two authors contributed equally to this work.
Project page: http://pix3d.csail.mit.ed
Multi-view Convolutional Neural Networks for 3D Shape Recognition
A longstanding question in computer vision concerns the representation of 3D
shapes for recognition: should 3D shapes be represented with descriptors
operating on their native 3D formats, such as voxel grid or polygon mesh, or
can they be effectively represented with view-based descriptors? We address
this question in the context of learning to recognize 3D shapes from a
collection of their rendered views on 2D images. We first present a standard
CNN architecture trained to recognize the shapes' rendered views independently
of each other, and show that a 3D shape can be recognized even from a single
view at an accuracy far higher than using state-of-the-art 3D shape
descriptors. Recognition rates further increase when multiple views of the
shapes are provided. In addition, we present a novel CNN architecture that
combines information from multiple views of a 3D shape into a single and
compact shape descriptor offering even better recognition performance. The same
architecture can be applied to accurately recognize human hand-drawn sketches
of shapes. We conclude that a collection of 2D views can be highly informative
for 3D shape recognition and is amenable to emerging CNN architectures and
their derivatives.Comment: v1: Initial version. v2: An updated ModelNet40 training/test split is
used; results with low-rank Mahalanobis metric learning are added. v3 (ICCV
2015): A second camera setup without the upright orientation assumption is
added; some accuracy and mAP numbers are changed slightly because a small
issue in mesh rendering related to specularities is fixe
Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval
Free-hand sketch-based image retrieval (SBIR) is a specific cross-view
retrieval task, in which queries are abstract and ambiguous sketches while the
retrieval database is formed with natural images. Work in this area mainly
focuses on extracting representative and shared features for sketches and
natural images. However, these can neither cope well with the geometric
distortion between sketches and images nor be feasible for large-scale SBIR due
to the heavy continuous-valued distance computation. In this paper, we speed up
SBIR by introducing a novel binary coding method, named \textbf{Deep Sketch
Hashing} (DSH), where a semi-heterogeneous deep architecture is proposed and
incorporated into an end-to-end binary coding framework. Specifically, three
convolutional neural networks are utilized to encode free-hand sketches,
natural images and, especially, the auxiliary sketch-tokens which are adopted
as bridges to mitigate the sketch-image geometric distortion. The learned DSH
codes can effectively capture the cross-view similarities as well as the
intrinsic semantic correlations between different categories. To the best of
our knowledge, DSH is the first hashing work specifically designed for
category-level SBIR with an end-to-end deep architecture. The proposed DSH is
comprehensively evaluated on two large-scale datasets of TU-Berlin Extension
and Sketchy, and the experiments consistently show DSH's superior SBIR
accuracies over several state-of-the-art methods, while achieving significantly
reduced retrieval time and memory footprint.Comment: This paper will appear as a spotlight paper in CVPR201
Fine-grained sketch-based image retrieval by matching deformable part models
(c) 2014. The copyright of this document resides with its authors.
It may be distributed unchanged freely in print or electronic forms.© 2014. The copyright of this document resides with its authors. An important characteristic of sketches, compared with text, rests with their ability to intrinsically capture object appearance and structure. Nonetheless, akin to traditional text-based image retrieval, conventional sketch-based image retrieval (SBIR) principally focuses on retrieving images of the same category, neglecting the fine-grained characteristics of sketches. In this paper, we advocate the expressiveness of sketches and examine their efficacy under a novel fine-grained SBIR framework. In particular, we study how sketches enable fine-grained retrieval within object categories. Key to this problem is introducing a mid-level sketch representation that not only captures object pose, but also possesses the ability to traverse sketch and image domains. Specifically, we learn deformable part-based model (DPM) as a mid-level representation to discover and encode the various poses in sketch and image domains independently, after which graph matching is performed on DPMs to establish pose correspondences across the two domains. We further propose an SBIR dataset that covers the unique aspects of fine-grained SBIR. Through in-depth experiments, we demonstrate the superior performance of our SBIR framework, and showcase its unique ability in fine-grained retrieval
- …