SilNet: Single- and Multi-View Reconstruction by Learning from Silhouettes
The objective of this paper is 3D shape understanding from single and
multiple images. To this end, we introduce a new deep-learning architecture and
loss function, SilNet, that can handle multiple views in an order-agnostic
manner. The architecture is fully convolutional, and for training we use a
proxy task of silhouette prediction, rather than directly learning a mapping
from 2D images to 3D shape as has been the target in most recent work.
We demonstrate that with the SilNet architecture there is generalisation over
the number of views -- for example, SilNet trained on 2 views can be used with
3 or 4 views at test-time; and performance improves with more views.
We introduce two new synthetic datasets: a blobby object dataset useful for
pre-training, and a challenging and realistic sculpture dataset; and
demonstrate on these datasets that SilNet has indeed learnt 3D shape. Finally,
we show that SilNet exceeds the state of the art on the ShapeNet benchmark
dataset, and use SilNet to generate novel views of the sculpture dataset.
Comment: BMVC 2017; Best Poster
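As a rough illustration of the order-agnostic multi-view handling described above, here is a minimal PyTorch sketch: a shared encoder processes each view independently, and an element-wise max pool fuses the per-view features, so the fused representation is invariant to view order and accepts any number of views at test time. The max-pooling choice, layer sizes, and all names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class OrderAgnosticFusion(nn.Module):
    """Sketch of order-agnostic multi-view encoding in the spirit of
    SilNet. Each view passes through a shared CNN; per-view features
    are fused with an element-wise max, which ignores view order and
    works for any number of views. Hypothetical layers and sizes."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(          # shared per-view encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, n_views, 3, H, W); n_views may vary at test time
        b, v, c, h, w = views.shape
        feats = self.encoder(views.reshape(b * v, c, h, w))
        feats = feats.reshape(b, v, -1)
        return feats.max(dim=1).values         # order-agnostic pooling

# Trained on 2 views, usable with 4 at test time, as the abstract notes:
model = OrderAgnosticFusion()
print(model(torch.randn(1, 2, 3, 64, 64)).shape)  # torch.Size([1, 128])
print(model(torch.randn(1, 4, 3, 64, 64)).shape)  # torch.Size([1, 128])
```

Because max pooling is permutation-invariant and independent of the number of pooled elements, this style of fusion generalises over view count, which is the property the abstract highlights.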
Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images
Recovering the 3D representation of an object from single-view or multi-view
RGB images by deep neural networks has attracted increasing attention in the
past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural
networks (RNNs) to fuse multiple feature maps extracted from input images
sequentially. However, when given the same set of input images with different
orders, RNN-based approaches are unable to produce consistent reconstruction
results. Moreover, due to long-term memory loss, RNNs cannot fully exploit
input images to refine reconstruction results. To solve these problems, we
propose a novel framework for single-view and multi-view 3D reconstruction,
named Pix2Vox. By using a well-designed encoder-decoder, it generates a coarse
3D volume from each input image. Then, a context-aware fusion module is
introduced to adaptively select high-quality reconstructions for each part
(e.g., table legs) from different coarse 3D volumes to obtain a fused 3D
volume. Finally, a refiner further refines the fused 3D volume to generate the
final output. Experimental results on the ShapeNet and Pix3D benchmarks
indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a large
margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in
terms of backward inference time. The experiments on ShapeNet unseen 3D
categories have shown the superior generalization abilities of our method.
Comment: ICCV 2019
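To make the context-aware fusion idea concrete, here is a minimal PyTorch sketch: each view's coarse volume receives a learned per-voxel score, scores are softmax-normalised across views, and the fused volume is the score-weighted sum, so each part (e.g. a table leg) is taken mostly from whichever view reconstructs it best. The tiny scoring network and all names here are placeholder assumptions, far smaller than the paper's modules.

```python
import torch
import torch.nn as nn

class ContextAwareFusion(nn.Module):
    """Sketch of Pix2Vox-style context-aware fusion: per-voxel scores
    for each view's coarse volume, softmaxed across views, then a
    weighted sum. Hypothetical scorer, not the paper's architecture."""

    def __init__(self):
        super().__init__()
        self.score = nn.Sequential(            # per-voxel context scorer
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 1, 3, padding=1),
        )

    def forward(self, volumes: torch.Tensor) -> torch.Tensor:
        # volumes: (batch, n_views, D, H, W) coarse occupancy volumes
        b, v, d, h, w = volumes.shape
        s = self.score(volumes.reshape(b * v, 1, d, h, w))
        s = s.reshape(b, v, d, h, w)
        weights = torch.softmax(s, dim=1)      # normalise across views
        return (weights * volumes).sum(dim=1)  # adaptively fused volume

# Fuse three coarse 32^3 volumes into one:
fused = ContextAwareFusion()(torch.rand(1, 3, 32, 32, 32))
print(fused.shape)  # torch.Size([1, 32, 32, 32])
```

Unlike sequential RNN fusion, this weighting depends only on the set of volumes, so the result is the same for any input order, which addresses the inconsistency the abstract attributes to RNN-based approaches.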