835 research outputs found
Learning to Generate Chairs, Tables and Cars with Convolutional Networks
We train generative 'up-convolutional' neural networks which are able to
generate images of objects given object style, viewpoint, and color. We train
the networks on rendered 3D models of chairs, tables, and cars. Our experiments
show that the networks do not merely learn all images by heart, but rather find
a meaningful representation of 3D models allowing them to assess the similarity
of different models, interpolate between given views to generate the missing
ones, extrapolate views, and invent new objects not present in the training set
by recombining training instances, or even two different object classes.
Moreover, we show that such generative networks can be used to find
correspondences between different objects from the dataset, outperforming
existing approaches on this task.
Comment: v4: final PAMI version. New architecture figur
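One common form of "up-convolution" is 2x unpooling followed by an ordinary convolution. The sketch below assumes that form: each input value is placed in the top-left corner of a 2x2 block, with zeros elsewhere, and the result is then convolved. The helper names `unpool2x`, `conv2d_same` and `up_convolve` are hypothetical. A real generator would stack many such layers with learned multi-channel kernels and nonlinearities.

```python
from typing import List

Grid = List[List[float]]


def unpool2x(x: Grid) -> Grid:
    """Upsample by 2: put each value in the top-left of a 2x2 block, zeros elsewhere."""
    h, w = len(x), len(x[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            out[2 * i][2 * j] = x[i][j]
    return out


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """Same-size 2D cross-correlation with zero padding (odd-sized kernel assumed)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - ph, j + b - pw
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the grid
                        total += x[ii][jj] * k[a][b]
            out[i][j] = total
    return out


def up_convolve(x: Grid, k: Grid) -> Grid:
    """One 'up-convolution' step: double the resolution, then convolve."""
    return conv2d_same(unpool2x(x), k)
```

With a 3x3 all-ones kernel, `up_convolve([[1.0, 2.0], [3.0, 4.0]], ...)` returns a 4x4 grid. Its entry at row 1, column 1 is 10.0, the sum of all four inputs, because that position's 3x3 window covers every unpooled nonzero value.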
Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis
We introduce a data-driven approach to complete partial 3D shapes through a
combination of volumetric deep neural networks and 3D shape synthesis. From a
partially-scanned input shape, our method first infers a low-resolution -- but
complete -- output. To this end, we introduce a 3D-Encoder-Predictor Network
(3D-EPN) which is composed of 3D convolutional layers. The network is trained
to predict and fill in missing data, and operates on an implicit surface
representation that encodes both known and unknown space. This allows us to
predict global structure in unknown areas with high accuracy. We then correlate
these intermediary results with 3D geometry from a shape database at test time.
In a final pass, we propose a patch-based 3D shape synthesis method that
imposes the 3D geometry from these retrieved shapes as constraints on the
coarsely-completed mesh. This synthesis process enables us to reconstruct
fine-scale detail and generate high-resolution output while respecting the
global mesh structure obtained by the 3D-EPN. Although our 3D-EPN outperforms
state-of-the-art completion methods, the main contribution of our work lies in
the combination of a data-driven shape predictor and analytic 3D shape
synthesis. We present extensive evaluations on a newly introduced
shape completion benchmark for both real-world and synthetic data.
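A volumetric encoding that separates known space from unknown space can be illustrated on a single column of voxels along one camera ray. The sketch below is a simplification and does not reproduce the paper's exact encoding. It assumes a depth scan, a truncation distance `trunc` in voxel units, and a hypothetical helper `encode_ray`. Voxels in front of the observed surface are known empty space and receive a positive truncated distance. Voxels behind the surface are occluded, and therefore unknown, and receive a negative one.

```python
from typing import List, Optional


def encode_ray(num_voxels: int, surface_index: Optional[int], trunc: float) -> List[float]:
    """Signed, truncated distance along one ray (hypothetical simplification).

    Sign encodes known (+, observed empty space) vs unknown (-, occluded);
    magnitude is the distance to the observed surface, capped at `trunc`.
    """
    if surface_index is None:
        # The ray hit nothing: the whole column is known empty space.
        return [float(trunc)] * num_voxels
    out: List[float] = []
    for i in range(num_voxels):
        dist = float(min(abs(i - surface_index), trunc))
        if i < surface_index:
            out.append(dist)   # known empty space in front of the surface
        elif i == surface_index:
            out.append(0.0)    # the observed surface itself
        else:
            out.append(-dist)  # occluded voxels: unknown space behind the surface
    return out
```

`encode_ray(6, 2, 3.0)` yields `[2.0, 1.0, 0.0, -1.0, -2.0, -3.0]`. The sign tells a network which voxels were actually observed, and the magnitude still carries distance to the surface.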
Shape Generation using Spatially Partitioned Point Clouds
We propose a method to generate 3D shapes using point clouds. Given a
point-cloud representation of a 3D shape, our method builds a kd-tree to
spatially partition the points. This orders them consistently across shapes,
resulting in reasonably good correspondences between them. We then
use PCA to derive a linear shape basis across the spatially
partitioned points, and optimize the point ordering by iteratively minimizing
the PCA reconstruction error. Even with the spatial sorting, the point clouds
are inherently noisy and the resulting distribution over the shape coefficients
can be highly multi-modal. We propose to use the expressive power of neural
networks to learn a distribution over the shape coefficients in a
generative-adversarial framework. Compared to 3D shape generative models
trained on voxel-representations, our point-based method is considerably more
lightweight and scalable, with little loss of quality. It also outperforms
simpler linear factor models such as Probabilistic PCA, both qualitatively and
quantitatively, on a number of categories from the ShapeNet dataset.
Furthermore, our method can easily incorporate other point attributes such as
normal and color information, an additional advantage over voxel-based
representations.
Comment: To appear at BMVC 201
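A kd-tree ordering can be sketched as recursive median splitting. At each node, the points are sorted along one axis and split at the median, and the lower half is emitted before the upper half. The sketch below makes two assumptions: the split axis is the one with the largest extent, and each shape has the same number of points. The function name `kd_order` is hypothetical. The paper's later refinement of the ordering, which minimizes PCA reconstruction error, is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def kd_order(points: List[Point]) -> List[Point]:
    """Return the points in kd-tree leaf order (hypothetical sketch).

    Splits on the axis of largest extent at the median and recurses,
    concatenating the lower half before the upper half.
    """
    if len(points) <= 1:
        return list(points)
    # Choose the axis along which this subset is most spread out.
    extents = [max(p[a] for p in points) - min(p[a] for p in points) for a in range(3)]
    axis = extents.index(max(extents))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return kd_order(ordered[:mid]) + kd_order(ordered[mid:])
```

The output order depends only on point positions. As a result, two shapes sampled with the same number of points produce sequences whose i-th entries fall in comparable spatial cells. Shuffling one shape's input leaves its output order unchanged, unless coordinates tie along a split axis.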
Learning Shape Priors for Single-View 3D Completion and Reconstruction
The problem of single-view 3D shape completion or reconstruction is
challenging, because among the many possible shapes that explain an
observation, most are implausible and do not correspond to natural objects.
Recent research in the field has tackled this problem by exploiting the
expressiveness of deep convolutional networks. However, there is another level
of ambiguity that is often overlooked: among plausible shapes, there are still
multiple shapes that fit the 2D image equally well; i.e., the ground-truth
shape is not uniquely determined by a single-view input. Existing fully supervised
approaches fail to address this issue, and often produce blurry mean shapes
with smooth surfaces but no fine details.
In this paper, we propose ShapeHD, which pushes the limits of single-view shape
completion and reconstruction by integrating deep generative models with
adversarially learned shape priors. The learned priors serve as a regularizer,
penalizing the model only if its output is unrealistic, not if it deviates from
the ground truth. Our design thus addresses both levels of ambiguity
described above. Experiments demonstrate that ShapeHD outperforms the state of the
art by a large margin in both shape completion and shape reconstruction on
multiple real datasets.
Comment: ECCV 2018. The first two authors contributed equally to this work.
Project page: http://shapehd.csail.mit.edu
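A learned shape prior used as a regularizer can be written as a two-term objective. The form below is a sketch under several assumptions, and ShapeHD's exact losses may differ:

- $\hat{y}$ is the predicted shape and $y$ is the ground truth.
- $\mathcal{L}_{\text{sup}}$ is any supervised reconstruction loss.
- $D$ is a discriminator (critic) trained to score realistic shapes highly.
- $\lambda$ is a weighting hyperparameter.

```latex
\mathcal{L}(\hat{y}, y) = \mathcal{L}_{\text{sup}}(\hat{y}, y) + \lambda \, \mathcal{L}_{\text{nat}}(\hat{y}),
\qquad
\mathcal{L}_{\text{nat}}(\hat{y}) = -D(\hat{y})
```

The prior term depends only on $\hat{y}$, not on $y$. A sharp, plausible shape that differs from the ground truth therefore incurs no extra penalty from it, while an unrealistic one does.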
Cross-View Image Synthesis using Conditional GANs
Learning to generate natural scenes has always been a challenging task in
computer vision. It is even harder when the generation is conditioned
on images with drastically different views. This is mainly because
understanding, matching, and transforming appearance and semantic
information across the views is nontrivial. In this paper, we attempt to solve
the novel problem of cross-view image synthesis, aerial to street-view and vice
versa, using conditional generative adversarial networks (cGAN). Two new
architectures called Crossview Fork (X-Fork) and Crossview Sequential (X-Seq)
are proposed to generate scenes with resolutions of 64x64 and 256x256 pixels.
The X-Fork architecture has a single discriminator and a single generator. The
generator hallucinates both the image and its semantic segmentation in the
target view. The X-Seq architecture uses two cGANs. The first one generates the
target image which is subsequently fed to the second cGAN for generating its
corresponding semantic segmentation map. The feedback from the second cGAN
helps the first cGAN generate sharper images. Both of our proposed
architectures learn to generate natural images as well as their semantic
segmentation maps. The proposed methods capture and maintain the true
semantics of objects in source and target views better than the traditional
image-to-image translation method, which considers only the
visual appearance of the scene. Extensive qualitative and quantitative
evaluations support the effectiveness of our frameworks, compared to two
state-of-the-art methods, for natural scene generation across drastically
different views.
Comment: Accepted at CVPR 201
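The two architectures differ mainly in how data flows between their components. The sketch below shows only that flow. Hypothetical placeholder callables stand in for trained generators, and `g_fork`, `g_image` and `g_seg` are illustrative names that do not come from the paper. Discriminators and training are omitted.

```python
from typing import Any, Callable, Tuple

# Hypothetical placeholders: real generators would map image tensors to image tensors.
ForkGenerator = Callable[[Any], Tuple[Any, Any]]
Generator = Callable[[Any], Any]


def x_fork(source: Any, g_fork: ForkGenerator) -> Tuple[Any, Any]:
    """X-Fork: one generator whose output forks into two branches,
    the target-view image and its semantic segmentation."""
    image, segmentation = g_fork(source)
    return image, segmentation


def x_seq(source: Any, g_image: Generator, g_seg: Generator) -> Tuple[Any, Any]:
    """X-Seq: two generators in sequence. The first synthesizes the
    target-view image; the second maps that image to its segmentation."""
    image = g_image(source)
    segmentation = g_seg(image)  # the second stage sees only the synthesized image
    return image, segmentation
```

In X-Seq the segmentation stage consumes the synthesized image. One plausible reading of the "feedback" mentioned above is that, during training, the second cGAN's loss can backpropagate through `g_seg` into `g_image`.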
Hierarchical Surface Prediction for 3D Object Reconstruction
Recently, Convolutional Neural Networks have shown promising results for 3D
geometry prediction. They can make predictions from very little input data such
as a single color image. A major limitation of such approaches is that they
only predict a coarse-resolution voxel grid, which does not capture the surface
of the objects well. We propose a general framework, called hierarchical
surface prediction (HSP), which facilitates prediction of high-resolution voxel
grids. The main insight is that it is sufficient to predict high-resolution
voxels around the predicted surfaces. The exterior and interior of the objects
can be represented with coarse-resolution voxels. Our approach is not dependent
on a specific input type. We show results for geometry prediction from color
images, depth images and shape completion from partial voxel grids. Our
analysis shows that our high-resolution predictions are more accurate than
low-resolution predictions.
Comment: 3DV 201
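The hierarchical idea can be illustrated with octree-style refinement: each coarse cell is labeled free, occupied, or boundary, and only boundary cells are subdivided. In the sketch below, an occupancy function `occupied(x, y, z)` acts as an oracle. In HSP those labels are predicted by the network, so the sketch shows only the resulting data structure, not the learning. All names are hypothetical.

```python
from typing import Callable, List, Tuple

Cell = Tuple[int, int, int, int]  # (x, y, z, side) in finest-voxel units
Occupancy = Callable[[int, int, int], bool]


def classify(cell: Cell, occupied: Occupancy) -> str:
    """Label a cell 'free', 'occupied', or 'boundary' from the voxels it contains."""
    x, y, z, s = cell
    values = {occupied(x + i, y + j, z + k)
              for i in range(s) for j in range(s) for k in range(s)}
    if values == {False}:
        return "free"
    if values == {True}:
        return "occupied"
    return "boundary"  # mixed contents: the surface passes through this cell


def hierarchical_voxels(occupied: Occupancy, size: int, coarse: int) -> List[Tuple[Cell, str]]:
    """Subdivide only boundary cells, octree-style, down to single voxels.

    Assumes `size` is a multiple of `coarse` and `coarse` is a power of two.
    """
    stack: List[Cell] = [(x, y, z, coarse)
                         for x in range(0, size, coarse)
                         for y in range(0, size, coarse)
                         for z in range(0, size, coarse)]
    leaves: List[Tuple[Cell, str]] = []
    while stack:
        cell = stack.pop()
        label = classify(cell, occupied)
        x, y, z, s = cell
        if label != "boundary" or s == 1:
            leaves.append((cell, label))  # free/occupied cells stay at their current size
            continue
        half = s // 2
        for dx in (0, half):
            for dy in (0, half):
                for dz in (0, half):
                    stack.append((x + dx, y + dy, z + dz, half))
    return leaves
```

Consider an 8x8x8 grid whose occupied region is the slab `x < 2`, starting from cells of side 4. The result is 36 cells instead of 512 voxels. The four coarse cells straddling the surface are each split into eight children, and the four fully free coarse cells stay coarse.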
- …