Visual Object Networks: Image Generation with Disentangled 3D Representation
Recent progress in deep generative models has led to tremendous breakthroughs
in image generation. However, while existing models can synthesize
photorealistic images, they lack an understanding of our underlying 3D world.
We present a new generative model, Visual Object Networks (VON), synthesizing
natural images of objects with a disentangled 3D representation. Inspired by
classic graphics rendering pipelines, we unravel our image formation process
into three conditionally independent factors---shape, viewpoint, and
texture---and present an end-to-end adversarial learning framework that jointly
models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes
that are indistinguishable from real shapes. It then renders the object's 2.5D
sketches (i.e., silhouette and depth map) from its shape under a sampled
viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches
to generate natural images. The VON not only generates images that are more
realistic than state-of-the-art 2D image synthesis methods, but also enables
many 3D operations such as changing the viewpoint of a generated image, editing
of shape and texture, linear interpolation in texture and shape space, and
transferring appearance across different objects and viewpoints.
Comment: NeurIPS 2018. Code: https://github.com/junyanz/VON Website:
http://von.csail.mit.edu
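The middle stage described above, rendering a 2.5D sketch (silhouette and depth map) from a 3D shape, can be illustrated with a minimal sketch. It assumes a binary voxel grid and a fixed orthographic view along the z axis; the abstract describes rendering under a sampled viewpoint inside an end-to-end learned framework, which this simplified, non-differentiable version does not attempt. The name `render_sketch` and the `voxels[z][y][x]` layout are assumptions made for illustration, not taken from the VON code.

```python
from typing import List, Optional, Tuple

# Assumed layout: voxels[z][y][x], where 1 means occupied and z = 0 is
# the slice nearest the camera.
Voxels = List[List[List[int]]]


def render_sketch(
    voxels: Voxels,
) -> Tuple[List[List[int]], List[List[Optional[int]]]]:
    """Project a voxel grid along z into a silhouette and a depth map.

    Simplified orthographic projection (an assumption of this sketch):
    each pixel (y, x) casts one ray straight through the grid.
    """
    n_slices = len(voxels)
    height = len(voxels[0])
    width = len(voxels[0][0])

    silhouette = [[0] * width for _ in range(height)]
    depth: List[List[Optional[int]]] = [[None] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            for z in range(n_slices):
                if voxels[z][y][x]:
                    # First occupied voxel along the ray: the pixel is
                    # covered, and its depth is the slice index.
                    silhouette[y][x] = 1
                    depth[y][x] = z
                    break
    return silhouette, depth


if __name__ == "__main__":
    grid = [
        [[0, 1], [0, 0]],  # slice z = 0
        [[1, 1], [0, 0]],  # slice z = 1
    ]
    sil, dep = render_sketch(grid)
    print(sil)
    print(dep)
```

Pixels whose ray hits no occupied voxel keep a depth of `None`, which keeps "no surface" distinct from a surface at depth 0.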
Shape matching and clustering in design
Generalising knowledge and matching patterns is a basic human trait in re-using past experiences. We often cluster (group) knowledge with similar attributes as a process of learning and/or as an aid to managing the complexity and re-use of experiential knowledge [1, 2]. In conceptual design, an ill-defined shape may be recognised as more than one type, so shapes may be classified differently when different criteria are applied. This paper outlines the work being carried out to develop a new technique for shape clustering. It highlights the current methods for analysing shapes found in computer-aided sketching systems, then proposes a method that addresses shape clustering and pattern matching. Clustering for vague geometric models and multiple viewpoint support are explored.
Shape matching and clustering
Generalising knowledge and matching patterns is a basic human trait in re-using past experiences. We often cluster (group) knowledge with similar attributes as a process of learning and/or as an aid to managing the complexity and re-use of experiential knowledge [1, 2]. In conceptual design, an ill-defined shape may be recognised as more than one type, so shapes may be classified differently when different criteria are applied. This paper outlines the work being carried out to develop a new technique for shape clustering. It highlights the current methods for analysing shapes found in computer-aided sketching systems, then proposes a method that addresses shape clustering and pattern matching. Clustering for vague geometric models and multiple viewpoint support are explored.
Multi-view Convolutional Neural Networks for 3D Shape Recognition
A longstanding question in computer vision concerns the representation of 3D
shapes for recognition: should 3D shapes be represented with descriptors
operating on their native 3D formats, such as voxel grid or polygon mesh, or
can they be effectively represented with view-based descriptors? We address
this question in the context of learning to recognize 3D shapes from a
collection of their rendered views on 2D images. We first present a standard
CNN architecture trained to recognize the shapes' rendered views independently
of each other, and show that a 3D shape can be recognized even from a single
view at an accuracy far higher than using state-of-the-art 3D shape
descriptors. Recognition rates further increase when multiple views of the
shapes are provided. In addition, we present a novel CNN architecture that
combines information from multiple views of a 3D shape into a single and
compact shape descriptor offering even better recognition performance. The same
architecture can be applied to accurately recognize human hand-drawn sketches
of shapes. We conclude that a collection of 2D views can be highly informative
for 3D shape recognition and is amenable to emerging CNN architectures and
their derivatives.
Comment: v1: Initial version. v2: An updated ModelNet40 training/test split is
used; results with low-rank Mahalanobis metric learning are added. v3 (ICCV
2015): A second camera setup without the upright orientation assumption is
added; some accuracy and mAP numbers are changed slightly because a small
issue in mesh rendering related to specularities is fixed.
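The idea of combining per-view features into a single compact shape descriptor can be sketched as follows. The abstract does not name the aggregation operation; this example assumes an element-wise maximum across views, and the function name `view_pool` is hypothetical. Per-view feature vectors are plain lists of floats standing in for CNN activations.

```python
from typing import List, Sequence


def view_pool(view_features: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate per-view feature vectors into one shape descriptor.

    Assumption of this sketch: aggregation is an element-wise maximum,
    so the result does not depend on the order of the views.
    """
    if not view_features:
        raise ValueError("at least one view is required")
    length = len(view_features[0])
    if any(len(f) != length for f in view_features):
        raise ValueError("all views must have the same feature length")
    # zip(*...) groups the i-th feature of every view together.
    return [max(column) for column in zip(*view_features)]


if __name__ == "__main__":
    views = [
        [0.1, 0.9, 0.3],  # features from view 1
        [0.5, 0.2, 0.4],  # features from view 2
    ]
    print(view_pool(views))
```

Because the maximum is symmetric in its arguments, the descriptor is the same however the rendered views are ordered, which suits a collection of views with no canonical sequence.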
Deep Shape Matching
We cast shape matching as metric learning with convolutional networks. We
break the end-to-end process of image representation into two parts. Firstly,
well established efficient methods are chosen to turn the images into edge
maps. Secondly, the network is trained with edge maps of landmark images, which
are automatically obtained by a structure-from-motion pipeline. The learned
representation is evaluated on a range of different tasks, providing
improvements on challenging cases of domain generalization, generic
sketch-based image retrieval or its fine-grained counterpart. In contrast to
other methods that learn a different model per task, object category, or
domain, we use the same network throughout all our experiments, achieving
state-of-the-art results in multiple benchmarks.
Comment: ECCV 201
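Casting matching as metric learning means training an embedding so that matching pairs lie close together and non-matching pairs lie far apart. The abstract does not state which loss is used; the sketch below assumes a standard pairwise contrastive loss with a margin, applied to two already-computed embedding vectors. The function name `contrastive_loss` and the default `margin` are illustrative choices, not values from the paper.

```python
import math
from typing import Sequence


def contrastive_loss(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    is_match: bool,
    margin: float = 1.0,
) -> float:
    """Pairwise contrastive loss on two embedding vectors.

    Matching pairs are penalised by their squared distance;
    non-matching pairs are penalised only when closer than `margin`.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    if is_match:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    other = [3.0, 4.0]  # Euclidean distance 5 from the anchor
    print(contrastive_loss(anchor, other, is_match=True))
    print(contrastive_loss(anchor, other, is_match=False, margin=1.0))
```

In a setting like the one described, the two embeddings would come from the same network applied to two edge maps, with `is_match` derived from the automatically obtained landmark correspondences.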
Domain Generalization by Solving Jigsaw Puzzles
Human adaptability relies crucially on the ability to learn and merge
knowledge from both supervised and unsupervised learning: parents point out
a few important concepts, but then the children fill in the gaps on their own.
This is particularly effective because supervised learning can never be
exhaustive, and thus learning autonomously makes it possible to discover invariances and
regularities that help to generalize. In this paper we propose to apply a
similar approach to the task of object recognition across domains: our model
learns the semantic labels in a supervised fashion, and broadens its
understanding of the data by learning from self-supervised signals how to solve
a jigsaw puzzle on the same images. This secondary task helps the network to
learn the concepts of spatial correlation while acting as a regularizer for the
classification task. Multiple experiments on the PACS, VLCS, Office-Home and
digits datasets confirm our intuition and show that this simple method
outperforms previous domain generalization and adaptation solutions. An
ablation study further illustrates the inner workings of our approach.
Comment: Accepted at CVPR 2019 (oral)
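The self-supervised jigsaw task can be sketched as a data-preparation step: an image is cut into a grid of tiles, the tiles are reordered by one permutation from a fixed set, and the index of that permutation becomes the label the network must predict. The 3x3 grid default, the function names, and the tiny two-element permutation set below are assumptions for illustration; the abstract does not give these details.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[int]]  # a single-channel image as rows of pixel values


def split_into_tiles(image: Image, grid: int = 3) -> List[Image]:
    """Cut an image into grid x grid tiles, listed in row-major order."""
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by the grid size")
    tile_h, tile_w = height // grid, width // grid
    tiles = []
    for gy in range(grid):
        for gx in range(grid):
            rows = image[gy * tile_h:(gy + 1) * tile_h]
            tiles.append([row[gx * tile_w:(gx + 1) * tile_w] for row in rows])
    return tiles


def make_jigsaw_sample(
    image: Image,
    permutations: Sequence[Sequence[int]],
    rng: random.Random,
    grid: int = 3,
) -> Tuple[List[Image], int]:
    """Return shuffled tiles and the index of the permutation applied."""
    label = rng.randrange(len(permutations))
    tiles = split_into_tiles(image, grid)
    shuffled = [tiles[i] for i in permutations[label]]
    return shuffled, label


if __name__ == "__main__":
    # 6x6 toy image whose pixel value encodes its position.
    image = [[r * 6 + c for c in range(6)] for r in range(6)]
    # Hypothetical permutation set: identity and full reversal.
    perms = [tuple(range(9)), tuple(reversed(range(9)))]
    tiles, label = make_jigsaw_sample(image, perms, random.Random(0))
    print(label, tiles[0])
```

The permutation index serves as the target of the auxiliary puzzle task, trained alongside the supervised semantic labels on the same images.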