Neural 3D Mesh Renderer
For modeling the 3D world behind 2D images, which 3D representation is most
appropriate? A polygon mesh is a promising candidate for its compactness and
geometric properties. However, it is not straightforward to model a polygon
mesh from 2D images using neural networks because the conversion from a mesh to
an image, or rendering, involves a discrete operation called rasterization,
which prevents back-propagation. Therefore, in this work, we propose an
approximate gradient for rasterization that enables the integration of
rendering into neural networks. Using this renderer, we perform single-image 3D
mesh reconstruction with silhouette image supervision, and our system
outperforms the existing voxel-based approach. Additionally, we perform
gradient-based 3D mesh editing operations, such as 2D-to-3D style transfer and
3D DeepDream, with 2D supervision for the first time. These applications
demonstrate the potential of integrating a mesh renderer into neural
networks and the effectiveness of our proposed renderer.
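The core trick, substituting a smooth surrogate gradient for the piecewise-constant rasterization step, can be sketched in a few lines. This is a minimal illustration of the general idea, not the paper's actual gradient approximation; the custom autograd function, the 0.5 threshold, and the sigmoid-shaped backward pass are all assumptions made for the example.

```python
# Sketch: a non-differentiable rasterization-style threshold whose backward
# pass is replaced with a smooth (sigmoid-derivative) surrogate gradient.
# ApproxRasterize and its constants are illustrative, not the paper's scheme.
import torch

class ApproxRasterize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, coverage):
        # Forward: hard, discrete decision -- each pixel is in or out.
        ctx.save_for_backward(coverage)
        return (coverage > 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Backward: pretend the step was sigmoid(k * (c - 0.5)) so that
        # useful gradients flow to the mesh parameters upstream.
        (coverage,) = ctx.saved_tensors
        k = 10.0
        s = torch.sigmoid(k * (coverage - 0.5))
        return grad_output * k * s * (1.0 - s)

coverage = torch.rand(4, 4, requires_grad=True)  # per-pixel soft coverage
pixels = ApproxRasterize.apply(coverage)
pixels.sum().backward()
print(coverage.grad)  # nonzero: the renderer is now trainable end to end
```

With a surrogate like this in place, a silhouette loss on the rendered image can be back-propagated to the mesh vertices, which is what enables the reconstruction and mesh-editing applications above.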
Between-class Learning for Image Classification
In this paper, we propose a novel learning method for image classification
called Between-Class learning (BC learning). We generate between-class images
by mixing two images belonging to different classes with a random ratio. We
then input the mixed image to the model and train the model to output the
mixing ratio. BC learning has the ability to impose constraints on the shape of
the feature distributions, and thus the generalization ability is improved. BC
learning was originally developed for sounds, which can be digitally mixed.
Mixing two images may not appear to make sense; however, we argue that because
convolutional neural networks have an aspect of treating input data as
waveforms, what works for sounds should also work for images. First, we propose
a simple mixing method using internal division, which surprisingly proves to
significantly improve performance. Second, we propose a mixing method that
treats the images as waveforms, which leads to a further improvement in
performance. As a result, we achieved 19.4% and 2.26% top-1 errors on
ImageNet-1K and CIFAR-10, respectively.
Comment: 11 pages, 8 figures, published as a conference paper at CVPR 2018
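The first of the two mixing methods, internal division of raw pixels, is simple enough to sketch directly. The helper below is a hypothetical illustration under the assumption of one-hot labels; bc_mix is not the authors' reference code.

```python
# Minimal sketch of BC learning's simple internal-division mixing.
# bc_mix and the one-hot label convention are illustrative assumptions.
import numpy as np

def bc_mix(x1, y1, x2, y2, rng=np.random):
    """Mix two images of different classes; the soft label is the ratio."""
    r = rng.uniform(0.0, 1.0)
    x = r * x1 + (1.0 - r) * x2   # internal division of the raw pixels
    y = r * y1 + (1.0 - r) * y2   # the model is trained to output this ratio
    return x.astype(np.float32), y.astype(np.float32)

# Example: two 32x32 RGB images labeled with classes 3 and 7 out of 10.
x1, x2 = np.random.rand(32, 32, 3), np.random.rand(32, 32, 3)
y1, y2 = np.eye(10)[3], np.eye(10)[7]
x, y = bc_mix(x1, y1, x2, y2)
```

The waveform-style variant mentioned in the abstract additionally treats each image as a zero-mean waveform, accounting for per-image statistics when mixing.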
Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture
Learning to represent and generate videos from unlabeled data is a very
challenging problem. To generate realistic videos, it is important not only to
ensure that the appearance of each frame is real, but also to ensure the
plausibility of the video's motion and the consistency of its appearance over
time. The process of video generation should be divided according to
these intrinsic difficulties. In this study, we focus on the motion and
appearance information as two important orthogonal components of a video, and
propose Flow-and-Texture-Generative Adversarial Networks (FTGAN) consisting of
FlowGAN and TextureGAN. To avoid a huge annotation cost, we explore a way to
learn from unlabeled data; thus, we employ optical flow as the motion
information for generating videos. FlowGAN generates optical flow, which
contains only the edges and motion of the videos to be generated. TextureGAN,
in turn, specializes in adding texture to the optical flow generated by
FlowGAN. This hierarchical approach yields more realistic videos with plausible
motion and consistent appearance. Our experiments show that our model generates
videos with more plausible motion and achieves significantly improved
performance for unsupervised action classification in comparison to previous
GAN works. In addition, because our model generates videos from two independent
sources of information, it can generate novel combinations of motion and
appearance not seen in the training data, such as a video in which a person is doing
sit-ups on a baseball ground.
Comment: Our supplemental material is available at
http://www.mi.t.u-tokyo.ac.jp/assets/publication/hierarchical_video_generation_sup/
Accepted to AAAI 2018
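The two-stage decomposition can be made concrete with a schematic sketch: one generator maps a latent code to an optical-flow volume, and a second generator paints texture conditioned on that flow. The toy architectures, module names, shapes, and layer choices below are assumptions for illustration, not the authors' published networks.

```python
# Schematic two-stage pipeline in the spirit of FTGAN: flow first, texture
# second. Both modules are toy stand-ins, not the published architectures.
import torch
import torch.nn as nn

class FlowGenerator(nn.Module):
    """Maps a latent code to a (frames, 2, H, W) optical-flow volume."""
    def __init__(self, z_dim=64, frames=8, size=16):
        super().__init__()
        self.frames, self.size = frames, size
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, frames * 2 * size * size), nn.Tanh(),
        )

    def forward(self, z):
        flow = self.net(z)
        return flow.view(-1, self.frames, 2, self.size, self.size)

class TextureGenerator(nn.Module):
    """Paints RGB frames conditioned on the generated flow."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, flow):
        # Conv3d expects (N, C, T, H, W); flow arrives as (N, T, 2, H, W).
        return self.net(flow.permute(0, 2, 1, 3, 4))

z = torch.randn(1, 64)
video = TextureGenerator()(FlowGenerator()(z))
print(video.shape)  # torch.Size([1, 3, 8, 16, 16])
```

This separation of the flow-generating path from the texture-painting path is what, per the abstract, allows new combinations of motion and appearance at generation time.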