Neural 3D Mesh Renderer
For modeling the 3D world behind 2D images, which 3D representation is most
appropriate? A polygon mesh is a promising candidate for its compactness and
geometric properties. However, it is not straightforward to model a polygon
mesh from 2D images using neural networks because the conversion from a mesh to
an image, or rendering, involves a discrete operation called rasterization,
which prevents back-propagation. Therefore, in this work, we propose an
approximate gradient for rasterization that enables the integration of
rendering into neural networks. Using this renderer, we perform single-image 3D
mesh reconstruction with silhouette image supervision and our system
outperforms the existing voxel-based approach. Additionally, we perform
gradient-based 3D mesh editing operations, such as 2D-to-3D style transfer and
3D DeepDream, with 2D supervision for the first time. These applications
demonstrate the potential of integrating a mesh renderer into neural
networks and the effectiveness of our proposed renderer.
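As a rough illustration of how such a renderer can be used, the sketch below (PyTorch; the network shapes, the IoU-style silhouette loss, and the render_silhouette call are illustrative assumptions, not the paper's exact implementation) shows single-image mesh reconstruction trained with silhouette supervision, where gradients flow through the renderer back to the predicted vertices.

# Minimal sketch (assumed PyTorch) of silhouette-supervised mesh reconstruction.
# `render_silhouette` is a hypothetical stand-in for the paper's approximate-gradient
# rasterizer; layer sizes and the vertex-offset decoder are illustrative only.
import torch
import torch.nn as nn

class MeshPredictor(nn.Module):
    def __init__(self, n_vertices=642, feat_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(            # image -> feature vector
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim), nn.ReLU(),
        )
        # predict per-vertex displacements of a template mesh (e.g., a sphere)
        self.decoder = nn.Linear(feat_dim, n_vertices * 3)

    def forward(self, image, template_vertices):
        offsets = self.decoder(self.encoder(image))
        offsets = offsets.view(-1, template_vertices.shape[0], 3)
        return template_vertices.unsqueeze(0) + 0.1 * offsets   # deformed vertices

def silhouette_iou_loss(pred_sil, gt_sil, eps=1e-6):
    # negative intersection-over-union between rendered and target silhouettes
    inter = (pred_sil * gt_sil).sum(dim=(1, 2))
    union = (pred_sil + gt_sil - pred_sil * gt_sil).sum(dim=(1, 2))
    return 1.0 - (inter / (union + eps)).mean()

# Training step (hypothetical renderer call; its approximate gradients let the
# loss back-propagate through rasterization to the vertices):
# pred_vertices = model(image, template_vertices)
# pred_sil = render_silhouette(pred_vertices, faces, camera)
# loss = silhouette_iou_loss(pred_sil, gt_silhouette)
# loss.backward(); optimizer.step()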
Between-class Learning for Image Classification
In this paper, we propose a novel learning method for image classification
called Between-Class learning (BC learning). We generate between-class images
by mixing two images belonging to different classes with a random ratio. We
then input the mixed image to the model and train the model to output the
mixing ratio. BC learning imposes constraints on the shape of the feature
distributions, which improves the model's generalization ability. BC learning
was originally developed for sounds, which can be digitally mixed. Mixing two
images may not appear to make sense; however, we argue that because
convolutional neural networks have an aspect of treating input data as
waveforms, what works on sounds should also work on images. First, we
propose a simple mixing method using internal divisions, which surprisingly
proves to significantly improve performance. Second, we propose a mixing method
that treats the images as waveforms, which leads to a further improvement in
performance. As a result, we achieved 19.4% and 2.26% top-1 errors on
ImageNet-1K and CIFAR-10, respectively. Comment: 11 pages, 8 figures; published as a conference paper at CVPR 2018.
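The basic mixing step can be illustrated as follows (a minimal PyTorch sketch; the function names and the exact KL-divergence formulation are our own assumptions about one reasonable implementation, not the paper's code): two images from different classes are combined by internal division with a random ratio, and the network is trained to predict that ratio. The waveform-style mixing variant mentioned above would change only the mixing function (e.g., subtracting per-image means and normalizing the mixture energy), not the rest of the pipeline.

# Minimal sketch (assumed PyTorch) of the simple internal-division BC mixing.
import torch
import torch.nn.functional as F

def bc_mix(x1, y1, x2, y2, num_classes):
    """x1, x2: image batches (B, C, H, W); y1, y2: integer labels from different classes."""
    r = torch.rand(x1.size(0), 1, 1, 1, device=x1.device)    # mixing ratio in (0, 1)
    x = r * x1 + (1 - r) * x2                                 # between-class image
    t = (r.view(-1, 1) * F.one_hot(y1, num_classes)
         + (1 - r.view(-1, 1)) * F.one_hot(y2, num_classes)).float()
    return x, t                                               # mixed image and soft label

def bc_loss(logits, soft_target, eps=1e-7):
    # KL divergence between the mixed (ratio) label and the model's output distribution
    log_p = F.log_softmax(logits, dim=1)
    t = soft_target.clamp_min(eps)
    return (t * (t.log() - log_p)).sum(dim=1).mean()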
Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture
Learning to represent and generate videos from unlabeled data is a very
challenging problem. To generate realistic videos, it is important not only to
ensure that the appearance of each frame is realistic, but also to ensure that
the motion is plausible and the appearance is consistent over time. The
process of video generation should be divided according to
these intrinsic difficulties. In this study, we focus on the motion and
appearance information as two important orthogonal components of a video, and
propose Flow-and-Texture-Generative Adversarial Networks (FTGAN) consisting of
FlowGAN and TextureGAN. To avoid a huge annotation cost, we need a way to
learn from unlabeled data; thus, we employ optical flow as motion information
to generate videos. FlowGAN generates optical flow, which contains only the
edges and motion of the videos to be generated. On the other
hand, TextureGAN specializes in giving texture to the optical flow generated by
FlowGAN. This hierarchical approach produces more realistic videos with
plausible motion and consistent appearance. Our experiments show that our model
generates videos with more plausible motion and also achieves significantly
improved performance on unsupervised action classification in comparison to
previous GAN works. In addition, because our model generates videos from two
independent sources of information, it can generate new combinations of motion
and attributes that are not seen in the training data, such as a video in which
a person does sit-ups on a baseball ground. Comment: Our supplementary material is available at
http://www.mi.t.u-tokyo.ac.jp/assets/publication/hierarchical_video_generation_sup/
Accepted to AAAI 2018.
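A minimal sketch of the two-stage wiring, assuming PyTorch (module names, layer sizes, and latent dimensions are illustrative placeholders, not the paper's architecture): a flow generator maps a motion latent to an optical-flow sequence, and a texture generator converts that flow plus an independent appearance latent into RGB frames.

# Illustrative sketch of the hierarchical flow -> texture generation pipeline.
import torch
import torch.nn as nn

class FlowGenerator(nn.Module):
    """Motion latent z_m -> optical-flow sequence of shape (B, T, 2, H, W)."""
    def __init__(self, z_dim=100, T=16, H=64, W=64):
        super().__init__()
        self.T, self.H, self.W = T, H, W
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, T * 2 * H * W), nn.Tanh(),
        )

    def forward(self, z_m):
        return self.net(z_m).view(-1, self.T, 2, self.H, self.W)

class TextureGenerator(nn.Module):
    """(flow, appearance latent z_a) -> RGB video of shape (B, T, 3, H, W)."""
    def __init__(self, z_dim=100, H=64, W=64):
        super().__init__()
        self.embed = nn.Linear(z_dim, H * W)            # broadcast appearance code
        self.net = nn.Sequential(                       # per-frame flow + code -> RGB
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, flow, z_a):
        B, T, _, H, W = flow.shape
        code = self.embed(z_a).view(B, 1, 1, H, W).expand(B, T, 1, H, W)
        x = torch.cat([flow, code], dim=2).view(B * T, 3, H, W)
        return self.net(x).view(B, T, 3, H, W)

# Because motion and appearance come from independent latents, keeping z_m fixed
# and swapping z_a renders the same motion with a new appearance:
# flow = flow_gen(z_motion); video = texture_gen(flow, z_appearance)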
Maximum Classifier Discrepancy for Unsupervised Domain Adaptation
In this work, we present a method for unsupervised domain adaptation. Many
adversarial learning methods train domain classifier networks to distinguish
the features as either a source or target and train a feature generator network
to mimic the discriminator. Two problems exist with these methods. First, the
domain classifier only tries to distinguish whether features come from the source or the target domain,
and thus does not consider task-specific decision boundaries between classes.
Therefore, a trained generator can generate ambiguous features near class
boundaries. Second, these methods aim to completely match the feature
distributions between different domains, which is difficult because of each
domain's characteristics.
To solve these problems, we introduce a new approach that attempts to align
distributions of source and target by utilizing the task-specific decision
boundaries. We propose to maximize the discrepancy between two classifiers'
outputs to detect target samples that are far from the support of the source. A
feature generator learns to generate target features near the support to
minimize the discrepancy. Our method outperforms other methods on several
datasets of image classification and semantic segmentation. The code is
available at https://github.com/mil-tokyo/MCD_DA. Comment: Accepted to CVPR 2018 (oral).
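The discrepancy loss and the adversarial schedule can be sketched as follows (assumed PyTorch; G, F1, F2, the optimizers, and the number of inner generator steps are placeholders, not the repository's exact code): the two classifiers are trained to maximize their output discrepancy on target samples while staying accurate on the source, and the feature generator is then trained to minimize that discrepancy.

# Minimal sketch of the classifier-discrepancy loss and a three-step update.
import torch
import torch.nn.functional as F

def discrepancy(logits1, logits2):
    # L1 distance between the two classifiers' class-probability outputs
    return (F.softmax(logits1, dim=1) - F.softmax(logits2, dim=1)).abs().mean()

def train_step(G, F1, F2, opt_g, opt_f, x_s, y_s, x_t, n_inner=4):
    # Step A: train generator and both classifiers on labeled source data
    opt_g.zero_grad(); opt_f.zero_grad()
    feat_s = G(x_s)
    loss_s = F.cross_entropy(F1(feat_s), y_s) + F.cross_entropy(F2(feat_s), y_s)
    loss_s.backward(); opt_g.step(); opt_f.step()

    # Step B: fix G, train F1/F2 to MAXIMIZE discrepancy on target
    # (while keeping source classification loss low)
    opt_f.zero_grad()
    feat_s, feat_t = G(x_s).detach(), G(x_t).detach()
    loss_f = (F.cross_entropy(F1(feat_s), y_s) + F.cross_entropy(F2(feat_s), y_s)
              - discrepancy(F1(feat_t), F2(feat_t)))
    loss_f.backward(); opt_f.step()

    # Step C: fix F1/F2, train G to MINIMIZE discrepancy on target
    for _ in range(n_inner):
        opt_g.zero_grad()
        feat_t = G(x_t)
        loss_g = discrepancy(F1(feat_t), F2(feat_t))
        loss_g.backward(); opt_g.step()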