Manifold-valued Image Generation with Wasserstein Generative Adversarial Nets
Generative modeling over natural images is one of the most fundamental
machine learning problems. However, few modern generative models, including
Wasserstein Generative Adversarial Nets (WGANs), are studied on manifold-valued
images that are frequently encountered in real-world applications. To fill the
gap, this paper first formulates the problem of generating manifold-valued
images and exploits three typical instances: hue-saturation-value (HSV) color
image generation, chromaticity-brightness (CB) color image generation, and
diffusion-tensor (DT) image generation. For the proposed generative modeling
problem, we then introduce a theorem of optimal transport to derive a new
Wasserstein distance of data distributions on complete manifolds, enabling us
to achieve a tractable objective under the WGAN framework. In addition, we
recommend three benchmark datasets: CIFAR-10 HSV/CB color images, ImageNet
HSV/CB color images, and UCL DT images. On the three datasets, we demonstrate
experimentally that the proposed manifold-aware WGAN model can generate more
plausible manifold-valued images than its competitors.
Comment: Accepted by AAAI 201
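To illustrate why manifold-valued pixels call for a manifold-aware distance, here is a minimal sketch for the hue channel alone. It assumes hue is stored as an angle in radians on the circle; the function name `hue_geodesic_distance` is hypothetical, and this is not the paper's Wasserstein objective, only the ground distance such an objective might build on.

```python
import numpy as np

def hue_geodesic_distance(h1, h2):
    """Arc-length distance between hue angles (radians) on the circle S^1."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    # Wrap the absolute difference into [0, 2*pi).
    diff = np.abs(h1 - h2) % (2.0 * np.pi)
    # The geodesic takes the shorter of the two arcs between the angles.
    return np.minimum(diff, 2.0 * np.pi - diff)
```

Two hues just either side of the wrap-around point (for example 0.1 and 2π − 0.1) are close on the circle even though their raw difference is nearly 2π, which is the kind of structure a Euclidean loss on hue values would miss.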
Building Deep Networks on Grassmann Manifolds
Learning representations on Grassmann manifolds is popular in quite a few
visual recognition tasks. In order to enable deep learning on Grassmann
manifolds, this paper proposes a deep network architecture by generalizing the
Euclidean network paradigm to Grassmann manifolds. In particular, we design
full rank mapping layers to transform input Grassmannian data to more desirable
ones, exploit re-orthonormalization layers to normalize the resulting matrices,
study projection pooling layers to reduce the model complexity in the
Grassmannian context, and devise projection mapping layers to respect
Grassmannian geometry and meanwhile achieve Euclidean forms for regular output
layers. To train the Grassmann networks, we exploit a stochastic gradient
descent setting on manifolds of the connection weights, and study a matrix
generalization of backpropagation to update the structured data. The
evaluations on three visual recognition tasks show that our Grassmann networks
have clear advantages over existing Grassmann learning methods, and achieve
results comparable with state-of-the-art approaches.
Comment: AAAI'18 pape
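The layer sequence described in this abstract can be sketched in NumPy. This is a forward-pass illustration only, under two assumptions not stated in the abstract: re-orthonormalization is done with a QR decomposition, and the projection mapping sends an orthonormal basis X to the projection matrix X Xᵀ. The function names are hypothetical.

```python
import numpy as np

def full_rank_map(W, X):
    """Transform a subspace basis X (d x q) with a full-rank weight W (d' x d)."""
    return W @ X

def reorth(X):
    """Re-orthonormalize the columns of X via (reduced) QR decomposition."""
    Q, _ = np.linalg.qr(X)
    return Q

def proj_map(X):
    """Map an orthonormal basis to its projection matrix X X^T,
    a Euclidean-form representation of the subspace."""
    return X @ X.T
```

After `reorth`, the columns again form an orthonormal basis, so `proj_map` yields a symmetric, idempotent matrix that regular output layers can consume.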
Deep Learning on Lie Groups for Skeleton-based Action Recognition
In recent years, skeleton-based action recognition has become a popular 3D
classification problem. State-of-the-art methods typically first represent each
motion sequence as a high-dimensional trajectory on a Lie group with an
additional dynamic time warping, and then shallowly learn favorable Lie group
features. In this paper we incorporate the Lie group structure into a deep
network architecture to learn more appropriate Lie group features for 3D action
recognition. Within the network structure, we design rotation mapping layers to
transform the input Lie group features into desirable ones, which are aligned
better in the temporal domain. To reduce the high feature dimensionality, the
architecture is equipped with rotation pooling layers for the elements on the
Lie group. Furthermore, we propose a logarithm mapping layer to map the
resulting manifold data into a tangent space that facilitates the application
of regular output layers for the final classification. Evaluations of the
proposed network for standard 3D human action recognition datasets clearly
demonstrate its superiority over existing shallow Lie group feature learning
methods as well as most conventional deep learning methods.
Comment: Accepted to CVPR 201
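As an illustration of the logarithm mapping idea, the sketch below maps a single 3D rotation matrix to its tangent-space (axis-angle) vector. It assumes the Lie group elements are SO(3) rotation matrices; the name `so3_log` is hypothetical, and rotations with angle near π are not handled.

```python
import numpy as np

def so3_log(R):
    """Logarithm map of a 3x3 rotation matrix to an axis-angle vector.

    Assumes R is a valid rotation with angle well below pi.
    """
    # The rotation angle follows from the trace: tr(R) = 1 + 2*cos(theta).
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # The skew-symmetric part of R encodes 2*sin(theta) * axis.
    skew = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        # First-order approximation near the identity.
        return 0.5 * skew
    return theta / (2.0 * np.sin(theta)) * skew
```

The output is an ordinary vector in R^3, so it can be flattened and passed to fully connected and softmax layers.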
Wasserstein Divergence for GANs
In many domains of computer vision, generative adversarial networks (GANs)
have achieved great success, among which the family of Wasserstein GANs (WGANs)
is considered to be state-of-the-art due to the theoretical contributions and
competitive qualitative performance. However, it is very challenging to
approximate the k-Lipschitz constraint required by the Wasserstein-1
metric (W-met). In this paper, we propose a novel Wasserstein
divergence (W-div), which is a relaxed version of W-met and does not require
the k-Lipschitz constraint. As a concrete application, we introduce a
Wasserstein divergence objective for GANs (WGAN-div), which can faithfully
approximate W-div through optimization. Under various settings, including
progressive growing training, we demonstrate the stability of the proposed
WGAN-div owing to its theoretical and practical advantages over WGANs. Also, we
study the quantitative and visual performance of WGAN-div on standard image
synthesis benchmarks of computer vision, showing the superior performance of
WGAN-div compared to the state-of-the-art methods.
Comment: accepted by eccv_2018, correct minor error
Sliced Wasserstein Generative Models
In generative modeling, the Wasserstein distance (WD) has emerged as a useful
metric to measure the discrepancy between generated and real data
distributions. Unfortunately, it is challenging to approximate the WD of
high-dimensional distributions. In contrast, the sliced Wasserstein distance
(SWD) factorizes high-dimensional distributions into their multiple
one-dimensional marginal distributions and is thus easier to approximate. In
this paper, we introduce novel approximations of the primal and dual SWD.
Instead of using a large number of random projections, as is done by
conventional SWD approximation methods, we propose to approximate SWDs with a
small number of parameterized orthogonal projections in an end-to-end deep
learning fashion. As concrete applications of our SWD approximations, we design
two types of differentiable SWD blocks to equip modern generative
frameworks: Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In
the experiments, we not only show the superiority of the proposed generative
models on standard image synthesis benchmarks, but also demonstrate the
state-of-the-art performance on challenging high resolution image and video
generation in an unsupervised manner.
Comment: This paper is accepted by CVPR 2019, accidentally uploaded as a new
submission (arXiv:1904.05408, which has been withdrawn). The code is
available at https://github.com/musikisomorphie/swd.gi
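For context, the conventional random-projection estimate of the sliced Wasserstein distance that the abstract contrasts against can be sketched as follows. This is the baseline Monte Carlo approach, not the paper's learned orthogonal projections; the function name and defaults are hypothetical, and both sample sets are assumed to have the same number of points.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=128, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n_samples, dim) with equal n_samples.
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    # Draw random directions on the unit sphere.
    theta = rng.normal(size=(n_projections, dim))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project onto each direction; in 1D, optimal transport between
    # equal-size samples matches sorted values to sorted values.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)
    return np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p)
```

Each projection reduces the problem to a one-dimensional transport problem solved by sorting, which is why the sliced distance is cheaper to approximate than the full high-dimensional Wasserstein distance.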