7,645 research outputs found
Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos
Human behavior understanding in videos is a complex, still unsolved problem
that requires accurately modeling motion at both the local (pixel-wise dense
prediction) and global (aggregation of motion cues) levels. Current approaches
based on supervised learning require large amounts of annotated data, whose
scarce availability is one of the main limiting factors to the development of
general solutions. Unsupervised learning can instead leverage the vast amount
of videos available on the web and it is a promising solution for overcoming
the existing limitations. In this paper, we propose an adversarial GAN-based
framework that learns video representations and dynamics through a
self-supervision mechanism in order to perform dense and global prediction in
videos. Our approach synthesizes videos by 1) factorizing the process into the
generation of static visual content and motion, 2) learning a suitable
representation of a motion latent space in order to enforce spatio-temporal
coherency of object trajectories, and 3) incorporating motion estimation and
pixel-wise dense prediction into the training procedure. Self-supervision is
enforced by using the motion masks produced by the generator, as a by-product
of its generation process, to supervise the discriminator network in performing
dense prediction. Performance evaluation, carried out on standard benchmarks,
shows that our approach is able to learn, in an unsupervised way, both local
and global video dynamics. The learned representations then support the
training of video object segmentation methods with significantly fewer
annotations (about 50%), achieving performance comparable to the state of the art.
Furthermore, the proposed method achieves promising performance in generating
realistic videos, outperforming state-of-the-art approaches especially on
motion-related metrics.
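The content/motion factorization described in step (1) can be sketched as follows; `decode`, `content_code`, and `motion_codes` are hypothetical stand-ins for the learned generator and its latent inputs, not the authors' actual implementation.

```python
def generate_video(content_code, motion_codes, decode):
    """Render one frame per motion code while reusing a single static
    content code; `decode` stands in for the learned generator network."""
    return [decode(content_code, m) for m in motion_codes]

# Toy decoder: a "frame" is just the pair of codes it was rendered from.
decode = lambda c, m: (c, m)

content = 7                      # one static-appearance latent per clip
motions = [0.1, 0.2, 0.3, 0.4]   # one motion latent per frame
video = generate_video(content, motions, decode)

# Every frame shares the same content code; only the motion code varies.
assert all(frame[0] == content for frame in video)
assert [frame[1] for frame in video] == motions
```

In the real model the motion codes would additionally be constrained (step 2) so that consecutive codes yield spatio-temporally coherent trajectories.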
Unsupervised shape transformer for image translation and cross-domain retrieval
We address the problem of unsupervised geometric image-to-image translation.
Rather than transferring the style of an image as a whole, our goal is to
translate the geometry of an object as depicted in different domains while
preserving its appearance characteristics. Our model is trained in an
unsupervised fashion, i.e. without the need of paired images during training.
It performs all steps of the shape transfer within a single model and without
additional post-processing stages. Extensive experiments on the VITON,
CMU-Multi-PIE and our own FashionStyle datasets show the effectiveness of the
method. In addition, we show that despite their low dimensionality, the
features learned by our model are useful for the item retrieval task.
Twin-GAN -- Unpaired Cross-Domain Image Translation with Weight-Sharing GANs
We present a framework for translating unlabeled images from one domain into
analogous images in another domain. We employ a progressively growing
skip-connected encoder-generator structure and train it with a GAN loss for
realistic output, a cycle consistency loss for maintaining same-domain
translation identity, and a semantic consistency loss that encourages the
network to keep the input semantic features in the output. We apply our
framework on the task of translating face images, and show that it is capable
of learning semantic mappings for face images with no supervised one-to-one
image mapping.
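A minimal sketch of how the cycle-consistency and semantic-consistency terms described above can combine, assuming an L1 penalty and illustrative weights; `g_ab`, `g_ba`, `encoder`, `lambda_cyc`, and `lambda_sem` are hypothetical names, and the adversarial term is omitted for brevity:

```python
def l1(a, b):
    """Mean absolute difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def twin_gan_losses(x, g_ab, g_ba, encoder, lambda_cyc=10.0, lambda_sem=1.0):
    """Cycle-consistency + semantic-consistency terms for one sample x
    from domain A (the GAN realism term is omitted)."""
    y = g_ab(x)                       # translate A -> B
    cyc = l1(g_ba(y), x)              # A -> B -> A should recover x
    sem = l1(encoder(y), encoder(x))  # shared semantic features should match
    return lambda_cyc * cyc + lambda_sem * sem

# With identity mappings, both penalties vanish.
identity = lambda v: v
x = [0.5, -1.0, 2.0]
assert twin_gan_losses(x, identity, identity, identity) == 0.0
```

The weight-sharing aspect of the paper would correspond to `g_ab` and `g_ba` sharing encoder parameters, which this sketch does not model.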
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With widespread applications of artificial intelligence (AI), the
capabilities of the perception, understanding, decision-making and control for
autonomous systems have improved significantly in the past years. When both
accuracy and transferability matter, several AI methods, such as adversarial
learning, reinforcement learning (RL) and meta-learning, have proved highly
effective. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model performs
well during the testing phase, where the test set shares the same task or
data distribution with the training set. Transferability means that a
well-trained model remains accurate when transferred to other testing domains.
Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we review accuracy, transferability, or both to
show the advantages of adversarial learning, such as generative
adversarial networks (GANs), in typical computer vision tasks in autonomous
systems, including image style transfer, image super-resolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we further
review RL and meta-learning in terms of accuracy, transferability, or both
in autonomous systems, covering pedestrian tracking, robot navigation and
robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL and
meta-learning in autonomous systems.
Recent Advances in Autoencoder-Based Representation Learning
Learning useful representations with little or no supervision is a key
challenge in artificial intelligence. We provide an in-depth review of recent
advances in representation learning with a focus on autoencoder-based models.
To organize these results we make use of meta-priors believed useful for
downstream tasks, such as disentanglement and hierarchical organization of
features. In particular, we uncover three main mechanisms to enforce such
properties, namely (i) regularizing the (approximate or aggregate) posterior
distribution, (ii) factorizing the encoding and decoding distribution, or (iii)
introducing a structured prior distribution. While there are some promising
results, implicit or explicit supervision remains a key enabler and all current
methods use strong inductive biases and modeling assumptions. Finally, we
provide an analysis of autoencoder-based representation learning through the
lens of rate-distortion theory and identify a clear tradeoff between the amount
of prior knowledge available about the downstream tasks, and how useful the
representation is for this task.
Comment: Presented at the third workshop on Bayesian Deep Learning (NeurIPS
2018).
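The first mechanism, regularizing the (approximate) posterior, is typified by beta-VAE-style objectives. A minimal sketch using the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior; the `beta=4.0` default and the function names are illustrative:

```python
import math

def kl_diag_gauss(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def beta_vae_objective(recon_loss, mu, logvar, beta=4.0):
    """Rate-distortion style trade-off: distortion + beta * rate."""
    return recon_loss + beta * kl_diag_gauss(mu, logvar)

# A posterior equal to the prior contributes zero rate, so the objective
# reduces to the reconstruction (distortion) term alone.
assert kl_diag_gauss([0.0, 0.0], [0.0, 0.0]) == 0.0
assert beta_vae_objective(1.25, [0.0], [0.0]) == 1.25
```

Setting `beta` larger than 1 trades reconstruction quality (distortion) for a more heavily regularized code (lower rate), which is the trade-off the review analyzes through the rate-distortion lens.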
How Generative Adversarial Networks and Their Variants Work: An Overview
Generative Adversarial Networks (GANs) have received wide attention in the
machine learning field for their potential to learn high-dimensional, complex
real data distributions. Specifically, they do not rely on any assumptions
about the distribution and can generate realistic samples from a latent space
in a simple manner. This powerful property has led GANs to be applied to various
applications such as image synthesis, image attribute editing, image
translation, domain adaptation and other academic fields. In this paper, we
discuss the details of GANs for readers who are familiar with them but do not
understand them deeply, or who wish to view GANs from various perspectives. In
addition, we explain how GANs operate and the fundamental meaning of various
objective functions that have been suggested recently. We then focus on how
GANs can be combined with an autoencoder framework. Finally, we enumerate the
GAN variants that are applied to various tasks and other fields for those who
are interested in exploiting GANs for their research.
Comment: 41 pages, 16 figures. Published in ACM Computing Surveys (CSUR).
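The fundamental objective this overview refers to is the original minimax value function, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]. A minimal Monte-Carlo sketch over batches of discriminator outputs (function and variable names are illustrative):

```python
import math

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the original GAN value function
    V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))],
    given discriminator outputs on real and generated batches."""
    v_real = sum(math.log(d) for d in d_real) / len(d_real)
    v_fake = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return v_real + v_fake

# At the theoretical optimum the discriminator outputs D(x) = 1/2
# everywhere, and the value is -log 4.
v = gan_value([0.5, 0.5], [0.5, 0.5])
assert abs(v - (-math.log(4.0))) < 1e-12
```

The discriminator ascends this value while the generator descends it; the `-log 4` equilibrium is the classic result recovered when the generated distribution matches the real one.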
Object Discovery with a Copy-Pasting GAN
We tackle the problem of object discovery, where objects are segmented for a
given input image, and the system is trained without using any direct
supervision whatsoever. A novel copy-pasting GAN framework is proposed, where
the generator learns to discover an object in one image by compositing it into
another image such that the discriminator cannot tell that the resulting image
is fake. After carefully addressing subtle issues, such as preventing the
generator from `cheating', this game results in the generator learning to
select objects, as copy-pasting objects is most likely to fool the
discriminator. The system is shown to work well on four very different
datasets, including ones with large object appearance variations and
challenging cluttered backgrounds.
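The copy-pasting game rests on simple per-pixel alpha compositing of a source region into a destination image. A minimal sketch on tiny 2x2 "images" (all names hypothetical; in the paper the mask is predicted by the generator and the composite is judged by the discriminator):

```python
def composite(src, dst, mask):
    """Paste the masked region of `src` into `dst` via per-pixel
    alpha blending: mask * src + (1 - mask) * dst."""
    return [[m * s + (1.0 - m) * d for s, d, m in zip(rs, rd, rm)]
            for rs, rd, rm in zip(src, dst, mask)]

src  = [[1.0, 1.0], [1.0, 1.0]]   # image containing the "object"
dst  = [[0.0, 0.0], [0.0, 0.0]]   # background image
mask = [[1.0, 0.0], [0.0, 0.0]]   # generator's predicted object mask

out = composite(src, dst, mask)
assert out == [[1.0, 0.0], [0.0, 0.0]]
```

Because the only way the generator influences the composite is through the mask, fooling the discriminator pushes the mask toward covering whole objects; the anti-cheating measures the abstract mentions then prevent degenerate masks (e.g. all-zeros or all-ones).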
LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation
We present LR-GAN: an adversarial image generation model which takes scene
structure and context into account. Unlike previous generative adversarial
networks (GANs), the proposed GAN learns to generate image background and
foregrounds separately and recursively, and stitch the foregrounds on the
background in a contextually relevant manner to produce a complete natural
image. For each foreground, the model learns to generate its appearance, shape
and pose. The whole model is unsupervised, and is trained in an end-to-end
manner with gradient descent methods. The experiments demonstrate that LR-GAN
can generate more natural images with objects that are more human recognizable
than DCGAN.
Comment: 21 pages, 22 figures. Published as a conference paper at ICLR 2017;
code available on GitHub.
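The recursive background-then-foregrounds composition can be sketched as repeated mask compositing onto a running canvas. 1-D "images" keep the example short, and the per-foreground pose/affine transform the model also learns is omitted (all names hypothetical):

```python
def stitch_layers(background, layers):
    """Recursively composite (appearance, mask) foreground layers onto a
    running canvas; layers generated later are stitched in front."""
    canvas = background[:]
    for appearance, mask in layers:
        canvas = [m * a + (1.0 - m) * c
                  for a, c, m in zip(appearance, canvas, mask)]
    return canvas

bg  = [0.0, 0.0, 0.0, 0.0]                 # generated background "image"
fg1 = ([1.0] * 4, [0.0, 1.0, 0.0, 0.0])    # first foreground + its mask
fg2 = ([0.5] * 4, [0.0, 0.0, 1.0, 0.0])    # later foreground, stitched in front

assert stitch_layers(bg, [fg1, fg2]) == [0.0, 1.0, 0.5, 0.0]
```

Generating each layer's appearance, shape (mask) and pose separately, then stitching, is what lets the model place objects in a contextually relevant way rather than entangling them with the background.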
Generative Semantic Manipulation with Contrasting GAN
Generative Adversarial Networks (GANs) have recently achieved significant
improvement on paired/unpaired image-to-image translation, such as
photo→sketch and artist painting style transfer. However, existing
models can only transfer low-level information (e.g.,
color or texture changes) and fail to edit high-level semantic meanings (e.g.,
geometric structure or content) of objects. On the other hand, while some
approaches can synthesize compelling real-world images given a class label or
caption, they cannot condition on arbitrary shapes or structures, which largely
limits their application scenarios and the interpretability of their
results. In this work, we focus on a more challenging semantic manipulation
task, which aims to modify the semantic meaning of an object while preserving
its own characteristics (e.g., viewpoint and shape), such as
cow→sheep, motorcycle→bicycle, cat→dog. To
tackle such large semantic changes, we introduce a contrasting GAN
(contrast-GAN) with a novel adversarial contrasting objective. Instead of
directly making the synthesized samples close to target data as previous GANs
did, our adversarial contrasting objective optimizes over distance
comparisons between samples; that is, it enforces the manipulated data to be
semantically closer to real data of the target category than the input data is.
Equipped with the new contrasting objective, a novel mask-conditional
contrast-GAN architecture is proposed to disentangle the image background
from object semantic changes. Experiments on several semantic manipulation
tasks on the ImageNet and MSCOCO datasets show considerable performance gains
of our contrast-GAN over other conditional GANs. Quantitative results further
demonstrate the superiority of our model in generating manipulated results with
high visual fidelity and reasonable object semantics.
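One way to read the contrasting objective is as a hinge on a distance comparison between samples; a sketch under that assumption (the hinge form and the `margin` parameter are illustrative, not necessarily the paper's exact formulation, and raw vectors stand in for learned features):

```python
def l2(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def contrasting_loss(manipulated, input_img, target_real, margin=1.0):
    """Penalize the generator unless the manipulated sample sits closer
    to real target-category data than the input sample does."""
    return max(0.0, l2(manipulated, target_real)
                    - l2(input_img, target_real) + margin)

target = [0.0, 0.0]   # feature of a real target-category sample
good   = [0.1, 0.0]   # manipulated output near the target category
bad    = [3.0, 0.0]   # input image, far from the target category

assert contrasting_loss(good, bad, target) == 0.0   # comparison satisfied
assert contrasting_loss(bad, good, target) > 0.0    # violated -> penalty
```

Optimizing a relative comparison rather than an absolute "be close to the target data" term is what allows large semantic changes while preserving the input's own characteristics.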
Hierarchical Detail Enhancing Mesh-Based Shape Generation with 3D Generative Adversarial Network
Automatic mesh-based shape generation is of great interest across a wide
range of disciplines, from industrial design to gaming, computer graphics and
various other forms of digital art. While most traditional methods focus on
primitive-based model generation, advances in deep learning have made it
possible to learn 3-dimensional geometric shape representations in an
end-to-end manner. However, most current deep learning based frameworks focus
on the representation and generation of voxel- and point-cloud-based shapes,
making them not directly applicable to the design and graphics communities.
This study addresses the need for automatic generation of mesh-based
geometries and proposes a novel framework that utilizes a signed distance
function representation to generate detail-preserving three-dimensional
surface meshes via a deep learning based approach.
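A signed distance function (SDF) represents a surface implicitly as its zero level set: negative inside, positive outside. A minimal analytic sketch for a sphere; in the paper a network learns such a function, and a mesh is then extracted from the zero level set (e.g. by a marching-cubes-style algorithm, which triangulates wherever neighbouring grid samples change sign):

```python
import math

def sphere_sdf(x, y, z, radius=1.0):
    """Signed distance to an origin-centred sphere: negative inside,
    zero on the surface, positive outside."""
    return math.sqrt(x * x + y * y + z * z) - radius

# Sign changes between neighbouring grid samples locate the surface.
assert sphere_sdf(0.0, 0.0, 0.0) < 0.0           # inside
assert sphere_sdf(2.0, 0.0, 0.0) > 0.0           # outside
assert abs(sphere_sdf(1.0, 0.0, 0.0)) < 1e-12    # on the surface
```

Compared with voxels, an SDF is continuous and resolution-independent, which is why it lends itself to the detail-preserving surface meshes the study targets.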