Guiding InfoGAN with Semi-Supervision
In this paper we propose a new semi-supervised GAN architecture (ss-InfoGAN)
for image synthesis that leverages information from a small number of labels
(as little as 0.22%, at most 10% of the dataset) to learn semantically meaningful and
controllable data representations where latent variables correspond to label
categories. The architecture builds on Information Maximizing Generative
Adversarial Networks (InfoGAN) and is shown to learn both continuous and
categorical codes and achieves higher-quality synthetic samples than
fully unsupervised settings. Furthermore, we show that using small amounts of
labeled data speeds up training convergence. The architecture maintains the
ability to disentangle latent variables for which no labels are available.
Finally, we provide information-theoretic reasoning on how introducing
semi-supervision increases the mutual information between synthetic and real data.
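The loss-level idea behind the abstract can be sketched as follows: InfoGAN's auxiliary network Q is trained on all samples to recover the sampled latent code (the mutual-information surrogate), and, for the small labeled subset, additionally with a supervised cross-entropy that ties a categorical code to the label categories. A minimal NumPy sketch; the function names, shapes, and the weighting parameter `lam` are illustrative assumptions, not the authors' code:

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def ss_q_loss(q_code_probs, sampled_codes, q_label_probs, labels, lam=1.0):
    """InfoGAN's mutual-information surrogate on all samples, plus a
    supervised cross-entropy on the few labeled samples that ties a
    categorical latent code to the label categories."""
    mi_term = cross_entropy(q_code_probs, sampled_codes)  # all samples
    sup_term = cross_entropy(q_label_probs, labels)       # labeled subset only
    return mi_term + lam * sup_term
```

The supervised term only ever sees the labeled fraction of the data, which is how such a scheme can operate with as little as 0.22% labels.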
STCN: Stochastic Temporal Convolutional Networks
Convolutional architectures have recently been shown to be competitive on
many sequence modelling tasks when compared to the de facto standard of
recurrent neural networks (RNNs), while providing computational and modeling
advantages due to inherent parallelism. However, a performance gap remains
relative to more expressive stochastic RNN variants, especially those
with several layers of dependent random variables. In this work, we propose
stochastic temporal convolutional networks (STCNs), a novel architecture that
combines the computational advantages of temporal convolutional networks (TCN)
with the representational power and robustness of stochastic latent spaces. In
particular, we propose a hierarchy of stochastic latent variables that captures
temporal dependencies at different time-scales. The architecture is modular and
flexible due to the decoupling of the deterministic and stochastic layers. We
show that the proposed architecture achieves state-of-the-art log-likelihoods
across several tasks. Finally, in handwritten-text modelling the model
generates high-quality synthetic samples over long-range temporal horizons.
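The structural idea of STCNs can be illustrated with a toy stack: dilated causal convolutions whose receptive fields grow exponentially with depth, with a stochastic latent variable attached to each layer so that higher layers carry longer-range context. A NumPy sketch under assumed shapes (1-D signals, kernel size 2); the layer sizes and the Gaussian latent are illustrative, not the paper's exact parameterization:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: output at t sees x[t], x[t-d], ... (zero-padded)."""
    k = len(w)
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

def stcn_like_stack(x, n_layers=4, rng=np.random.default_rng(0)):
    """Toy deterministic TCN stack; an STCN additionally attaches a stochastic
    latent z_l to each layer l, so coarser (higher) layers capture temporal
    dependencies at longer time-scales. Names/shapes are illustrative only."""
    h, latents = x, []
    for l in range(n_layers):
        h = np.tanh(causal_dilated_conv(h, w=[0.5, 0.5], dilation=2 ** l))
        # Stochastic layer (sketch): sample z_l ~ N(mu_l, 1) with mu_l from h.
        latents.append(rng.normal(loc=h, scale=1.0))
    return h, latents
```

With kernel size 2 and dilations 1, 2, 4, 8 the receptive field doubles per layer, which is the mechanism that lets the latent hierarchy span different time-scales.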
Learning Human Motion Models for Long-term Predictions
We propose a new architecture for learning predictive spatio-temporal
motion models from data alone. Our approach, dubbed the Dropout Autoencoder
LSTM, is capable of synthesizing natural-looking motion sequences over long
time horizons without catastrophic drift or motion degradation. The model
consists of two components: a 3-layer recurrent neural network that models
temporal aspects, and a novel autoencoder that is trained to implicitly recover
the spatial structure of the human skeleton by randomly removing information
about joints during training. This Dropout Autoencoder (D-AE) is then used
to filter each predicted pose of the LSTM, reducing accumulation of error and
hence drift over time. Furthermore, we propose new evaluation protocols to
assess the quality of synthetic motion sequences even when no ground-truth
data exists. The proposed protocols can be used to assess generated sequences
of arbitrary length. Finally, we evaluate our proposed method on two of the
largest motion-capture datasets available to date and show that our model
outperforms the state-of-the-art on a variety of actions, including cyclic and
acyclic motion, and that it can produce natural-looking sequences over longer
time horizons than previous methods.
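The two mechanisms described above can be sketched in a few lines: during training, whole joints are randomly dropped from the D-AE's input so it learns to reconstruct full poses from partial ones; at prediction time, each LSTM output is passed through the D-AE before being fed back, projecting it onto the learned pose manifold. All names, the per-joint coordinate count, and the toy filter below are illustrative assumptions:

```python
import numpy as np

def dae_training_input(pose, drop_prob, rng):
    """Randomly remove whole joints (groups of 3 coords) from a flattened
    pose vector; the reconstruction target remains the full `pose`."""
    n_joints = pose.shape[0] // 3
    keep = rng.random(n_joints) > drop_prob
    return pose * np.repeat(keep.astype(float), 3)

def rollout_with_filter(lstm_step, dae_filter, pose, n_steps):
    """Autoregressive prediction: filter each predicted pose with the D-AE
    before feeding it back, reducing error accumulation (drift)."""
    poses = []
    for _ in range(n_steps):
        pose = dae_filter(lstm_step(pose))  # filter, then feed back
        poses.append(pose)
    return np.stack(poses)
```

A trained D-AE would replace `dae_filter`; the key design point is that the filter sits inside the feedback loop, so errors are corrected before they compound.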
Structured Prediction Helps 3D Human Motion Modelling
Human motion prediction is a challenging and important task in many computer
vision application domains. Existing work only implicitly models the spatial
structure of the human skeleton. In this paper, we propose a novel approach
that decomposes the prediction into individual joints by means of a structured
prediction layer that explicitly models the joint dependencies. This is
implemented via a hierarchy of small-sized neural networks connected
analogously to the kinematic chains in the human body as well as a joint-wise
decomposition in the loss function. The proposed layer is agnostic to the
underlying network and can be used with existing architectures for motion
modelling. Prior work typically leverages the H3.6M dataset. We show that some
state-of-the-art techniques do not perform well when trained and tested on
AMASS, a recently released dataset 14 times the size of H3.6M. Our experiments
indicate that the proposed layer increases the performance of motion
forecasting irrespective of the base network, joint-angle representation, and
prediction horizon. We furthermore show that the layer also improves motion
predictions qualitatively. We make code and models publicly available at
https://ait.ethz.ch/projects/2019/spl.Comment: ICCV 201
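The structured prediction layer described above can be sketched as one small network per joint, each receiving a shared context vector plus its parent joint's prediction, wired to mirror the kinematic chain. The toy skeleton, per-joint "networks", and function names below are illustrative assumptions, not the released SPL code:

```python
import numpy as np

# Illustrative kinematic tree (joint -> parent, None = root); the real model
# uses the full H3.6M/AMASS skeleton.
PARENTS = {"pelvis": None, "spine": "pelvis", "head": "spine",
           "l_hip": "pelvis", "l_knee": "l_hip"}

def spl_forward(context, joint_nets, parents=PARENTS):
    """Each joint's small network sees the shared context concatenated with
    its parent's prediction, so predictions propagate down the kinematic
    chain. Roots condition on a zero 'parent' prediction."""
    preds = {}
    def predict(j):
        if j in preds:
            return preds[j]
        parent = parents[j]
        parent_pred = predict(parent) if parent else np.zeros(3)
        preds[j] = joint_nets[j](np.concatenate([context, parent_pred]))
        return preds[j]
    for j in parents:
        predict(j)
    return preds
```

Because the layer only redefines how the final per-joint outputs are produced, it can sit on top of any base network, which is why it is agnostic to the underlying architecture.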