Search CORE

174 research outputs found

Guiding InfoGAN with Semi-Supervision

Author: Aksan Emre
Hilliges Otmar
Spurr Adrian
Publication venue
Publication date: 14/07/2017
Field of study

In this paper we propose a new semi-supervised GAN architecture (ss-InfoGAN) for image synthesis that leverages information from few labels (as little as 0.22%, max. 10% of the dataset) to learn semantically meaningful and controllable data representations where latent variables correspond to label categories. The architecture builds on Information Maximizing Generative Adversarial Networks (InfoGAN) and is shown to learn both continuous and categorical codes and achieves higher quality of synthetic samples compared to fully unsupervised settings. Furthermore, we show that using small amounts of labeled data speeds-up training convergence. The architecture maintains the ability to disentangle latent variables for which no labels are available. Finally, we contribute an information-theoretic reasoning on how introducing semi-supervision increases mutual information between synthetic and real data

arXiv.org e-Print Archive

Crossref

STCN: Stochastic Temporal Convolutional Networks

Author: Aksan Emre
Hilliges Otmar
Publication venue
Publication date: 18/02/2019
Field of study

Convolutional architectures have recently been shown to be competitive on many sequence modelling tasks when compared to the de-facto standard of recurrent neural networks (RNNs), while providing computational and modeling advantages due to inherent parallelism. However, currently there remains a performance gap to more expressive stochastic RNN variants, especially those with several layers of dependent random variables. In this work, we propose stochastic temporal convolutional networks (STCNs), a novel architecture that combines the computational advantages of temporal convolutional networks (TCN) with the representational power and robustness of stochastic latent spaces. In particular, we propose a hierarchy of stochastic latent variables that captures temporal dependencies at different time-scales. The architecture is modular and flexible due to the decoupling of the deterministic and stochastic layers. We show that the proposed architecture achieves state of the art log-likelihoods across several tasks. Finally, the model is capable of predicting high-quality synthetic samples over a long-range temporal horizon in modeling of handwritten text

arXiv.org e-Print Archive

Repository for Publications and Research Data

Bringing the Physical to the Digital

Author: Hilliges Otmar
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 22/07/2009
Field of study

This dissertation describes an exploration of digital tabletop interaction styles, with the ultimate goal of informing the design of a new model for tabletop interaction. In the context of this thesis the term digital tabletop refers to an emerging class of devices that afford many novel ways of interaction with the digital. Allowing users to directly touch information presented on large, horizontal displays. Being a relatively young field, many developments are in flux; hardware and software change at a fast pace and many interesting alternative approaches are available at the same time. In our research we are especially interested in systems that are capable of sensing multiple contacts (e.g., fingers) and richer information such as the outline of whole hands or other physical objects. New sensor hardware enable new ways to interact with the digital. When embarking into the research for this thesis, the question which interaction styles could be appropriate for this new class of devices was a open question, with many equally promising answers. Many everyday activities rely on our hands ability to skillfully control and manipulate physical objects. We seek to open up different possibilities to exploit our manual dexterity and provide users with richer interaction possibilities. This could be achieved through the use of physical objects as input mediators or through virtual interfaces that behave in a more realistic fashion. In order to gain a better understanding of the underlying design space we choose an approach organized into two phases. First, two different prototypes, each representing a specific interaction style – namely gesture-based interaction and tangible interaction – have been implemented. The flexibility of use afforded by the interface and the level of physicality afforded by the interface elements are introduced as criteria for evaluation. Each approaches’ suitability to support the highly dynamic and often unstructured interactions typical for digital tabletops is analyzed based on these criteria. In a second stage the learnings from these initial explorations are applied to inform the design of a novel model for digital tabletop interaction. This model is based on the combination of rich multi-touch sensing and a three dimensional environment enriched by a gaming physics simulation. The proposed approach enables users to interact with the virtual through richer quantities such as collision and friction. Enabling a variety of fine-grained interactions using multiple fingers, whole hands and physical objects. Our model makes digital tabletop interaction even more “natural”. However, because the interaction – the sensed input and the displayed output – is still bound to the surface, there is a fundamental limitation in manipulating objects using the third dimension. To address this issue, we present a technique that allows users to – conceptually – pick objects off the surface and control their position in 3D. Our goal has been to define a technique that completes our model for on-surface interaction and allows for “as-direct-as possible” interactions. We also present two hardware prototypes capable of sensing the users’ interactions beyond the table’s surface. Finally, we present visual feedback mechanisms to give the users the sense that they are actually lifting the objects off the surface. This thesis contributes on various levels. We present several novel prototypes that we built and evaluated. We use these prototypes to systematically explore the design space of digital tabletop interaction. The flexibility of use afforded by the interaction style is introduced as criterion alongside the user interface elements’ physicality. Each approaches’ suitability to support the highly dynamic and often unstructured interactions typical for digital tabletops are analyzed. We present a new model for tabletop interaction that increases the fidelity of interaction possible in such settings. Finally, we extend this model so to enable as direct as possible interactions with 3D data, interacting from above the table’s surface

Digitale Hochschulschriften der LMU

Learning Human Motion Models for Long-term Predictions

Author: Aksan Emre
Ghosh Partha
Hilliges Otmar
Song Jie
Publication venue
Publication date: 03/12/2017
Field of study

We propose a new architecture for the learning of predictive spatio-temporal motion models from data alone. Our approach, dubbed the Dropout Autoencoder LSTM, is capable of synthesizing natural looking motion sequences over long time horizons without catastrophic drift or motion degradation. The model consists of two components, a 3-layer recurrent neural network to model temporal aspects and a novel auto-encoder that is trained to implicitly recover the spatial structure of the human skeleton via randomly removing information about joints during training time. This Dropout Autoencoder (D-AE) is then used to filter each predicted pose of the LSTM, reducing accumulation of error and hence drift over time. Furthermore, we propose new evaluation protocols to assess the quality of synthetic motion sequences even for which no ground truth data exists. The proposed protocols can be used to assess generated sequences of arbitrary length. Finally, we evaluate our proposed method on two of the largest motion-capture datasets available to date and show that our model outperforms the state-of-the-art on a variety of actions, including cyclic and acyclic motion, and that it can produce natural looking sequences over longer time horizons than previous methods

arXiv.org e-Print Archive

Crossref