Guiding InfoGAN with Semi-Supervision
In this paper we propose a new semi-supervised GAN architecture (ss-InfoGAN)
for image synthesis that leverages information from few labels (as little as
0.22%, max. 10% of the dataset) to learn semantically meaningful and
controllable data representations where latent variables correspond to label
categories. The architecture builds on Information Maximizing Generative
Adversarial Networks (InfoGAN) and is shown to learn both continuous and
categorical codes and achieves higher quality of synthetic samples compared to
fully unsupervised settings. Furthermore, we show that using small amounts of
labeled data speeds up training convergence. The architecture maintains the
ability to disentangle latent variables for which no labels are available.
Finally, we contribute an information-theoretic reasoning on how introducing
semi-supervision increases mutual information between synthetic and real data.
STCN: Stochastic Temporal Convolutional Networks
Convolutional architectures have recently been shown to be competitive on
many sequence modelling tasks when compared to the de facto standard of
recurrent neural networks (RNNs), while providing computational and modeling
advantages due to inherent parallelism. However, currently there remains a
performance gap to more expressive stochastic RNN variants, especially those
with several layers of dependent random variables. In this work, we propose
stochastic temporal convolutional networks (STCNs), a novel architecture that
combines the computational advantages of temporal convolutional networks (TCN)
with the representational power and robustness of stochastic latent spaces. In
particular, we propose a hierarchy of stochastic latent variables that captures
temporal dependencies at different time-scales. The architecture is modular and
flexible due to the decoupling of the deterministic and stochastic layers. We
show that the proposed architecture achieves state-of-the-art log-likelihoods
across several tasks. Finally, the model is capable of predicting high-quality
synthetic samples over a long-range temporal horizon in modeling of handwritten
text.
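As a rough illustration of the deterministic building block the STCN abstract refers to, the following sketch shows a causal dilated 1D convolution, the operation TCNs are commonly built from. It covers only that convolutional part, not the hierarchy of stochastic latent variables, and it is not taken from the paper: the function name `causal_dilated_conv`, the filter weights, and the toy input are all assumptions chosen for illustration.

```python
def causal_dilated_conv(x, weights, dilation):
    """Causal dilated 1D convolution over a list of floats.

    Computes y[t] = sum_k weights[k] * x[t - k * dilation], treating
    samples before time 0 as zero, so y[t] never depends on the future.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that fall before the sequence start
                acc += w * x[idx]
        out.append(acc)
    return out


# Toy input sequence (hypothetical values).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Stacking layers with growing dilation widens the receptive field,
# so deeper layers see longer time scales.
h1 = causal_dilated_conv(x, [0.5, 0.5], dilation=1)   # sees x[t], x[t-1]
h2 = causal_dilated_conv(h1, [0.5, 0.5], dilation=2)  # sees x[t] .. x[t-3]
```

Because every output depends only on current and past inputs, all time steps of a layer can be computed in parallel, which is the computational advantage over RNNs that the abstract mentions.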
Bringing the Physical to the Digital
This dissertation describes an exploration of digital tabletop interaction styles, with the ultimate goal of informing the design of a new model for tabletop interaction. In the context of this thesis, the term digital tabletop refers to an emerging class of devices that afford many novel ways of interacting with the digital, allowing users to directly touch information presented on large, horizontal displays. Because the field is relatively young, many developments are in flux; hardware and software change at a fast pace, and many interesting alternative approaches are available at the same time. In our research we are especially interested in systems that are capable of sensing multiple contacts (e.g., fingers) and richer information such as the outline of whole hands or other physical objects. New sensor hardware enables new ways to interact with the digital. When we embarked on the research for this thesis, the question of which interaction styles could be appropriate for this new class of devices was open, with many equally promising answers.
Many everyday activities rely on our hands' ability to skillfully control and manipulate physical objects. We seek to open up different possibilities to exploit our manual dexterity and provide users with richer interaction possibilities. This could be achieved through the use of physical objects as input mediators or through virtual interfaces that behave in a more realistic fashion.
In order to gain a better understanding of the underlying design space, we chose an approach organized into two phases. First, two different prototypes were implemented, each representing a specific interaction style, namely gesture-based interaction and tangible interaction. The flexibility of use afforded by the interface and the level of physicality afforded by the interface elements are introduced as criteria for evaluation. Each approach's suitability to support the highly dynamic and often unstructured interactions typical for digital tabletops is analyzed based on these criteria.
In a second stage, the lessons from these initial explorations are applied to inform the design of a novel model for digital tabletop interaction. This model is based on the combination of rich multi-touch sensing and a three-dimensional environment enriched by a gaming physics simulation. The proposed approach enables users to interact with the virtual through richer quantities such as collision and friction, enabling a variety of fine-grained interactions using multiple fingers, whole hands and physical objects.
Our model makes digital tabletop interaction even more “natural”. However, because the interaction (the sensed input and the displayed output) is still bound to the surface, there is a fundamental limitation in manipulating objects using the third dimension. To address this issue, we present a technique that allows users to conceptually pick objects off the surface and control their position in 3D. Our goal has been to define a technique that completes our model for on-surface interaction and allows for “as-direct-as-possible” interactions. We also present two hardware prototypes capable of sensing the users’ interactions beyond the table’s surface. Finally, we present visual feedback mechanisms to give the users the sense that they are actually lifting the objects off the surface.
This thesis contributes on various levels. We present several novel prototypes that we built and evaluated. We use these prototypes to systematically explore the design space of digital tabletop interaction. The flexibility of use afforded by the interaction style is introduced as a criterion alongside the user interface elements’ physicality. Each approach’s suitability to support the highly dynamic and often unstructured interactions typical for digital tabletops is analyzed. We present a new model for tabletop interaction that increases the fidelity of interaction possible in such settings. Finally, we extend this model to enable as-direct-as-possible interactions with 3D data, interacting from above the table’s surface.
Learning Human Motion Models for Long-term Predictions
We propose a new architecture for the learning of predictive spatio-temporal
motion models from data alone. Our approach, dubbed the Dropout Autoencoder
LSTM, is capable of synthesizing natural-looking motion sequences over long
time horizons without catastrophic drift or motion degradation. The model
consists of two components, a 3-layer recurrent neural network to model
temporal aspects and a novel auto-encoder that is trained to implicitly recover
the spatial structure of the human skeleton via randomly removing information
about joints during training time. This Dropout Autoencoder (D-AE) is then used
to filter each predicted pose of the LSTM, reducing accumulation of error and
hence drift over time. Furthermore, we propose new evaluation protocols to
assess the quality of synthetic motion sequences even when no ground-truth
data exists. The proposed protocols can be used to assess generated sequences
of arbitrary length. Finally, we evaluate our proposed method on two of the
largest motion-capture datasets available to date and show that our model
outperforms the state of the art on a variety of actions, including cyclic and
acyclic motion, and that it can produce natural-looking sequences over longer
time horizons than previous methods.
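The abstract above says the auto-encoder is trained by "randomly removing information about joints". A minimal sketch of what such a corruption step might look like is given below. It is an assumption, not the authors' implementation: the helper name `drop_joints`, the choice to zero out whole joints, the per-joint drop probability of 0.3, and the toy pose are all hypothetical.

```python
import random


def drop_joints(pose, drop_prob, rng):
    """Return a copy of `pose` with whole joints zeroed out at random.

    pose: list of (x, y, z) tuples, one per joint.
    drop_prob: assumed per-joint probability of removal (hypothetical).
    rng: a random.Random instance, passed in for reproducibility.
    """
    corrupted = []
    for joint in pose:
        if rng.random() < drop_prob:
            corrupted.append((0.0, 0.0, 0.0))  # joint information removed
        else:
            corrupted.append(joint)
    return corrupted


rng = random.Random(0)

# Toy five-joint pose (hypothetical coordinates).
pose = [(float(i), float(i) + 0.5, float(i) + 1.0) for i in range(1, 6)]

# One training pair for a denoising-style auto-encoder:
# the corrupted pose is the input, the intact pose is the target.
corrupted = drop_joints(pose, drop_prob=0.3, rng=rng)
training_pair = (corrupted, pose)
```

An auto-encoder trained on such pairs has to infer missing joints from the remaining ones, which is one plausible reading of how it could "implicitly recover the spatial structure of the human skeleton".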