    3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces

    Learning a disentangled, interpretable, and structured latent representation in 3D generative models of faces and bodies is still an open problem. The problem is particularly acute when control over identity features is required. In this paper, we propose an intuitive yet effective self-supervised approach to train a 3D shape variational autoencoder (VAE) which encourages a disentangled latent representation of identity features. Curating the mini-batch generation by swapping arbitrary features across different shapes allows us to define a loss function that leverages known differences and similarities in the latent representations. Experimental results on 3D meshes show that state-of-the-art methods for latent disentanglement are not able to disentangle identity features of faces and bodies. Our proposed method properly decouples the generation of such features while maintaining good representation and reconstruction capabilities.
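The swapping-based training signal described in the abstract might be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the segment size, feature count, function names, and the squared-error form of the loss are all assumptions. The idea is that when a mini-batch shape is built by swapping feature f from identity B into identity A, the encoder's latent segment for f should match B's, while every other segment should match A's.

```python
import numpy as np

SEG = 4     # latent dimensions per feature segment (assumed)
N_FEAT = 3  # number of swappable features (assumed)

def segment(z, f):
    """Return the slice of latent vector z belonging to feature f."""
    return z[f * SEG:(f + 1) * SEG]

def swap_consistency_loss(z_a, z_b, z_swap, f):
    """z_swap encodes shape A with feature f taken from shape B.
    Segment f should match B's latent; all other segments should match A's."""
    loss = np.sum((segment(z_swap, f) - segment(z_b, f)) ** 2)
    for g in range(N_FEAT):
        if g != f:
            loss += np.sum((segment(z_swap, g) - segment(z_a, g)) ** 2)
    return loss

rng = np.random.default_rng(0)
z_a = rng.normal(size=SEG * N_FEAT)
z_b = rng.normal(size=SEG * N_FEAT)

# A perfectly disentangled encoder would place B's segment 1 into A's
# latent and leave the rest untouched, giving zero loss:
z_ideal = z_a.copy()
z_ideal[1 * SEG:2 * SEG] = z_b[1 * SEG:2 * SEG]
print(swap_consistency_loss(z_a, z_b, z_ideal, f=1))  # → 0.0
```

An encoder that entangles identity features would leak A's feature f into segment f (or B's other features elsewhere), making the loss strictly positive, which is what drives the disentanglement.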

    An Analysis of the Inner Workings of Variational Autoencoders

    Representation learning, the task of extracting meaningful representations of high-dimensional data, lies at the very core of artificial intelligence research. It ranges from the implicit training of features in a variety of computer vision tasks, over more old-school, hand-crafted feature-extraction mechanisms for, e.g., eye tracking, all the way to the explicit learning of semantically meaningful data representations. Strictly speaking, any activation of a layer within a neural network can be considered a representation of the input data, which makes achieving explicit control over the properties of such representations a fundamentally attractive research goal. An often desired property of learned representations is disentanglement. The idea of a disentangled representation stems from the goal of separating the sources of variance in the data and consolidates itself in the concept of recovering generative factors. Assuming that all data originates from a generative process that produces high-dimensional data from a low-dimensional representation (e.g., rendering images of people given visual attributes such as hairstyle, camera angle, or age), the goal of finding a disentangled representation is to recover those attributes. The variational autoencoder (VAE) is a well-known architecture commonly used for disentangled representation learning, and this work summarizes an analysis of its inner workings. VAEs attracted considerable attention due to their, at the time, unparalleled performance as both generative models and inference models for learning disentangled representations. However, the disentanglement property of a representation is not invariant to rotations of the learned representation: rotating a learned representation can change and even destroy its disentanglement quality. Given a rotationally symmetric prior over the representation space, the idealized objective function of VAEs is itself rotationally symmetric. Their success at producing disentangled representations therefore comes as a particular surprise. This thesis discusses why VAEs pursue a particular alignment for their representations and how the chosen alignment is correlated with the generative factors of existing representation-learning datasets.
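The rotational-symmetry observation can be checked numerically. The standard VAE prior is an isotropic Gaussian N(0, I), which assigns the same density to a latent vector and to any rotation of it, so the prior term of the objective cannot by itself prefer one latent alignment over another. A minimal sketch (the helper names and the specific rotation construction are illustrative assumptions, not the thesis's code):

```python
import numpy as np

def log_density(z):
    """Log density of a standard isotropic Gaussian N(0, I)."""
    return -0.5 * np.sum(z ** 2) - 0.5 * len(z) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
z = rng.normal(size=3)

# Build a random orthogonal (rotation) matrix via QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
z_rot = q @ z

# Orthogonal maps preserve the norm, so the Gaussian density is unchanged:
print(np.isclose(log_density(z), log_density(z_rot)))  # → True
```

Because the same invariance holds for the idealized reconstruction term, any symmetry breaking that makes VAE representations align with generative factors must come from elsewhere, which is precisely what the thesis investigates.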