Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning
In deep learning, auxiliary objectives are often used to facilitate learning
in situations where data is scarce, or the principal task is extremely complex.
This idea is primarily inspired by the improved generalization capability
induced by solving multiple tasks simultaneously, which leads to a more robust
shared representation. Nevertheless, finding optimal auxiliary tasks that give
rise to the desired improvement is a crucial problem that often requires
hand-crafted solutions or expensive meta-learning approaches. In this paper, we
propose a novel framework, dubbed Detaux, whereby a weakly supervised
disentanglement procedure is used to discover new unrelated classification
tasks and the associated labels that can be exploited with the principal task
in any Multi-Task Learning (MTL) model. The disentanglement procedure works at
a representation level, isolating a subspace related to the principal task,
plus an arbitrary number of orthogonal subspaces. In the most disentangled
subspaces, a clustering procedure generates the additional classification
tasks, with the resulting cluster assignments serving as their labels.
Subsequently, the original data, the labels associated with the principal task,
and the newly discovered ones can be fed into any MTL framework. Extensive
validation on both synthetic and real data, along with various ablation
studies, demonstrates promising results, revealing the potential of what has
been, so far, an unexplored connection between learning disentangled
representations and MTL. The code will be made publicly available upon
acceptance.
Comment: Under review in Pattern Recognition Letters
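As a concrete illustration of the pipeline above, the following minimal sketch derives an auxiliary task by clustering one latent subspace, assuming latent codes from an already-trained disentangling encoder; the function, the subspace slice, and the choice of k-means are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch: turn one disentangled latent subspace into an
    # auxiliary classification task by clustering, as described above.
    import numpy as np
    from sklearn.cluster import KMeans

    def auxiliary_labels(Z: np.ndarray, subspace: slice, n_clusters: int = 5) -> np.ndarray:
        """Cluster one orthogonal subspace of latent codes Z (shape N x D);
        the cluster assignments become the labels of a new auxiliary task."""
        Z_sub = Z[:, subspace]                    # isolate one disentangled subspace
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        return km.fit_predict(Z_sub)              # auxiliary labels, shape (N,)

The original data, the principal labels, and y_aux = auxiliary_labels(encode(X), subspace=slice(8, 16)) can then be paired in any MTL model, as the abstract describes.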
A Commentary on the Unsupervised Learning of Disentangled Representations
The goal of the unsupervised learning of disentangled representations is to
separate the independent explanatory factors of variation in the data without
access to supervision. In this paper, we summarize the results of Locatello et
al., 2019, and focus on their implications for practitioners. We discuss the
theoretical result showing that the unsupervised learning of disentangled
representations is fundamentally impossible without inductive biases and the
practical challenges it entails. Finally, we comment on our experimental
findings, highlighting the limitations of state-of-the-art approaches and
directions for future research.
SCADI: Self-supervised Causal Disentanglement in Latent Variable Models
Causal disentanglement has great potential for capturing complex situations.
However, there is a lack of practical and efficient approaches. It is already
known that most unsupervised disentangling methods are unable to produce
identifiable results without additional information, often leading to randomly
disentangled output. Consequently, most existing disentangling models are
weakly supervised, relying on information about the intrinsic factors, which
incurs excessive annotation costs. To address this, we propose a novel model,
SCADI (SElf-supervised CAusal DIsentanglement), which discovers semantic
factors and learns their causal relationships without any supervision. It
combines a masked structural causal model (SCM) with a pseudo-label generator
for causal disentanglement, aiming to provide a new direction for
self-supervised causal disentanglement models.
Comment: 12 pages, 12 figures
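As a rough sketch of the masked-SCM idea in such models, the layer below maps exogenous noise to endogenous factors through a learnable adjacency restricted to a fixed causal order; the layer, its shapes, and the triangular mask are assumptions for illustration, not the SCADI architecture itself.

    # Hypothetical sketch of a masked linear SCM layer: endogenous factors z
    # satisfy z = A^T z + eps, i.e. z = (I - A^T)^{-1} eps, with the learnable
    # adjacency A masked to be strictly lower-triangular (hence a DAG).
    import torch
    import torch.nn as nn

    class MaskedSCM(nn.Module):
        def __init__(self, n_factors: int):
            super().__init__()
            self.A = nn.Parameter(torch.zeros(n_factors, n_factors))  # weighted adjacency
            # a strictly lower-triangular mask fixes a causal order, ensuring acyclicity
            self.register_buffer("mask", torch.tril(torch.ones(n_factors, n_factors), -1))

        def forward(self, eps: torch.Tensor) -> torch.Tensor:
            A = self.A * self.mask                        # masked adjacency
            I = torch.eye(A.size(0), device=eps.device)
            # solve (I - A^T) z = eps for each sample in the batch
            return torch.linalg.solve(I - A.T, eps.unsqueeze(-1)).squeeze(-1)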
Deep Variational Luenberger-type Observer for Stochastic Video Prediction
Considering the inherent stochasticity and uncertainty, predicting future
video frames is exceptionally challenging. In this work, we study the problem
of video prediction by combining interpretability of stochastic state space
models and representation learning of deep neural networks. Our model builds
upon a variational encoder which transforms the input video into a latent
feature space and a Luenberger-type observer which captures the dynamic
evolution of the latent features. This enables the decomposition of videos into
static features and dynamics in an unsupervised manner. By deriving stability
guarantees for the nonlinear Luenberger-type observer, we show that the hidden
states in the feature space become insensitive to their initial values,
which improves the robustness of the overall model. Furthermore, a variational
lower bound on the data log-likelihood is derived, yielding a tractable
posterior predictive distribution based on the variational principle. Finally,
experiments on the Bouncing Balls and Pendulum datasets demonstrate that the
proposed model outperforms concurrent works.
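The core observer update is easy to state; a minimal sketch follows, where the module names, network sizes, and learned gain are illustrative assumptions rather than the paper's exact architecture.

    # Hypothetical sketch of a Luenberger-type update in latent space:
    # x_hat_{t+1} = f(x_hat_t) + L(y_t - g(x_hat_t)), with learned latent
    # dynamics f, observation map g, and observer gain L.
    import torch
    import torch.nn as nn

    class LatentLuenbergerObserver(nn.Module):
        def __init__(self, state_dim: int, obs_dim: int, hidden: int = 64):
            super().__init__()
            self.f = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, state_dim))  # latent dynamics
            self.g = nn.Linear(state_dim, obs_dim)                # observation map
            self.L = nn.Linear(obs_dim, state_dim, bias=False)    # observer gain

        def step(self, x_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
            # prediction plus innovation-driven correction
            return self.f(x_hat) + self.L(y - self.g(x_hat))

The innovation term L(y - g(x_hat)) is what lets the estimate forget its initialization, which is the insensitivity property the stability analysis formalizes.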
Towards Racially Unbiased Skin Tone Estimation via Scene Disambiguation
Virtual facial avatars will play an increasingly important role in immersive
communication, games and the metaverse, and it is therefore critical that they
be inclusive. This requires accurate recovery of the appearance, represented by
albedo, regardless of age, sex, or ethnicity. While significant progress has
been made on estimating 3D facial geometry, albedo estimation has received less
attention. The task is fundamentally ambiguous because the observed color is a
function of albedo and lighting, both of which are unknown. We find that
current methods are biased towards light skin tones due to (1) strongly biased
priors that prefer lighter pigmentation and (2) algorithmic solutions that
disregard the light/albedo ambiguity. To address this, we propose a new
evaluation dataset (FAIR) and an algorithm (TRUST) to improve albedo estimation
and, hence, fairness. Specifically, we create the first facial albedo
evaluation benchmark where subjects are balanced in terms of skin color, and
measure accuracy using the Individual Typology Angle (ITA) metric. We then
address the light/albedo ambiguity by building on a key observation: the image
of the full scene -- as opposed to a cropped image of the face -- contains
important information about lighting that can be used for disambiguation. TRUST
regresses facial albedo by conditioning both on the face region and a global
illumination signal obtained from the scene image. Our experimental results
show significant improvement compared to state-of-the-art methods on albedo
estimation, both in terms of accuracy and fairness. The evaluation benchmark
and code will be made available for research purposes at
https://trust.is.tue.mpg.de.
Comment: Camera-ready version, accepted at ECCV 2022
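For reference, the Individual Typology Angle used by the benchmark has a simple closed form over CIELAB values; a minimal sketch follows, where averaging over a precomputed skin mask is an illustrative assumption.

    # ITA (degrees) = arctan((L* - 50) / b*) * 180 / pi, computed from CIELAB
    # lightness L* and the yellow-blue channel b*; larger ITA corresponds to
    # lighter skin tones, so per-group ITA error exposes skin-tone bias.
    import numpy as np

    def ita_degrees(L: np.ndarray, b: np.ndarray) -> float:
        """Mean ITA over skin pixels, given per-pixel CIELAB L* and b* values."""
        return float(np.degrees(np.arctan2(L - 50.0, b)).mean())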