6 research outputs found
Break The Spell Of Total Correlation In betaTCVAE
Without human-provided labels, the independent and dependent features in
data are entangled. How to design the model's inductive biases so that it can
flexibly separate, and effectively capture, features of differing complexity
is a central question in unsupervised disentangled representation learning.
This paper proposes a new iterative decomposition path for total correlation
and explains the disentangling ability of VAEs from the perspective of model
capacity allocation. The new objective function groups latent-variable
dimensions into joint distributions while relaxing the independence
constraints between the resulting marginals, yielding latent variables with a
more manipulable prior distribution. The model enables a VAE to adjust its
parameter capacity to flexibly separate dependent and independent data
features. Experimental results on various datasets reveal an interesting
relationship between model capacity and the latent-variable group size,
called the "V"-shaped best-ELBO trajectory. Additionally, we empirically
demonstrate that the proposed method obtains better disentangling performance
with a reasonable allocation of parameter capacity.
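For context, TC-based VAEs build on the standard decomposition of the aggregate KL term into index-code mutual information, total correlation (TC), and dimension-wise KL; the paper's iterative decomposition revisits the middle term:

```latex
\mathbb{E}_{p(x)}\big[\mathrm{KL}(q(z \mid x)\,\|\,p(z))\big]
  = I_q(x; z)
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\textstyle\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \textstyle\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)
```

Penalizing the TC term encourages statistically independent latent dimensions; grouping dimensions relaxes independence within each group while keeping it across groups.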
Motion-DVAE: Unsupervised learning for fast human motion denoising
Pose and motion priors are crucial for recovering realistic and accurate
human motion from noisy observations. Substantial progress has been made on
pose and shape estimation from images, and recent works showed impressive
results using priors to refine frame-wise predictions. However, many motion
priors model only transitions between consecutive poses and are used in
time-consuming optimization procedures, which is problematic for
applications requiring real-time motion capture. We introduce Motion-DVAE, a
motion prior that captures the short-term dependencies of human motion. As part of
the dynamical variational autoencoder (DVAE) models family, Motion-DVAE
combines the generative capability of VAE models and the temporal modeling of
recurrent architectures. Together with Motion-DVAE, we introduce an
unsupervised, learned denoising method unifying regression- and
optimization-based approaches in a single framework for real-time 3D human pose
estimation. Experiments show that the proposed approach reaches competitive
performance with state-of-the-art methods while being much faster.
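The DVAE idea of combining VAE-style latent variables with a recurrent state can be sketched as a single untrained step: a hidden state summarizes past poses, the encoder infers a latent from the current pose and that state, and the decoder reconstructs the pose. All dimensions, names, and parameterizations below are illustrative assumptions, not Motion-DVAE's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def dense(d_in, d_out):
    """Random weight matrix and zero bias (untrained, for illustration)."""
    return rng.normal(scale=0.1, size=(d_out, d_in)), np.zeros(d_out)

class TinyDVAECell:
    """One step of a DVAE-style model: the recurrent state h summarizes the
    past, the encoder infers z_t from (x_t, h), and the decoder reconstructs
    the pose x_t from (z_t, h)."""
    def __init__(self, d_x, d_z, d_h):
        self.Wh, self.bh = dense(d_x + d_h, d_h)      # recurrence
        self.We, self.be = dense(d_x + d_h, 2 * d_z)  # encoder (mu, log-var)
        self.Wd, self.bd = dense(d_z + d_h, d_x)      # decoder

    def step(self, x_t, h):
        # Approximate posterior over z_t given the current pose and past state.
        mu, log_var = np.split(self.We @ np.concatenate([x_t, h]) + self.be, 2)
        z = mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)  # reparameterize
        x_hat = self.Wd @ np.concatenate([z, h]) + self.bd          # reconstruction
        h_new = np.tanh(self.Wh @ np.concatenate([x_t, h]) + self.bh)
        return x_hat, h_new

cell = TinyDVAECell(d_x=6, d_z=2, d_h=8)
x_hat, h = cell.step(rng.normal(size=6), np.zeros(8))
```

Because the recurrence carries only a fixed-size state forward, denoising a new frame is a single forward pass, which is what makes real-time use plausible compared with per-sequence optimization.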
Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve
Variational autoencoders (VAEs) are powerful tools for learning latent
representations of data used in a wide range of applications. In practice, VAEs
usually require multiple training rounds to choose the amount of information
the latent variable should retain. This trade-off between the reconstruction
error (distortion) and the KL divergence (rate) is typically parameterized by a
hyperparameter β. In this paper, we introduce Multi-Rate VAE (MR-VAE), a
computationally efficient framework for learning optimal parameters
corresponding to various β values in a single training run. The key idea is to
explicitly formulate a response function that maps β to the optimal
parameters using hypernetworks. MR-VAEs construct a compact response
hypernetwork where the pre-activations are conditionally gated based on
β. We justify the proposed architecture by analyzing linear VAEs and
showing that it can represent response functions exactly for linear VAEs. With
the learned hypernetwork, MR-VAEs can construct the rate-distortion curve
without additional training and can be deployed with significantly less
hyperparameter tuning. Empirically, our approach is competitive with, and often
exceeds, the performance of multiple separately trained β-VAEs, with minimal
computation and memory overhead.
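The gating idea can be sketched minimally: a layer whose pre-activations are scaled by per-unit gates computed from log β, so one set of weights yields different effective networks at different rates. The gate form, layer sizes, and all names here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_gate(log_beta, slope, bias):
    """Per-unit sigmoid gates conditioned on log(beta) (assumed form)."""
    return 1.0 / (1.0 + np.exp(-(slope * log_beta + bias)))

class GatedLinear:
    """Linear layer whose pre-activations are elementwise gated by beta."""
    def __init__(self, d_in, d_out):
        self.W = rng.normal(scale=0.1, size=(d_out, d_in))
        self.b = np.zeros(d_out)
        self.slope = rng.normal(scale=0.5, size=d_out)  # gate slope per unit
        self.bias = np.zeros(d_out)                     # gate bias per unit

    def __call__(self, x, beta):
        pre = self.W @ x + self.b
        gate = beta_gate(np.log(beta), self.slope, self.bias)
        return np.tanh(gate * pre)

layer = GatedLinear(4, 3)
x = rng.normal(size=4)
y_low = layer(x, beta=0.1)    # low-rate-penalty regime
y_high = layer(x, beta=10.0)  # high-rate-penalty regime
```

At inference time, sweeping β through such gates traces out different encoder/decoder behaviors, which is how a single trained model can cover the whole rate-distortion curve.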
Learning Directed Graphical Models with Optimal Transport
Estimating the parameters of a probabilistic directed graphical model from
incomplete data remains a long-standing challenge. This is because, in the
presence of latent variables, both the likelihood function and posterior
distribution are intractable without further assumptions about structural
dependencies or model classes. While existing learning methods are
fundamentally based on likelihood maximization, here we offer a new view of the
parameter learning problem through the lens of optimal transport. This
perspective licenses a general framework that operates on any directed graphs
without making unrealistic assumptions on the posterior over the latent
variables or resorting to black-box variational approximations. We develop a
theoretical framework and support it with extensive empirical evidence
demonstrating the flexibility and versatility of our approach. Across
experiments, we show that our method not only recovers the ground-truth
parameters but also performs comparably to or better than baselines on downstream
applications, notably the non-trivial task of discrete representation learning.