Hyperprior Induced Unsupervised Disentanglement of Latent Representations
We address the problem of unsupervised disentanglement of latent
representations learnt via deep generative models. In contrast to current
approaches that operate on the evidence lower bound (ELBO), we argue that
statistical independence in the latent space of VAEs can be enforced in a
principled hierarchical Bayesian manner. To this effect, we augment the
standard VAE with an inverse-Wishart (IW) prior on the covariance matrix of the
latent code. By tuning the IW parameters, we are able to encourage (or
discourage) independence in the learnt latent dimensions. Extensive
experimental results on a range of datasets (2DShapes, 3DChairs, 3DFaces and
CelebA) show that our approach outperforms the β-VAE and is competitive with
the state-of-the-art FactorVAE. Our approach achieves significantly better
disentanglement and reconstruction on a new dataset (CorrelatedEllipses) which
introduces correlations between the factors of variation.
Comment: AAAI-201
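As a rough illustration of the hierarchical prior described in this abstract, the block below writes out a Gaussian latent prior whose covariance has an inverse-Wishart hyperprior. The zero prior mean and the symbols $\Psi$ (scale matrix), $\nu$ (degrees of freedom) and $p$ (latent dimension) are assumptions introduced here; the abstract does not state the exact parameterisation.

```latex
% Sketch only: zero-mean prior and the symbols \Psi, \nu, p are assumed, not taken from the abstract.
\begin{align}
  z \mid \Sigma &\sim \mathcal{N}(0, \Sigma), \qquad
  \Sigma \sim \mathcal{IW}(\Psi, \nu), \\
  p(\Sigma \mid \Psi, \nu) &=
    \frac{|\Psi|^{\nu/2}}{2^{\nu p/2}\,\Gamma_p(\nu/2)}\,
    |\Sigma|^{-(\nu + p + 1)/2}
    \exp\!\Big(-\tfrac{1}{2}\operatorname{tr}\big(\Psi\,\Sigma^{-1}\big)\Big), \\
  \mathbb{E}[\Sigma] &= \frac{\Psi}{\nu - p - 1}, \qquad \nu > p + 1 .
\end{align}
```

Under this reading, a diagonal $\Psi$ with large $\nu$ concentrates $\Sigma$ near a diagonal matrix, which favours independent latent dimensions, while a small $\nu$ leaves the off-diagonal entries only weakly constrained.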
Dual Gaussian-based Variational Subspace Disentanglement for Visible-Infrared Person Re-Identification
Visible-infrared person re-identification (VI-ReID) is a challenging and
essential task in night-time intelligent surveillance systems. Besides the
intra-modality variance that RGB-RGB person re-identification mainly addresses,
VI-ReID suffers from additional inter-modality variance caused by the inherent
heterogeneous gap. To solve the problem, we present a carefully designed dual
Gaussian-based variational auto-encoder (DG-VAE), which disentangles an
identity-discriminable and an identity-ambiguous cross-modality feature
subspace, following a mixture-of-Gaussians (MoG) prior and a standard Gaussian
distribution prior, respectively. Disentangling cross-modality
identity-discriminable features leads to more robust retrieval for VI-ReID. To
achieve efficient optimization as in a conventional VAE, we theoretically derive
two variational inference terms for the MoG prior under the supervised setting,
which not only restrict the identity-discriminable subspace so that the model
explicitly handles the cross-modality intra-identity variance, but also enable
the MoG distribution to avoid posterior collapse. Furthermore, we propose a
triplet swap reconstruction (TSR) strategy to promote the above disentangling
process. Extensive experiments demonstrate that our method outperforms
state-of-the-art methods on two VI-ReID datasets.
Comment: Accepted by ACM MM 2020 poster. 12 pages, 10 appendixe
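The abstract does not spell out the two variational inference terms derived for the MoG prior, so the following is only a minimal sketch of one plausible ingredient: when the identity label is known, the KL term can be taken against the single mixture component for that identity, which has a closed form for diagonal Gaussians. The function names, the diagonal-covariance choice and the label-indexed component lookup are all assumptions made for illustration, not the authors' formulation.

```python
import math


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """Closed-form KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ) in nats."""
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def supervised_mog_kl(mu_q, var_q, label, component_means, component_vars):
    """Hypothetical helper: KL from the encoder posterior to the mixture
    component selected by the identity label (an assumed simplification)."""
    return kl_diag_gaussians(
        mu_q, var_q, component_means[label], component_vars[label]
    )


# Two identities, each with its own 2-D Gaussian component (made-up values).
component_means = [[0.0, 0.0], [3.0, 3.0]]
component_vars = [[1.0, 1.0], [1.0, 1.0]]

# Posterior for a sample of identity 0, shifted by 1 along the first axis.
kl_own = supervised_mog_kl([1.0, 0.0], [1.0, 1.0], 0, component_means, component_vars)
kl_other = supervised_mog_kl([1.0, 0.0], [1.0, 1.0], 1, component_means, component_vars)
```

In this toy setup the KL to the sample's own identity component is smaller than the KL to the other component, which is the sense in which a label-conditioned prior term pulls features of the same identity together across modalities.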