3,763 research outputs found
VGAN-Based Image Representation Learning for Privacy-Preserving Facial Expression Recognition
Reliable facial expression recognition plays a critical role in human-machine
interactions. However, most of the facial expression analysis methodologies
proposed to date pay little or no attention to the protection of a user's
privacy. In this paper, we propose a Privacy-Preserving Representation-Learning
Variational Generative Adversarial Network (PPRL-VGAN) to learn an image
representation that is explicitly disentangled from the identity information.
At the same time, this representation is discriminative from the standpoint of
facial expression recognition and generative as it allows expression-equivalent
face image synthesis. We evaluate the proposed model on two public datasets
under various threat scenarios. Quantitative and qualitative results
demonstrate that our approach strikes a balance between the preservation of
privacy and data utility. We further demonstrate that our model can be
effectively applied to other tasks such as expression morphing and image
completion
Disentangled Speech Embeddings using Cross-modal Self-supervision
The objective of this paper is to learn representations of speaker identity
without access to manually annotated data. To do so, we develop a
self-supervised learning objective that exploits the natural cross-modal
synchrony between faces and audio in video. The key idea behind our approach is
to tease apart--without annotation--the representations of linguistic content
and speaker identity. We construct a two-stream architecture which: (1) shares
low-level features common to both representations; and (2) provides a natural
mechanism for explicitly disentangling these factors, offering the potential
for greater generalisation to novel combinations of content and identity and
ultimately producing speaker identity representations that are more robust. We
train our method on a large-scale audio-visual dataset of talking heads `in the
wild', and demonstrate its efficacy by evaluating the learned speaker
representations for standard speaker recognition performance.Comment: ICASSP 2020. The first three authors contributed equally to this wor
- …