Semi-supervised triplet loss based learning of ambient audio embeddings
Deep neural networks are particularly useful for learning relevant representations from data. Recent studies have demonstrated the potential of unsupervised representation learning for ambient sound analysis using various flavors of the triplet loss, and have compared this approach to supervised learning. However, in real situations, it is common to have a small labeled dataset and a large unlabeled one. In this paper, we combine unsupervised and supervised triplet loss based learning into a semi-supervised representation learning approach. We propose two flavors of this approach, whereby the positive samples for those triplets whose anchors are unlabeled are obtained either by applying a transformation to the anchor, or by selecting the nearest sample in the training set. We compare our approach to supervised and unsupervised representation learning, and we vary the ratio between the amounts of labeled and unlabeled data. We evaluate all the above approaches on an audio tagging task using the DCASE 2018 Task 4 dataset, and we show the impact of this ratio on the tagging performance.
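As a rough illustration of the second flavor described above (taking the nearest training sample as the positive for an unlabeled anchor), the following minimal sketch may help. It assumes a hinge-style triplet loss with Euclidean distance and a margin of 1.0; the abstract does not specify the distance, the margin, or the space in which the nearest sample is searched, so the function names and all of these choices are hypothetical.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length vectors (assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge-style triplet loss: pull the positive closer than the negative
    # by at least `margin` (the margin value is an assumption).
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def nearest_positive(anchor_idx, embeddings):
    # For an unlabeled anchor, use the nearest other sample as its positive.
    anchor = embeddings[anchor_idx]
    candidates = [i for i in range(len(embeddings)) if i != anchor_idx]
    return min(candidates, key=lambda i: euclidean(anchor, embeddings[i]))

# Toy embeddings: sample 1 is close to sample 0, sample 2 is far away.
embeddings = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]]
pos = nearest_positive(0, embeddings)
loss = triplet_loss(embeddings[0], embeddings[pos], embeddings[2])
```

In the other flavor, `nearest_positive` would be replaced by a transformation applied to the anchor itself; which transformation is used is not stated in the abstract.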
Learnable PINs: Cross-Modal Embeddings for Person Identity
We propose and investigate an identity sensitive joint embedding of face and
voice. Such an embedding enables cross-modal retrieval from voice to face and
from face to voice. We make the following four contributions: first, we show
that the embedding can be learnt from videos of talking faces, without
requiring any identity labels, using a form of cross-modal self-supervision;
second, we develop a curriculum learning schedule for hard negative mining
targeted to this task, that is essential for learning to proceed successfully;
third, we demonstrate and evaluate cross-modal retrieval for identities unseen
and unheard during training over a number of scenarios and establish a
benchmark for this novel task; finally, we show an application of using the
joint embedding for automatically retrieving and labelling characters in TV
dramas. Comment: To appear in ECCV 201
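One way to picture a curriculum for hard negative mining is to start with easy negatives (far from the anchor) and move toward hard ones (close to the anchor) as training progresses. The sketch below is only an illustration under that assumption; the linear schedule, the rank-based selection, and the names `pick_negative` and `difficulty_at` are hypothetical and are not taken from the paper.

```python
def pick_negative(anchor_dists, difficulty):
    # anchor_dists: distances from the anchor to each candidate negative.
    # difficulty in [0, 1]: 0 selects the easiest (farthest) negative,
    # 1 selects the hardest (closest) one.
    order = sorted(range(len(anchor_dists)),
                   key=lambda i: anchor_dists[i], reverse=True)  # easiest first
    idx = round(difficulty * (len(order) - 1))
    return order[idx]

def difficulty_at(epoch, total_epochs):
    # Assumed linear schedule from 0 (first epoch) to 1 (last epoch).
    return min(1.0, epoch / max(1, total_epochs - 1))

dists = [0.9, 0.2, 0.5]
easy = pick_negative(dists, difficulty_at(0, 5))   # farthest candidate
hard = pick_negative(dists, difficulty_at(4, 5))   # closest candidate
```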