153 research outputs found

Semi-supervised triplet loss based learning of ambient audio embeddings

Authors: Serizel Romain, Turpault Nicolas, Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 12/05/2019

International audience. Deep neural networks are particularly useful to learn relevant representations from data. Recent studies have demonstrated the potential of unsupervised representation learning for ambient sound analysis using various flavors of the triplet loss. They have compared this approach to supervised learning. However, in real situations, it is common to have a small labeled dataset and a large unlabeled one. In this paper, we combine unsupervised and supervised triplet loss based learning into a semi-supervised representation learning approach. We propose two flavors of this approach, whereby the positive samples for those triplets whose anchors are unlabeled are obtained either by applying a transformation to the anchor, or by selecting the nearest sample in the training set. We compare our approach to supervised and unsupervised representation learning as well as the ratio between the amount of labeled and unlabeled data. We evaluate all the above approaches on an audio tagging task using the DCASE 2018 Task 4 dataset, and we show the impact of this ratio on the tagging performance.

Crossref

INRIA a CCSD electronic archive server

Learnable PINs: Cross-Modal Embeddings for Person Identity

Authors: Albanie Samuel, Nagrani Arsha, Zisserman Andrew
Publication date: 01/01/2018

We propose and investigate an identity sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task, that is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of using the joint embedding for automatically retrieving and labelling characters in TV dramas. Comment: To appear in ECCV 201

arXiv.org e-Print Archive

Oxford University Research Archive

Learning Sensory Representations with Minimal Supervision

Author: Saeed Aaqib
Publication venue: Technische Universiteit Eindhoven
Publication date: 24/06/2021

Pure OAI Repository