928 research outputs found

Learning to separate vocals from polyphonic mixtures via ensemble methods and structured output prediction

Author: De Bie Tijl
Mcvicar Matt
Santos-Rodriguez Raul
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/05/2016

Crossref

Explore Bristol Research

Acoustic Identification of Flat Spots On Wheels Using Different Machine Learning Techniques

Author: Dernbach Gabriel
Lykartsis Athanasios
Sievers Leon
Weinzierl Stefan
Publication date: 16/05/2020

BMBF, 01IS18049B, ALICE III - Autonomes Lernen in komplexen Umgebungen 3 (Autonomous Learning in Complex Environments 3)

DepositOnce

Singing voice separation: a study on training data

Author: Hennequin Romain
Prétet Laure
Royo-Letelier Jimena
Vaglio Andrea
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/06/2019

In recent years, singing voice separation systems have shown increased performance due to the use of supervised training. The design of training datasets is known to be a crucial factor in the performance of such systems. We investigate how the characteristics of the training dataset impact the separation performance of state-of-the-art singing voice separation algorithms. We show that the separation quality and diversity are two important and complementary assets of a good training dataset. We also provide insights on possible transforms to perform data augmentation for this task.

arXiv.org e-Print Archive

Crossref

Learnable PINs: Cross-Modal Embeddings for Person Identity

Author: Albanie Samuel
Nagrani Arsha
Zisserman Andrew
Publication date: 01/01/2018

We propose and investigate an identity sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task, that is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of using the joint embedding for automatically retrieving and labelling characters in TV dramas. Comment: To appear in ECCV 201

arXiv.org e-Print Archive

Oxford University Research Archive

A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

Author: Dixon S
O'Connor B
Sound and Music Computing
Publication venue: SMC Network
Publication date: 12/06/2023

Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component in a loss function that is otherwise well-established among VC tasks, which has been shown to improve our model’s SVC performance. We first trained a singer identity embedding (SIE) network on mel-spectrograms of singer recordings to produce singer-specific variance encodings using contrastive learning. We subsequently trained a well-known autoencoder framework (AutoVC) conditioned on these SIEs, and measured differences in SVC performance when using different latent regressor loss components. We found that using this loss w.r.t. SIEs leads to better performance than w.r.t. bottleneck embeddings, where converted audio is more natural and specific towards target singers. The inclusion of this loss component has the advantage of explicitly forcing the network to reconstruct with timbral similarity, and also negates the effect of poor disentanglement in AutoVC’s bottleneck embeddings. We demonstrate a peculiar divergence between computational and human evaluations on singer-converted audio clips, which highlights the necessity of both. We also propose a pitch-matching mechanism between source and target singers to ensure these evaluations are not influenced by differences in pitch register.

Queen Mary Research Online