2 research outputs found

    Towards joint sound scene and polyphonic sound event recognition

    Acoustic Scene Classification (ASC) and Sound Event Detection (SED) are two separate tasks in the field of computational sound scene analysis. In this work, we present a new dataset with both sound scene and sound event labels and use it to demonstrate a novel method for jointly classifying sound scenes and recognizing sound events. We show that the joint approach makes learning more efficient and that, whilst improvements are still needed for sound event detection, the SED results are robust in a dataset whose sample distribution is skewed towards sound scenes.
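
    As a rough illustration of the joint approach described above, the sketch below shows a multi-task network with a shared convolutional encoder, a clip-level scene-classification head, and a frame-level event-detection head. This is not the authors' architecture: the layer sizes, the numbers of scene classes (n_scenes) and event classes (n_events), the PyTorch framework, and the loss weighting are all illustrative assumptions.

    # Minimal sketch (assumed architecture, not the paper's model) of joint ASC + SED.
    import torch
    import torch.nn as nn

    class JointASCSED(nn.Module):
        def __init__(self, n_mels=64, n_scenes=10, n_events=16):
            super().__init__()
            # Shared encoder over a log-mel spectrogram input (batch, 1, time, mel).
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                nn.MaxPool2d((1, 4)),               # pool over mel only, keep time resolution
                nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
                nn.MaxPool2d((1, 4)),
            )
            feat_dim = 64 * (n_mels // 16)
            # SED head: frame-wise multi-label logits.
            self.sed_head = nn.Linear(feat_dim, n_events)
            # ASC head: clip-level single-label logits from time-pooled features.
            self.asc_head = nn.Linear(feat_dim, n_scenes)

        def forward(self, x):
            h = self.encoder(x)                     # (batch, 64, time, mel/16)
            h = h.permute(0, 2, 1, 3).flatten(2)    # (batch, time, feat_dim)
            sed_logits = self.sed_head(h)           # (batch, time, n_events)
            asc_logits = self.asc_head(h.mean(dim=1))   # (batch, n_scenes)
            return asc_logits, sed_logits

    # Joint training: clip-level cross-entropy for scenes plus frame-level binary
    # cross-entropy for events; the 0.5 weight is an arbitrary assumption.
    model = JointASCSED()
    x = torch.randn(8, 1, 500, 64)                  # 8 clips, 500 frames, 64 mel bands
    scene_target = torch.randint(0, 10, (8,))
    event_target = torch.randint(0, 2, (8, 500, 16)).float()
    asc_logits, sed_logits = model(x)
    loss = (nn.functional.cross_entropy(asc_logits, scene_target)
            + 0.5 * nn.functional.binary_cross_entropy_with_logits(sed_logits, event_target))
    loss.backward()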

    Ensemble Models for Spoofing Detection in Automatic Speaker Verification

    Detecting spoofing attempts against automatic speaker verification (ASV) systems is challenging, especially when using only one modelling approach. For robustness, we use both deep neural networks and traditional machine learning models and combine them as ensemble models through logistic regression. They are trained to detect logical access (LA) and physical access (PA) attacks on the dataset released as part of the ASV Spoofing and Countermeasures Challenge 2019. We propose dataset partitions that ensure different attack types are present during training and validation to improve system robustness. Our ensemble model outperforms all our single models and the baselines from the challenge for both attack types. We investigate why some models on the PA dataset strongly outperform others and find that spoofed recordings in the dataset tend to have longer silences at the end than genuine ones. When these silences are removed, the PA task becomes much more challenging, with the tandem detection cost function (t-DCF) of our best single model rising from 0.1672 to 0.5018 and the equal error rate (EER) increasing from 5.98% to 19.8% on the development set.
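
    As a rough illustration of the score-fusion step described above, the sketch below combines the spoof probabilities of heterogeneous base classifiers with a logistic-regression fusion model. It is not the authors' pipeline: the base models, feature dimensions, and synthetic data are placeholder assumptions standing in for the actual ASVspoof 2019 LA/PA features and partitions.

    # Minimal sketch (assumed setup, not the paper's system) of logistic-regression score fusion.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for countermeasure features; label 0 = bona fide, 1 = spoof.
    X_train, y_train = rng.normal(size=(2000, 40)), rng.integers(0, 2, 2000)
    X_dev, y_dev = rng.normal(size=(500, 40)), rng.integers(0, 2, 500)

    # Train heterogeneous base models on the training partition.
    base_models = [
        GradientBoostingClassifier().fit(X_train, y_train),
        SVC(probability=True).fit(X_train, y_train),
    ]

    def score_matrix(models, X):
        # Stack each base model's spoof probability as one input feature for the fusion model.
        return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

    # Logistic-regression fusion trained on held-out scores (here the dev partition).
    fusion = LogisticRegression().fit(score_matrix(base_models, X_dev), y_dev)
    ensemble_scores = fusion.predict_proba(score_matrix(base_models, X_dev))[:, 1]
    print(ensemble_scores[:5])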