Deep complementary features for speaker identification in TV broadcast data

Besacier, Laurent; Budnik, Mateusz; Demiroglu, Cenk; Khodabakhsh, Ali

Authors: Laurent Besacier, Mateusz Budnik, Cenk Demiroglu, Ali Khodabakhsh
Publication date: 1 June 2016
Publisher: International Speech Communication Association
Abstract

This work investigates a Convolutional Neural Network (CNN) approach and its fusion with more traditional systems, such as Total Variability Space (TVS), for speaker identification in TV broadcast data. The CNN is trained on spectrograms, while the TVS system is based on MFCC features. The dataset poses several challenges, such as significant class imbalance, background noise, and music. Although the CNN performs below the state of the art, it complements the state-of-the-art system and yields better results through fusion. Different fusion techniques are evaluated, covering both early and late fusion.
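
The abstract does not say which fusion techniques were evaluated, so the following is only a generic sketch of the two families it names, not the authors' method. It assumes each system produces per-utterance feature vectors (for early fusion) and per-speaker scores (for late fusion). All function names, the z-score normalization, and the fusion weight are illustrative assumptions.

```python
import numpy as np

def zscore(scores):
    """Normalize each row (one utterance) to zero mean and unit variance."""
    mean = scores.mean(axis=1, keepdims=True)
    std = scores.std(axis=1, keepdims=True)
    return (scores - mean) / np.maximum(std, 1e-12)

def late_fusion(scores_a, scores_b, weight=0.5):
    """Late fusion (assumed form): weighted sum of normalized scores.

    scores_a, scores_b: arrays of shape (n_utterances, n_speakers).
    weight: hypothetical mixing weight given to system A.
    """
    return weight * zscore(scores_a) + (1.0 - weight) * zscore(scores_b)

def early_fusion(features_a, features_b):
    """Early fusion (assumed form): concatenate feature vectors per utterance,
    so that a single classifier can be trained on the joint representation."""
    return np.concatenate([features_a, features_b], axis=1)

# Toy scores for 2 utterances and 3 candidate speakers (made-up numbers).
cnn_scores = np.array([[0.2, 0.5, 0.3],
                       [0.6, 0.3, 0.1]])
tvs_scores = np.array([[1.0, 4.0, 2.0],
                       [0.5, 3.0, 2.5]])

fused = late_fusion(cnn_scores, tvs_scores, weight=0.5)
predicted = fused.argmax(axis=1)  # index of the best-scoring speaker per utterance
```

Normalizing before summing keeps one system from dominating simply because its scores are on a larger scale. In practice the weight would be tuned on held-out data.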

Available Versions

HAL: Hyper Article en Ligne

oai:HAL:hal-01350068v1

Last time updated on 23/11/2024

Hal - Université Grenoble Alpes

oai:HAL:hal-01350068v1

Last time updated on 11/11/2016

Crossref

info:doi/10.21437%2Fodyssey.20...

Last time updated on 03/08/2021