Speaker identification and clustering using convolutional neural networks

Dürr, Oliver; Lukic, Yanick; Stadelmann, Thilo; Vogt, Carlo

Authors: Oliver Dürr, Yanick Lukic, Thilo Stadelmann, Carlo Vogt
Publication date: 1 January 2016
Publisher: IEEE

Abstract

Deep learning, especially in the form of convolutional neural networks (CNNs), has triggered substantial improvements in computer vision and related fields in recent years. This progress is attributed to the shift from designing features and subsequent individual sub-systems towards learning features and recognition systems end to end from nearly unprocessed data. For speaker clustering, however, it is still common to use handcrafted processing chains such as MFCC features and GMM-based models. In this paper, we use simple spectrograms as input to a CNN and study the optimal design of those networks for speaker identification and clustering. Furthermore, we elaborate on the question of how to transfer a network trained for speaker identification to speaker clustering. We demonstrate our approach on the well-known TIMIT dataset, achieving results comparable with the state of the art without the need for handcrafted features.
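To make the phrase "simple spectrograms as input to a CNN" concrete, the following is a minimal sketch of how a log-magnitude spectrogram might be computed from a raw waveform with NumPy. It is not the authors' pipeline: the function name `log_spectrogram`, the frame length of 256 samples, the hop of 128 samples, the Hann window, and the synthetic test tone are all assumptions made for illustration, since the abstract does not state the paper's preprocessing parameters. The CNN itself and the clustering step are not shown.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a (freq_bins, frames) log-magnitude spectrogram.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT of each frame: (frames, frame_len // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range and avoids log(0).
    return np.log1p(magnitude).T


# Hypothetical input: a one-second 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(tone)
print(spec.shape)  # (129, 124): 129 frequency bins, 124 frames
```

The resulting two-dimensional array can be treated like a single-channel image, which is what makes it a natural input for a convolutional network.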
