Search CORE


Attentive Statistics Pooling for Deep Speaker Embedding

Authors: Koshinaka Takafumi
Okabe Koji
Shinoda Koichi
Publication venue: International Speech Communication Association
Publication date: 24/02/2019

This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. Our method uses an attention mechanism to give different weights to different frames and generates not only weighted means but also weighted standard deviations. In this way, it can capture long-term variations in speaker characteristics more effectively. An evaluation on the NIST SRE 2012 and VoxCeleb data sets shows that it reduces equal error rates (EERs) compared with the conventional method by 7.5% and 8.1%, respectively. Comment: Proc. Interspeech 2018, pp. 2252-2256. arXiv admin note: text overlap with arXiv:1809.0931

arXiv.org e-Print Archive

Crossref
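To make the pooling step in the abstract above concrete, here is a minimal pure-Python sketch of attentive statistics pooling for a single utterance. It assumes the frame-level features h_t are already computed, scores each frame as e_t = v^T f(W h_t + b) + k, normalizes the scores with a softmax over frames, and returns the weighted mean concatenated with the weighted standard deviation. The parameter names (W, b, v, k), the choice of tanh for f, and the use of plain lists instead of a tensor library are assumptions for illustration; in a real system these parameters would be trained together with the embedding network.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix w (A rows, D columns) by vector x (length D)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def attentive_stats_pooling(
    frames: Sequence[Sequence[float]],  # T frame-level features h_t, each of dim D
    w: Matrix,                          # hypothetical attention projection W, shape (A, D)
    b: Sequence[float],                 # hypothetical bias b, length A
    v: Sequence[float],                 # hypothetical scoring vector v, length A
    k: float = 0.0,                     # hypothetical scalar bias k
) -> Vector:
    # 1. Scalar score per frame: e_t = v^T tanh(W h_t + b) + k (tanh is an assumed choice of f)
    scores = []
    for h in frames:
        hidden = [math.tanh(z + bi) for z, bi in zip(matvec(w, h), b)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)) + k)

    # 2. Softmax over frames gives weights alpha_t (subtract the max for numerical stability)
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    alphas = [x / total for x in exps]

    # 3. Weighted mean and weighted standard deviation, computed per feature dimension
    dim = len(frames[0])
    mean = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    second = [sum(a * h[d] * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    # Clamp at zero to guard against tiny negative values from rounding
    std = [math.sqrt(max(s - mu * mu, 0.0)) for s, mu in zip(second, mean)]

    # 4. Concatenate into one utterance-level vector of length 2D
    return mean + std


# With v = 0 every frame gets the same weight, so this is plain mean/std pooling
print(attentive_stats_pooling([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0]], [0.0], [0.0]))  # → [2.0, 3.0, 1.0, 1.0]
```

Setting v to zeros recovers ordinary statistics pooling with uniform frame weights, which shows what the attention mechanism adds: a learned, utterance-dependent reweighting of frames applied to both the mean and the standard deviation.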

Improvement of Text Dependent Speaker Identification System Using Neuro-Genetic Hybrid Algorithm in Office Environmental Conditions

Authors: Islam Md. Rabiul
Rahman Md. Fayzur
Publication venue: International Journal of Computer Science Issues (IJCSI)
Publication date: 01/08/2009

In this paper, an improved strategy is proposed for an automated text-dependent speaker identification system operating in noisy environments. The identification process combines a Neuro-Genetic hybrid algorithm with cepstrum-based features. A Wiener filter is used to remove background noise from the source utterance. Several speech pre-processing techniques, namely start-end point detection, pre-emphasis filtering, frame blocking, and windowing, are applied to the speech utterances. RCC, MFCC, ΔMFCC, ΔΔMFCC, LPC, and LPCC features are extracted, and the Neuro-Genetic hybrid algorithm is then used for learning and identification. Features are extracted with these different techniques in order to optimize identification performance. On the VALID speech database, the highest speaker identification rates achieved in the closed-set text-dependent setting are 100% under studio conditions and 82.33% under office conditions.

arXiv.org e-Print Archive

CogPrints Cognitive Sciences Eprint Archive
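The abstract above lists several standard front-end steps (pre-emphasis, frame blocking, windowing) and the ΔMFCC/ΔΔMFCC features. The sketch below shows textbook versions of those steps in pure Python; it is not the authors' implementation. The pre-emphasis coefficient 0.97, the Hamming window, the delta regression width N = 2, and edge padding by repeating the first/last frame are all assumptions. MFCC extraction itself, Wiener filtering, end-point detection, and the Neuro-Genetic classifier are not shown.

```python
import math
from typing import List, Sequence


def pre_emphasis(signal: Sequence[float], alpha: float = 0.97) -> List[float]:
    # First-order high-pass filter y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed default
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_blocking(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    # Split into overlapping frames of frame_len samples, advancing hop samples each time;
    # trailing samples that do not fill a whole frame are dropped
    return [list(signal[s:s + frame_len]) for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming_window(frame: Sequence[float]) -> List[float]:
    # Taper each frame with w[i] = 0.54 - 0.46 cos(2 pi i / (n - 1)) (Hamming is an assumed choice)
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]


def deltas(features: Sequence[Sequence[float]], width: int = 2) -> List[List[float]]:
    # Regression-based delta: d_t = sum_{k=1..N} k (c_{t+k} - c_{t-k}) / (2 sum_{k=1..N} k^2)
    # Frames beyond either edge are replaced by the first/last frame (assumed padding choice)
    T = len(features)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(features[t])):
            acc = 0.0
            for k in range(1, width + 1):
                nxt = features[min(t + k, T - 1)][d]
                prv = features[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


# Usage: delta-delta (acceleration) features are the deltas of the deltas
cepstra = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # toy one-dimensional "MFCC" track
d1 = deltas(cepstra)   # ΔMFCC
d2 = deltas(d1)        # ΔΔMFCC
```

Applying the same `deltas` function twice is a common way to obtain ΔΔ coefficients; some toolkits instead use a separate regression width for the second pass.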

Information fusion for subband-HMM speaker recognition

Authors: Damper R. I.
Dodd T. J.
Higgins J. E.
Publication date: 01/01/2001

Southampton (e-Prints Soton)