From features to speaker vectors by means of restricted Boltzmann machine adaptation
Restricted Boltzmann Machines (RBMs) have shown success in different stages of speaker recognition systems. In this paper, we propose a novel framework to produce a vector-based representation for each speaker, which will be referred to as RBM-vector. This new approach maps the speaker spectral features to a single fixed-dimensional vector carrying speaker-specific information. In this work, a global model, referred to as Universal RBM (URBM), is trained taking advantage of RBM unsupervised learning capabilities. Then, this URBM is adapted to the data of each speaker in the development, enrolment and evaluation datasets. The network connection weights of the adapted RBMs are further concatenated and subjected to a whitening with dimension reduction stage to build the speaker vectors. The evaluation is performed on the core test condition of the NIST SRE 2006 database, and it is shown that RBM-vectors achieve a 15% relative improvement in terms of EER compared to i-vectors using cosine scoring. The score fusion with i-vectors attains more than 24% relative improvement. The interest of this result for score fusion lies in the fact that both vectors are produced in an unsupervised fashion and can be used instead of the i-vector/PLDA approach when no data labels are available. Results obtained for the RBM-vector/PLDA framework are comparable with those from i-vector/PLDA. Their score fusion achieves a 14% relative improvement compared to i-vector/PLDA.
Peer Reviewed. Postprint (published version)
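The last two stages described in the abstract above (whitening with dimension reduction, then cosine scoring) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes PCA whitening as the whitening method, and the function names `fit_pca_whitening`, `transform` and `cosine_score` are hypothetical. The input rows stand in for concatenated adapted-RBM weights.

```python
import numpy as np

def fit_pca_whitening(X, n_components):
    """Estimate a PCA whitening transform that also reduces dimension.

    X is an (n_vectors, dim) matrix, e.g. one row per development speaker.
    PCA whitening is an assumption; the abstract does not name the method.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Eigenvalues of the sample covariance matrix.
    eigvals = (s ** 2) / (X.shape[0] - 1)
    # Keep the leading directions and scale each to unit variance.
    W = Vt[:n_components].T / np.sqrt(eigvals[:n_components] + 1e-12)
    return mean, W

def transform(x, mean, W):
    """Project one vector (or a matrix of row vectors) into the whitened space."""
    return (x - mean) @ W

def cosine_score(a, b):
    """Cosine similarity between an enrolment vector and a test vector."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In such a setup the transform would be estimated on development data only and then applied unchanged to enrolment and test vectors before scoring.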
Speaker recognition by means of restricted Boltzmann machine adaptation
Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising universal model training and model adaptation. Taking advantage of the RBM unsupervised learning algorithm, a global model is trained on all available background data. This general speaker-independent model, referred to as URBM, is further adapted to the data of a specific speaker to build a speaker-dependent model. To show its effectiveness, we have applied this framework to two different tasks. It has been used to discriminatively model target and impostor spectral features for classification. It has also been used to produce a vector-based representation for speakers. This vector-based representation, similar to the i-vector, can be further used for speaker recognition using either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database.
Peer Reviewed. Postprint (author's final draft)
Robust language recognition via adaptive language factor extraction
This paper presents a technique to adapt an acoustically based
language classifier to the background conditions and speaker
accents. This adaptation improves language classification on
a broad spectrum of TV broadcasts. The core of the system
consists of an iVector-based setup in which language and channel
variabilities are modeled separately. The subsequent language
classifier (the backend) operates on the language factors,
i.e. those features in the extracted iVectors that explain the observed
language variability. The proposed technique adapts the
language variability model to the background conditions and
to the speaker accents present in the audio. The effect of the
adaptation is evaluated on a 28-hour corpus composed of documentaries and monolingual as well as multilingual broadcast
news shows. Consistent improvements in the automatic identification
of Flemish (Belgian Dutch), English and French are demonstrated for all broadcast types.
Anti-spoofing Methods for Automatic Speaker Verification System
Growing interest in automatic speaker verification (ASV) systems has led to
significant quality improvement of spoofing attacks on them. Many research works
confirm that despite the low equal error rate (EER), ASV systems are still
vulnerable to spoofing attacks. In this work we overview different acoustic
feature spaces and classifiers to determine reliable and robust countermeasures
against spoofing attacks. We compared several spoofing detection systems
presented so far on the development and evaluation datasets of the Automatic
Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge
2015. Experimental results presented in this paper demonstrate that the combined
use of magnitude and phase information contributes substantially to
the efficiency of the spoofing detection systems. Wavelet-based features also
show impressive results in terms of equal error rate. In our overview we compare
spoofing detection performance for systems based on different classifiers. Comparison
results demonstrate that the linear SVM classifier outperforms the conventional
GMM approach. However, many researchers, inspired by the great success of deep
neural network (DNN) approaches in automatic speech recognition, have applied
DNNs to the spoofing detection task and obtained quite low EERs for known and
unknown types of spoofing attacks.
Comment: 12 pages, 0 figures, published in Springer Communications in Computer
and Information Science (CCIS) vol. 66
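
The abstracts in this listing report results as equal error rate (EER) and as relative improvement in EER. As a hedged illustration of how these two quantities are commonly computed from detection scores, here is a minimal sketch. The function names are hypothetical, the threshold sweep is a simple approximation without interpolation, and none of it is taken from the papers listed.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection rate: fraction of genuine trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of impostor/spoof trials at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])

def relative_improvement(baseline_eer, new_eer):
    """Relative reduction of EER with respect to a baseline system."""
    return (baseline_eer - new_eer) / baseline_eer
```

With this definition, a drop from a baseline EER of 10% to 8.5% corresponds to a 15% relative improvement; these two values are made up for illustration and are not results from any of the papers above.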